𝗠𝗼𝗻𝗶𝘁𝗼𝗿𝗶𝗻𝗴 𝗔𝗜 𝗔𝗴𝗲𝗻𝘁𝘀 𝘄𝗶𝘁𝗵 𝗖𝗹𝗼𝘂𝗱𝗪𝗮𝘁𝗰𝗵
Logging every agent call to a database is not monitoring. It is just storage.
If you need to run SQL queries at 2:00 AM to see if your summarizer is slow, you have failed at observability. You need dashboards and alarms, not database rows.
I found two ways to monitor AI agents without adding latency or complex code.
𝟭. Use Metric Filters for Failure Modes
Failure modes like budget caps or service throttling should not be invisible. You do not need new code that calls a metrics API to surface them. Your existing logs are enough.
When a budget cap is hit, your code logs an error. You can set up a CloudWatch Metric Filter to scan those logs. If the pattern matches, CloudWatch increments a metric.
This method is cheap. The agent itself needs no extra IAM permissions, and because the filter runs on logs that are already written, it adds zero latency to your agent.
Use this for:
- Monthly cost cap reached
- Bedrock throttling errors
- General agent failures
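As a minimal sketch of this setup, assuming the agent logs failures as JSON lines: the error code `BUDGET_CAP_REACHED`, the namespace `AIAgents`, and both helper names are hypothetical, chosen only for illustration. The second function only builds the arguments for the CloudWatch Logs PutMetricFilter API. It does not call AWS.

```python
import json

# Hypothetical error code; use whatever stable code your agent already logs.
BUDGET_CAP_REACHED = "BUDGET_CAP_REACHED"


def format_failure(agent: str, error_code: str, detail: str) -> str:
    """Return one JSON log line carrying a stable error_code field."""
    return json.dumps(
        {"level": "ERROR", "agent": agent, "error_code": error_code, "detail": detail}
    )


def metric_filter_params(log_group: str, error_code: str, metric_name: str) -> dict:
    """Build keyword arguments for the CloudWatch Logs PutMetricFilter API."""
    return {
        "logGroupName": log_group,
        "filterName": f"{metric_name}-filter",
        # JSON filter pattern: matches events whose error_code equals the code.
        "filterPattern": f'{{ $.error_code = "{error_code}" }}',
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": "AIAgents",  # assumed namespace
                "metricValue": "1",  # add 1 for every matching log event
                "defaultValue": 0.0,  # report 0 when nothing matches
            }
        ],
    }


if __name__ == "__main__":
    # The agent only writes a log line; no metrics API call happens here.
    print(format_failure("summarizer", BUDGET_CAP_REACHED, "monthly cap hit"))
```

The filter is created once, outside the request path, for example with `boto3.client("logs").put_metric_filter(**metric_filter_params(...))` from a deploy script. After that the agent only writes log lines.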
𝟮. Use EMF for Performance Data
If you want to track latency, token usage, or cost per agent, Metric Filters are not enough. You need dimensions.
Do not call PutMetricData from the request path. It is a synchronous network call that adds 30 ms to 80 ms to your request, and it can fail if CloudWatch itself is under load.
Instead, use Embedded Metric Format (EMF).
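You write a single line of JSON to stdout. In Lambda, stdout already flows to CloudWatch Logs, and CloudWatch automatically extracts the values as metrics with dimensions. On other platforms, the line first has to reach CloudWatch Logs.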
With one JSON line, you get:
- Total invocations
- Error rates
- Latency (P95)
- Input and output tokens
- Cost per model and per agent
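A minimal sketch of such a JSON line, following the EMF layout (an `_aws` block with `Timestamp` and `CloudWatchMetrics`, plus top-level dimension and metric values). The namespace `AIAgents`, the metric and dimension names, and the function name `emf_line` are assumptions for illustration, not names from the original article.

```python
import json
import time


def emf_line(agent: str, model: str, latency_ms: float, input_tokens: int,
             output_tokens: int, cost_usd: float, error: bool = False) -> str:
    """Build one Embedded Metric Format log line for a single agent call."""
    payload = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": "AIAgents",  # assumed namespace
                    # Two dimension sets: per agent, and per agent + model.
                    "Dimensions": [["Agent"], ["Agent", "Model"]],
                    "Metrics": [
                        {"Name": "Invocations", "Unit": "Count"},
                        {"Name": "Errors", "Unit": "Count"},
                        {"Name": "LatencyMs", "Unit": "Milliseconds"},
                        {"Name": "InputTokens", "Unit": "Count"},
                        {"Name": "OutputTokens", "Unit": "Count"},
                        {"Name": "CostUsd", "Unit": "None"},
                    ],
                }
            ],
        },
        # Dimension values and metric values sit at the top level.
        "Agent": agent,
        "Model": model,
        "Invocations": 1,
        "Errors": 1 if error else 0,
        "LatencyMs": latency_ms,
        "InputTokens": input_tokens,
        "OutputTokens": output_tokens,
        "CostUsd": cost_usd,
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # "example-model" is a placeholder, not a real model identifier.
    print(emf_line("summarizer", "example-model", 412.5, 1200, 300, 0.0041))
```

P95 latency is not emitted directly. It is a statistic CloudWatch computes over `LatencyMs`, and an error rate can be derived on the dashboard from `Errors` and `Invocations`.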
𝗧𝗵𝗲 𝗥𝘂𝗹𝗲𝘀 𝗼𝗳 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗢𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆
- Emit a line and let CloudWatch do the work.
- Never let telemetry break your agent. Wrap your metric calls in try-except blocks.
- Alarm on bursts, not single events. One throttle is normal. Ten throttles in five minutes is an incident.
- Use dimensions for specific agents, but use aggregates for system-wide latency.
- Match errors by error code (for example, ThrottlingException), not by free-text message strings that can change.
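Two of these rules can be sketched in a few lines: a telemetry writer that never raises into the agent, and alarm settings for "ten throttles in five minutes". The helper names, the `AIAgents` namespace, and the alarm name are hypothetical. The second function only builds arguments for the CloudWatch PutMetricAlarm API and does not call AWS.

```python
import json
import sys


def safe_emit(payload: dict, stream=None) -> bool:
    """Write one telemetry line; never raise into the agent's request path."""
    try:
        out = stream if stream is not None else sys.stdout
        out.write(json.dumps(payload) + "\n")
        return True
    except Exception:
        # Telemetry is best effort: drop the data point, keep the agent running.
        return False


def burst_alarm_params(metric_name: str, threshold: int = 10,
                       window_seconds: int = 300) -> dict:
    """Build keyword arguments for the CloudWatch PutMetricAlarm API."""
    return {
        "AlarmName": f"{metric_name}-burst",  # hypothetical naming scheme
        "Namespace": "AIAgents",  # assumed namespace
        "MetricName": metric_name,
        "Statistic": "Sum",  # count events inside the window
        "Period": window_seconds,  # one five-minute window by default
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No data means no throttles, so a quiet period is not an alarm.
        "TreatMissingData": "notBreaching",
    }
```

With the defaults, a single throttle stays below the threshold, while ten or more in one five-minute period trips the alarm. The dictionary could be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)` during deployment.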
You can build a professional monitoring stack using only logs and EMF, at little cost beyond standard CloudWatch Logs and custom metric charges.
Source: https://dev.to/aws-builders/monitorear-agentes-de-ia-con-cloudwatch-45c4