𝗠𝗖𝗣 𝗗𝗶𝗿𝘁𝘆 𝗦𝗲𝗰𝗿𝗲𝘁: 𝗬𝗼𝘂𝗿 𝗔𝗴𝗲𝗻𝘁 𝗜𝘀 𝗕𝘂𝗿𝗻𝗶𝗻𝗴 𝗧𝗼𝗸𝗲𝗻𝘀
Your AI agent pays a hidden tax every time it calls an MCP server. This tax is not in dollars. It is in tokens.
If you run agents at scale, this cost grows fast. I tracked my token usage and saw huge spikes. The problem is not the model's reasoning. The problem is the context overhead.
When you connect an agent to an MCP server, the client loads that server's tool definitions into the model's context. These include every tool's name, description, and parameter schema.
If you use five MCP servers with 20 tools each, that is 100 tool definitions, which can add up to 15,000 tokens to every single turn. This happens before the model even speaks.
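To make the arithmetic concrete, here is a minimal sketch. The tool shape (name, description, inputSchema) follows the MCP tool format, but the `create_issue` tool itself is made up for illustration, the 4-characters-per-token rule is only a rough heuristic, and the 150 tokens per tool is simply what 15,000 tokens across 100 tools works out to.

```python
import json

# Hypothetical tool definition, shaped like an MCP tool (name, description, inputSchema).
example_tool = {
    "name": "create_issue",
    "description": "Create a new issue in a repository.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string", "description": "Repository in owner/name form."},
            "title": {"type": "string", "description": "Issue title."},
            "body": {"type": "string", "description": "Issue body in Markdown."},
        },
        "required": ["repo", "title"],
    },
}


def estimate_tokens(obj, chars_per_token=4):
    """Very rough token estimate: serialized length divided by an assumed ratio."""
    return len(json.dumps(obj)) // chars_per_token


def overhead_per_turn(servers, tools_per_server, tokens_per_tool):
    """Tokens added to every turn when all tool definitions are loaded."""
    return servers * tools_per_server * tokens_per_tool


if __name__ == "__main__":
    print("rough size of one tool:", estimate_tokens(example_tool), "tokens")
    print("5 servers x 20 tools x 150 tokens:", overhead_per_turn(5, 20, 150))
```

For real numbers, swap the heuristic for your model provider's tokenizer and run it over the definitions your servers actually return.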
Here is the data from a 10-turn conversation test:
• No MCP: 2,400 tokens per turn
• 3 MCP servers: 18,700 tokens per turn
• 5 MCP servers: 31,200 tokens per turn
At current prices, a team running 50 conversations a day with 5 servers could spend $23,400 per month on MCP overhead alone.
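If you want to run this estimate for your own setup, a small calculator helps. This is a sketch under stated assumptions: the per-turn numbers come from the test above, 10 turns per conversation matches that test, and 30 days per month and the price per million input tokens are placeholders you should replace. The post does not state the price behind its dollar figure, so the value in the code is hypothetical.

```python
# Per-turn measurements from the 10-turn test in this post.
BASELINE_TOKENS_PER_TURN = 2_400
MEASURED_TOKENS_PER_TURN = {3: 18_700, 5: 31_200}


def overhead_per_turn(servers):
    """Tokens per turn attributable to MCP (measured total minus the no-MCP baseline)."""
    return MEASURED_TOKENS_PER_TURN[servers] - BASELINE_TOKENS_PER_TURN


def monthly_overhead_tokens(servers, turns_per_conversation, conversations_per_day,
                            days_per_month=30):
    """Total overhead tokens per month for a given workload."""
    return (overhead_per_turn(servers) * turns_per_conversation
            * conversations_per_day * days_per_month)


def monthly_overhead_cost(tokens, usd_per_million_input_tokens):
    """Dollar cost of those tokens at a given input price."""
    return tokens / 1_000_000 * usd_per_million_input_tokens


if __name__ == "__main__":
    tokens = monthly_overhead_tokens(servers=5, turns_per_conversation=10,
                                     conversations_per_day=50)
    # Placeholder price: replace with your model's real input price.
    print(tokens, "overhead tokens per month")
    print(monthly_overhead_cost(tokens, usd_per_million_input_tokens=3.0), "USD per month")
```

The result scales linearly with every input, so longer conversations or a pricier model move the bill quickly.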
This causes two main problems:
- Quality drops. When tool schemas take up 40% of your context window, the model has less room for history. The model starts to forget things because it runs out of space.
- Costs are fixed. You pay full price for these tool definitions on every single turn, whether or not the model calls a tool.
Here are three ways to fix this:
Use a Gateway
Do not load all tool definitions at once. Use a gateway to inject only the tools needed for the current task. This can drop overhead from 8,000 tokens to 400 tokens per call.
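Here is one way the gateway idea could look in code. It is a minimal sketch, not a real gateway product: `Tool`, `ToolGateway`, and the catalog entries are hypothetical names, and plain keyword overlap stands in for whatever selection logic your gateway actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    server: str
    description: str
    keywords: frozenset


class ToolGateway:
    """Holds the full catalog but hands the model only the tools that match the task."""

    def __init__(self, tools):
        self._tools = list(tools)

    def select(self, task, limit=5):
        words = set(task.lower().split())
        # Score each tool by how many of its keywords appear in the task text.
        scored = [(len(tool.keywords & words), tool) for tool in self._tools]
        matches = [(score, tool) for score, tool in scored if score > 0]
        # Best match first; tie-break on name so the order is stable.
        matches.sort(key=lambda pair: (-pair[0], pair[1].name))
        return [tool for _, tool in matches[:limit]]


# Hypothetical catalog spanning three servers.
CATALOG = [
    Tool("create_issue", "github", "Create an issue.", frozenset({"github", "issue", "bug"})),
    Tool("create_event", "calendar", "Create an event.", frozenset({"calendar", "meeting", "event"})),
    Tool("run_query", "database", "Run a SQL query.", frozenset({"sql", "query", "database"})),
]

if __name__ == "__main__":
    gateway = ToolGateway(CATALOG)
    for tool in gateway.select("Create a GitHub issue for this bug"):
        print(tool.server, tool.name)
```

Only the selected tools go into the model's context for that call. The rest of the catalog stays in the gateway.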
Use an Intent Classifier
Run a cheap model call first to decide which server is relevant. A tiny cost for a classifier can cut your MCP overhead by 60% to 80%.
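The routing step can be sketched like this. Everything here is an assumption for illustration: the server names, the prompt wording, and especially `fake_classifier`, which is a keyword stub standing in for the cheap model call so the example runs without an API key. In production you would pass a function that sends the prompt to a small model and returns its one-word answer.

```python
from typing import Callable

# Hypothetical servers and the tool names each one exposes.
SERVERS = {
    "github": ["create_issue", "list_pull_requests"],
    "calendar": ["create_event", "list_events"],
    "database": ["run_query"],
}


def build_classifier_prompt(message, servers):
    options = ", ".join(list(servers) + ["none"])
    return (
        "Pick the one tool server needed for this request. "
        f"Answer with exactly one of: {options}.\nRequest: {message}"
    )


def tools_for_turn(message, classify: Callable[[str], str]):
    """Ask the classifier for a server, then load only that server's tools."""
    label = classify(build_classifier_prompt(message, sorted(SERVERS))).strip().lower()
    if label == "none":
        return []
    if label in SERVERS:
        return SERVERS[label]
    # Unrecognized answer: fall back to every tool, trading tokens for safety.
    return [tool for name in sorted(SERVERS) for tool in SERVERS[name]]


def fake_classifier(prompt):
    """Stub for the cheap model call: looks only at the request text."""
    request = prompt.rsplit("Request:", 1)[1].lower()
    if "meeting" in request:
        return "calendar"
    if "issue" in request:
        return "github"
    return "none"


if __name__ == "__main__":
    print(tools_for_turn("Schedule a meeting tomorrow", fake_classifier))
    print(tools_for_turn("Tell me a joke", fake_classifier))
```

The fallback is a design choice: when the classifier returns something unexpected, loading everything costs tokens but never hides a tool the agent needs.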
Compress Your Schemas
MCP tool schemas are verbose. Strip descriptions down to the essential nouns and remove example fields. I found that a 400-token schema works perfectly at 120 tokens if you simplify the text.
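A rough sketch of schema compression follows. It is not the exact process behind the numbers above: picking "essential nouns" needs human judgment, so this version just truncates each description to a few words and drops `examples` fields. The function name `compress_schema` and the `search` tool are hypothetical.

```python
import json


def compress_schema(node, max_words=6):
    """Return a copy with descriptions truncated and example fields removed."""
    if isinstance(node, list):
        return [compress_schema(item, max_words) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key in ("example", "examples"):
            continue  # drop example payloads entirely
        if key == "description" and isinstance(value, str):
            out[key] = " ".join(value.split()[:max_words])
        elif key == "properties" and isinstance(value, dict):
            # Keys here are parameter names, not schema keywords, so keep them all.
            out[key] = {name: compress_schema(schema, max_words)
                        for name, schema in value.items()}
        else:
            out[key] = compress_schema(value, max_words)
    return out


if __name__ == "__main__":
    tool = {
        "name": "search",
        "description": "Search the knowledge base for articles that match the given query.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Free text query to run against the article index.",
                    "examples": ["reset password"],
                }
            },
            "required": ["query"],
        },
    }
    small = compress_schema(tool, max_words=4)
    print(len(json.dumps(tool)), "->", len(json.dumps(small)), "characters")
```

After compressing, rerun a few real tasks to confirm the model still picks the right tool and fills in the parameters correctly.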
Stop treating context as unlimited. Context budget is infrastructure. Manage it like a real cost.
How do you handle MCP overhead in your production agents? Let me know in the comments.