๐ฌ๐ผ๐๐ฟ ๐๐ ๐๐ด๐ฒ๐ป๐ ๐ฅ๐ฒ-๐ฅ๐ฒ๐ฎ๐ฑ๐ ๐๐๐ฒ๐ฟ๐ ๐ฃ๐ฎ๐ด๐ฒ ๐๐ ๐๐น๐ฟ๐ฒ๐ฎ๐ฑ๐ ๐ฆ๐ฎ๐
Your AI agent is likely wasting your money.
I measured the cost of a common mistake. In a 20-page session, a naive agent loop costs 8x more than a bounded window approach.
The problem is simple. A naive agent keeps every page in its message history. On turn 20, it re-sends pages 1 through 20. You pay for the first page 20 times. The cost grows quadratically.
If you run a ReAct loop or a LangChain agent that keeps the full transcript, you pay this tax.
The Data: Cumulative billed input tokens for 20 pages:
- Naive loop: 71,844 tokens
- Budget loop: 8,744 tokens
- Result: 88% savings with a budget layer.
How a budget layer works:
- Send the current page only.
- Keep one short rolling summary of previous pages.
- Cap the window size.
This keeps your costs linear. Each page costs about the same every turn.
What about prompt caching? Caching helps, but it does not fix the problem. Even with ideal caching, the naive loop still costs 1.8x more than the budget loop at 20 pages. If your agent steps are slow, you might lose the cache entirely and pay the full 8x tax.
The Trade-off: A budget layer is not free. A rolling summary costs tokens. You also lose some detail. If your agent needs a specific fact from page 3 while on page 18, a budget loop might miss it.
The choice is simple:
- Use a naive loop if you need perfect memory and have a huge budget.
- Use a budget loop if you want to scale without exploding costs.
Check your logs. If your billed input per turn climbs every single time, you are paying the tax.
Optional learning community: https://t.me/GyaanSetuAi