I Cut My AI API Costs By 70%

📅2 weeks ago⏱1 min read

My OpenAI bill jumped from $30 to $150. A small Slack bot caused this. Repeated prompts and retries cost too much.

I tried simple fixes. I used basic caching. I switched models. Nothing worked. Users rephrase questions. Basic caching fails when words change.
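A minimal sketch of why exact-match caching misses rephrased questions. The `cache_key` helper and the sample prompts are made up for illustration; they are not taken from the bot described here.

```python
import hashlib


def cache_key(prompt: str) -> str:
    """Build an exact-match key: normalize case and whitespace, then hash."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical cache contents, for illustration only.
cache = {}
cache[cache_key("How do I reset my password?")] = "Go to Settings > Security."

# Same wording (apart from case and spacing) maps to the same key: a hit.
same = cache.get(cache_key("how do I  reset my password?"))

# A rephrasing hashes to a different key, so the lookup misses
# even though the user is asking the same thing.
rephrased = cache.get(cache_key("What's the way to change my password?"))
```

Normalization only catches cosmetic differences. Any change in wording produces a new key, and the request goes to the API again.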

I built an AI proxy. It sits between my app and the API. It does three things:

Semantic caching. I use embeddings to find similar questions. I serve the cached answer if the similarity score clears a threshold.
Rate limiting. I use Redis to stop request bursts.
Retry buffers. The proxy retries failed calls automatically.
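The semantic caching step can be sketched roughly as follows. This is an illustration under assumptions, not the proxy's actual code: `toy_embed` is a bag-of-words stand-in for a real embedding model, and the `SemanticCache` class and the 0.8 threshold are hypothetical choices.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Return a stored answer when a new prompt is close enough to an old one."""

    def __init__(self, embed: Callable[[str], Counter] = toy_embed,
                 threshold: float = 0.8) -> None:
        self.embed = embed
        self.threshold = threshold  # assumed value; tune on real traffic
        self.entries: List[Tuple[Counter, str]] = []

    def put(self, prompt: str, answer: str) -> None:
        self.entries.append((self.embed(prompt), answer))

    def get(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_score, best_answer = 0.0, None
        # Linear scan; a real deployment would likely use a vector index.
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.8)
cache.put("how do i reset my password", "Go to Settings > Security.")
hit = cache.get("How do I reset my password please?")   # close match
miss = cache.get("What is the weather today?")          # unrelated
```

The threshold is the main knob. Set it too low and users get answers to questions they did not ask; set it too high and the cache behaves like exact matching again.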

This cut my costs by 70%.

There are trade-offs:

Lessons for you:

Stop treating AI APIs as black boxes. They are HTTP endpoints. Use middleware to control them.
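One way to picture that middleware layer is a wrapper that checks a rate limit and then retries failed calls with backoff. This is a sketch under assumptions: the token bucket here lives in process memory rather than Redis, and `TokenBucket`, `RateLimited`, `call_with_retries`, and the `send` callable are hypothetical names, not part of any real client library.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Raised when the bucket has no tokens left for this request."""


class TokenBucket:
    """In-memory token bucket; a shared store would be needed across processes."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def call_with_retries(send: Callable[[str], str], prompt: str,
                      bucket: TokenBucket, max_attempts: int = 3,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Reject bursts up front, then retry transient failures with backoff."""
    if not bucket.allow():
        raise RateLimited("request burst rejected")
    for attempt in range(max_attempts):
        try:
            return send(prompt)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1.0s, 2.0s, ...
    raise RuntimeError("max_attempts must be at least 1")
```

Capping `max_attempts` matters for cost: unbounded retries are one of the ways a small bot can run up a large bill.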

What is your setup? Do you use a service or build your own?
