𝗜 𝗖𝘂𝘁 𝗠𝘆 𝗔𝗜 𝗔𝗣𝗜 𝗖𝗼𝘀𝘁𝘀 𝗕𝘆 𝟳𝟬%

My OpenAI bill jumped from $30 to $150. The cause was a small Slack bot: repeated prompts and retries were adding up.

I tried the simple fixes first: basic caching and switching models. Neither worked. Users rephrase their questions, and a basic cache misses as soon as the wording changes.

I built an AI proxy. It sits between my app and the API. It does three things:

  • Semantic caching. I use embeddings to find similar earlier questions and serve the cached answer when the similarity is high enough.
  • Rate limiting. I use Redis to stop request bursts.
  • Retry buffers. The proxy retries failed calls automatically.
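
A minimal sketch of the semantic caching idea, not the proxy's actual code. The names `SemanticCache` and `toy_embed` and the 0.8 threshold are assumptions for illustration. `toy_embed` is a word-count stand-in so the example runs offline; a real setup would call an embedding model and would likely keep the vectors in Redis rather than a Python list.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache keyed by embedding similarity."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        """Return the best cached answer if it clears the threshold, else None."""
        query = self.embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, prompt: str, answer: str) -> None:
        """Store the answer under the prompt's embedding."""
        self.entries.append((self.embed(prompt), answer))


def toy_embed(text: str) -> Vector:
    """Stand-in embedding: word counts over a tiny fixed vocabulary."""
    vocab = ["reset", "password", "change", "forgot", "invoice", "billing"]
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in vocab]


cache = SemanticCache(toy_embed, threshold=0.8)
cache.put("How do I reset my password?", "Use the reset link.")
hit = cache.get("reset password please")   # rephrased, still matches
miss = cache.get("Where is my invoice?")   # unrelated, falls through to the API
```

On a miss the proxy would forward the request to the API and call `put` with the response. The linear scan is fine for a small bot; a larger cache would need a vector index.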

This cut my costs by 70%.

There are trade-offs:

  • Latency. The proxy adds 200ms per request.
  • Memory. Redis needs space for the embedding vectors.
  • Accuracy. Some similar-looking prompts need different answers, so a cache hit can be wrong.
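
To get a feel for the memory trade-off, here is a rough back-of-the-envelope estimate. The numbers are assumptions, not figures from the post: 100,000 cached prompts, 1536-dimensional embeddings, and 4 bytes per value (float32). It counts raw vector data only; Redis key and index overhead come on top.

```python
def vector_memory_bytes(num_entries: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone, excluding any Redis overhead."""
    return num_entries * dimensions * bytes_per_value


estimate = vector_memory_bytes(num_entries=100_000, dimensions=1536)
print(f"{estimate / 1024**2:.1f} MiB")  # prints "585.9 MiB"
```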

Lessons for you:

  • Start with open source tools like LiteLLM.
  • Track your data from day one.
  • Use message queues for high traffic.

Stop treating AI APIs as black boxes. They are HTTP endpoints. Use middleware to control them.
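
The middleware idea, covering the rate limiting and retry pieces, can be sketched as a wrapper around any function that makes the API call. This is an illustration under stated assumptions, not the proxy's real code: `TokenBucket`, `RateLimitExceeded`, and `with_middleware` are hypothetical names, the limiter is an in-process token bucket swapped in for the Redis-based one, and the retry uses exponential backoff on `ConnectionError` only.

```python
import time
from typing import Callable


class RateLimitExceeded(Exception):
    """Raised when a request arrives while the bucket is empty."""


class TokenBucket:
    """In-process token bucket (stand-in for a Redis-backed limiter)."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def with_middleware(call: Callable[[str], str], bucket: TokenBucket,
                    max_attempts: int = 3, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> Callable[[str], str]:
    """Wrap `call` with rate limiting and retry with exponential backoff."""

    def wrapped(prompt: str) -> str:
        if not bucket.allow():
            raise RateLimitExceeded("request burst rejected")
        for attempt in range(max_attempts):
            try:
                return call(prompt)
            except ConnectionError:
                if attempt == max_attempts - 1:
                    raise  # out of attempts, surface the error
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
        raise RuntimeError("max_attempts must be at least 1")

    return wrapped
```

Usage would look like `guarded = with_middleware(call_api, TokenBucket(rate=2.0, capacity=5))`, where `call_api` is whatever function sends the prompt to the provider. Only one token is spent per user request, so retries do not count against the limit in this sketch.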

What is your setup? Do you use a service or build your own?

Source: https://dev.to/__c1b9e06dc90a7e0a676b/i-built-a-simple-ai-proxy-to-cut-api-costs-heres-what-i-learned-3hcf