𝗪𝗵𝗲𝗻 𝗠𝘆 𝗔𝗜 𝗔𝗣𝗜 𝗪𝗲𝗻𝘁 𝗗𝗼𝘄𝗻


Last month, my side project broke.

The AI API I used for summaries returned 503 errors for three hours. My app stopped working. Users emailed me. It was embarrassing.

I made a classic mistake. I relied on one provider. I had a single point of failure.

I tried several ways to fix this.

First, I tried retries with backoff. This helps with brief, transient errors. It does nothing for long outages: retrying a dead API just wastes time.
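To see why backoff only covers short blips, here is a minimal Python sketch of retry with exponential backoff and jitter. `TransientError` stands in for whatever error your client raises on a 503. It and `retry_with_backoff` are hypothetical names, and the delay values are illustrative assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying, such as a 503."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds or `max_attempts` tries are used up.

    The delay doubles after each failure (capped at `max_delay`), and random
    jitter keeps many clients from retrying in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Wait somewhere between 50% and 100% of the computed delay.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable: the loop always returns or raises")
```

With these defaults, a call that never succeeds waits at most 3.5 seconds in total (0.5 + 1 + 2) before the error reaches the caller. That absorbs a brief glitch, but against a three-hour outage it only delays the failure.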

Second, I tried switching providers manually. Every time a provider went down, I had to redeploy my code. This does not scale.

I built a router to solve this. The router wraps multiple AI clients. It tries them in order. If one fails, it moves to the next.

Here is the logic:
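A minimal Python sketch of a router that behaves as described (wrap several clients, try them in order, move on when one fails), not the original implementation. It assumes each provider's SDK call is wrapped in a plain callable that takes the text and returns a summary or raises. `FallbackRouter`, `AllProvidersFailed`, and `Provider` are hypothetical names, not a library API.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider shape: a (name, client) pair, where the client takes
# the text to summarize and returns a summary, or raises on failure.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""

    def __init__(self, errors: List[Tuple[str, Exception]]):
        self.errors = errors  # (provider name, exception) for each failed attempt
        names = ", ".join(name for name, _ in errors)
        super().__init__(f"all providers failed: {names}")


class FallbackRouter:
    """Try providers in order and return the first successful result."""

    def __init__(self, providers: Sequence[Provider]):
        if not providers:
            raise ValueError("need at least one provider")
        self.providers = list(providers)

    def summarize(self, text: str) -> Tuple[str, str]:
        """Return (provider name, summary) from the first provider that succeeds."""
        errors: List[Tuple[str, Exception]] = []
        for name, client in self.providers:
            try:
                return name, client(text)
            except Exception as exc:
                # Broad on purpose in this sketch: any failure (timeout, 5xx,
                # network error) means "try the next provider". A real version
                # might catch narrower error types.
                errors.append((name, exc))
        raise AllProvidersFailed(errors)
```

Returning the provider name alongside the summary lets the caller log which model actually answered, which matters once different providers start producing different results.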

I improved this system by adding three things:

There are trade-offs to this approach:

Latency: If the first provider fails, you wait for that failure before the next provider is even tried, so the total wait time increases.
Cost: You might pay for failed requests.
Consistency: Different models give different results. You must handle these variations.
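One way to handle the consistency trade-off is to push every provider's output through the same normalization step. A minimal sketch, assuming summaries arrive as plain strings: the hypothetical `normalize` function collapses whitespace, caps the length (the 280-character default is an arbitrary assumption), and records which provider produced the text. It only evens out surface formatting; the models will still differ in what they say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical common shape for every provider's output."""
    text: str
    provider: str  # which provider produced this summary


def normalize(raw: str, provider: str, max_chars: int = 280) -> Summary:
    """Smooth over surface differences between providers' summaries.

    Collapses runs of whitespace and newlines into single spaces and caps
    the length so downstream code sees a predictable shape.
    """
    if max_chars < 3:
        raise ValueError("max_chars must leave room for the '...' suffix")
    text = " ".join(raw.split())
    if len(text) > max_chars:
        # Truncate so the result, including "...", fits within max_chars.
        text = text[: max_chars - 3].rstrip() + "..."
    return Summary(text=text, provider=provider)
```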

My advice is to plan for failure. External services will fail eventually. The same fallback pattern works for databases, CDNs, or any other external service.

I still use one primary API. But now, I sleep better. My app stays alive even if a provider dies.

How do you handle API outages in your projects?

Optional learning community: https://t.me/GyaanSetuAi
