When My AI API Went Down
Last month, my side project broke.
The AI API I used for summaries returned a 503 error for three hours. My app stopped working. Users sent emails. It was embarrassing.
I made a classic mistake. I relied on one provider. I had a single point of failure.
I tried several ways to fix this.
First, I tried retries with backoff. This helps with brief, transient errors. It does nothing for long outages. Retrying a dead API just wastes time.
Second, I tried manual switching. I had to redeploy my code every time a provider went down. This does not scale.
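For reference, here is a minimal sketch of retries with backoff in Python. The function name `call_with_backoff` and its parameters (`max_attempts`, `base_delay`, `sleep`) are assumptions for illustration, not the code from the project. `call` stands for any zero-argument function that hits the API.

```python
import time


def call_with_backoff(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry `call`, doubling the wait after each failure.

    Hypothetical helper: `call` is any zero-argument function that
    raises an exception when the API request fails.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # Out of attempts: let the last error reach the caller.
            if attempt == max_attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

With these defaults, a provider that is down for hours still fails after four attempts and 3.5 seconds of sleeping. The retries only delay the error.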
I built a router to solve this. The router wraps multiple AI clients. It tries them in order. If one fails, it moves to the next.
Here is the logic:
- Define a list of clients.
- Loop through the list.
- Try the first client.
- If it fails, log the error and move to the second client.
- If all fail, raise an error.
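The steps above can be sketched in Python roughly like this. The names `route`, `AllProvidersFailed`, and the logger name `ai_router` are assumptions for illustration. Each client is assumed to be a plain function that takes a prompt and returns text, or raises on failure. Real provider SDKs would need a thin wrapper to fit that shape.

```python
import logging

logger = logging.getLogger("ai_router")  # hypothetical logger name


class AllProvidersFailed(Exception):
    """Raised when every client in the list has failed."""


def route(clients, prompt):
    """Try each (name, client) pair in order and return the first success.

    `clients` is a list of (name, callable) pairs. Each callable takes
    the prompt and returns text, or raises an exception on failure.
    """
    errors = []
    for name, client in clients:
        try:
            return name, client(prompt)
        except Exception as exc:
            # Log the failure and fall through to the next client.
            logger.warning("provider %s failed: %s", name, exc)
            errors.append((name, exc))
    # Every client failed: surface all collected errors at once.
    raise AllProvidersFailed(errors)
```

Returning the provider name with the result makes it easy to see which client actually served the request.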
I improved this system by adding three things:
- Validation: I check if the response is empty or junk.
- Delays: I add a small pause between attempts.
- Logging: I track which provider works best.
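Here is one way those three additions might fit together, as a sketch under assumptions. The class name `FallbackRouter`, the `pause` and `min_length` parameters, and the counter-based tracking are all hypothetical. The validation check is deliberately simple: it only rejects empty or non-string responses.

```python
import time
from collections import Counter


class FallbackRouter:
    """Hypothetical router with validation, a pause between attempts, and counters."""

    def __init__(self, clients, pause=0.2, min_length=1, sleep=time.sleep):
        self.clients = clients        # list of (name, callable) pairs
        self.pause = pause            # seconds to wait before each fallback
        self.min_length = min_length  # shortest response accepted as valid
        self.sleep = sleep
        self.successes = Counter()    # per-provider success counts
        self.failures = Counter()     # per-provider failure counts

    def is_valid(self, text):
        # Reject non-strings and responses that are empty after stripping.
        return isinstance(text, str) and len(text.strip()) >= self.min_length

    def complete(self, prompt):
        for index, (name, client) in enumerate(self.clients):
            if index > 0:
                # Small pause before moving on to the next provider.
                self.sleep(self.pause)
            try:
                text = client(prompt)
                if not self.is_valid(text):
                    raise ValueError("empty or invalid response")
            except Exception:
                self.failures[name] += 1
                continue
            self.successes[name] += 1
            return text
        raise RuntimeError("all providers failed")
```

Comparing `successes` and `failures` over time shows which provider is most reliable, which can inform the order of the client list.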
There are trade-offs to this approach:
- Latency: If the first provider fails, the user waits for the failed attempt plus every attempt after it.
- Cost: You might pay for failed requests, such as a response that arrives but does not pass validation.
- Consistency: Different models give different results. You must handle these variations.
My advice is to plan for failure. External services will fail. The same fallback pattern works for databases, CDNs, or any other external service you depend on.
I still use one primary API. But now, I sleep better. My app stays alive even if a provider dies.
How do you handle API outages in your projects?
Optional learning community: https://t.me/GyaanSetuAi