I Wired an AI Fallback Runbook After a 19-Day Outage
Your primary model goes dark for 19 days. What does your workflow do in the first hour?
Does it fail? Does it stall? Or does it route to a backup and keep moving?
If you cannot answer this, you do not have a resilience plan. You have luck. Luck fails when you need it most.
Fable 5 went offline for 19 days due to export controls. After the outage, I looked at my own setup. I found gaps. I wrote a runbook to fix them.
Do not wait for the next outage to write this. Use these three parts to build your hedge.
- Create a routing policy
Saying "we use Claude" is not a policy. It is a default.
A real policy is a map. It decides which task goes to which model. Bulk tasks go to cheap models. Complex agent tasks go to high-reasoning models.
Move this map from your head into a file. A YAML file works well, with entries along these lines:
- bulk_classify: primary is Haiku, fallback is Gemini Flash
- long_agent_run: primary is Opus, fallback is Sonnet
- code_review: primary is Sonnet, fallback is GPT
When the primary model disappears, the router knows where to send the work. Nobody has to improvise at 2 AM.
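Here is a minimal sketch of how a router could read that map. It is an illustration, not a finished implementation: the `ROUTES` dictionary, the `pick_model` function, and the lowercase model labels are all hypothetical names, and how you detect that a model is unavailable is left to you.

```python
# Hypothetical routing map mirroring the list above.
# The model names are plain labels, not real API identifiers.
ROUTES = {
    "bulk_classify": {"primary": "haiku", "fallback": "gemini-flash"},
    "long_agent_run": {"primary": "opus", "fallback": "sonnet"},
    "code_review": {"primary": "sonnet", "fallback": "gpt"},
}


def pick_model(task: str, unavailable: set) -> str:
    """Return the primary model for a task, or its fallback if the primary is down."""
    route = ROUTES[task]  # a KeyError here means the task has no written policy
    for role in ("primary", "fallback"):
        if route[role] not in unavailable:
            return route[role]
    raise RuntimeError(f"no available model for task {task!r}")


if __name__ == "__main__":
    # With "opus" marked as down, the long agent run goes to its fallback.
    print(pick_model("long_agent_run", unavailable={"opus"}))
```

Keeping the map as data means a change of fallback is a one-line edit, not a code change made under pressure.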
- Bank your reserves
Some workflows cannot stop. For these, you must keep a reserve.
Plan-banking means you generate outputs while the model is up and cheap. You store these outputs to use during a blackout.
Think of it like canning food in the summer to eat during a winter storm.
Do not bank everything. That is a waste of storage. Only bank the two or three critical workflows that would hurt your business if they stopped.
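One way to picture plan-banking is a small on-disk store that saves every successful output and hands it back when the live call fails. This is a sketch under assumptions: `OutputBank`, `run_with_reserve`, and the `call_model` callable are hypothetical, and treating `ConnectionError` as the outage signal is my simplification.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Optional


class OutputBank:
    """Stores model outputs on disk, keyed by workflow name and exact prompt."""

    def __init__(self, directory: str) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, workflow: str, prompt: str) -> Path:
        key = f"{workflow}\n{prompt}".encode("utf-8")
        return self.directory / f"{hashlib.sha256(key).hexdigest()}.json"

    def deposit(self, workflow: str, prompt: str, output: str) -> None:
        record = {"workflow": workflow, "prompt": prompt, "output": output}
        self._path(workflow, prompt).write_text(json.dumps(record), encoding="utf-8")

    def withdraw(self, workflow: str, prompt: str) -> Optional[str]:
        path = self._path(workflow, prompt)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))["output"]


def run_with_reserve(
    bank: OutputBank, workflow: str, prompt: str, call_model: Callable[[str], str]
) -> str:
    """Call the live model and bank the result; during an outage, use the bank."""
    try:
        output = call_model(prompt)
    except ConnectionError:  # assumption: outages surface as ConnectionError
        banked = bank.withdraw(workflow, prompt)
        if banked is None:
            raise  # nothing in reserve for this prompt
        return banked
    bank.deposit(workflow, prompt, output)
    return output
```

The lookup only matches prompts it has seen word for word, so this fits workflows with a fixed, known set of inputs. That is one more reason to bank only the two or three that matter.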
- Run a weekly canary test
A fallback model you never test is just a guess. You must prove the backup works before the outage hits.
Set up a weekly test. Run your critical prompts against your fallback model. If the pass rate drops below your threshold, the test should alert you immediately.
You want to find errors on a quiet Thursday, not during a 19-day shutdown.
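The canary itself can be very small. The sketch below assumes each case is a prompt paired with a check function, and that `call_fallback` is whatever function sends a prompt to your backup model. The names `run_canary`, `call_fallback`, and `alert`, and the 0.9 default threshold, are all placeholders. Scheduling it weekly (cron, CI, or similar) is outside the sketch.

```python
from typing import Callable, List, Tuple

# A canary case: a prompt plus a check that the reply must pass.
Case = Tuple[str, Callable[[str], bool]]


def run_canary(
    cases: List[Case],
    call_fallback: Callable[[str], str],
    threshold: float = 0.9,  # placeholder value; pick your own
    alert: Callable[[str], None] = print,
) -> float:
    """Run every case against the fallback model and alert on a low pass rate."""
    if not cases:
        raise ValueError("canary needs at least one case")
    passed = 0
    for prompt, check in cases:
        try:
            if check(call_fallback(prompt)):
                passed += 1
        except Exception:
            pass  # an error from the fallback counts as a failed case
    pass_rate = passed / len(cases)
    if pass_rate < threshold:
        alert(f"canary pass rate {pass_rate:.0%} is below threshold {threshold:.0%}")
    return pass_rate
```

Passing `alert` in as a function lets you start with `print` and later swap in a pager or chat webhook without touching the test logic.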
The risks are real. Governments can pull licenses at any time. You cannot predict which model gets restricted next. You can only hedge by using more than one vendor.
These steps are cheap on a calm day. They are impossible to do in a panic.
Which of these three do you have written down today? Not "we could do it." I mean written down right now.
My routing map was ready. My second-source canary did not exist until last week.
Optional learning community: https://t.me/GyaanSetuAi
