6 Bugs Only a Live Model Could Teach Us
Offline tests are necessary. They are not enough.
I built AgentOps Debugger to track environmental compliance in Peru. It uses Qwen-plus on Qwen Cloud to find records and write reports.
I designed the system to be offline-first. My 315 tests ran without any network calls. All tests passed. But when I switched to the live model on Alibaba Cloud, the system broke.
The code was fine. The model output was the problem.
Here are the six lessons from real-world model failures:
• Label Mismatch: The schema expected "completed" or "failed." The model sent "success" or "done." The parser rejected useful answers because of a single word. Fix: Use tolerant preprocessors to normalize synonyms.
• Degenerate Plans: The planner sometimes returned nothing. The app tried to turn this silence into a normal response. This created fake answers. Fix: Add a plan interpreter. If the plan is empty, tell the user the system failed to plan instead of lying.
• Schema Drift: The model changed field names like "documentTitle" to "title." It also mixed English and Spanish labels. Fix: Use alias mapping and salvage valid parts. If one citation is bad, keep the other four.
• Unpaired Tasks: The model asked to save a report before it even drafted one. The logic was safe, but the user experience was broken. Fix: The code must detect missing steps and insert them automatically.
• Loop Errors: The model kept asking the same clarification questions even after the user answered. Fix: Move entity resolution from the model to the code. Once a user provides data, the system handles the rest deterministically.
• False Ambiguity: The model claimed a company name was ambiguous when it was not. This stopped the workflow. Fix: Let the model suggest ambiguity, but let the data decide if it is real.
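The Label Mismatch and Schema Drift fixes can be sketched together as a small preprocessing layer that runs before strict validation. This is a minimal illustration, not the project's actual code: the synonym table, the alias table, the "recordId" field, and every function name below are assumptions made up for the example.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical synonym table: labels the model might emit, mapped to the
# two statuses the schema accepts.
STATUS_SYNONYMS = {
    "completed": "completed",
    "success": "completed",
    "done": "completed",
    "failed": "failed",
    "error": "failed",
}

# Hypothetical alias table: drifted field names (lowercased, including
# Spanish variants) mapped to the canonical field name.
FIELD_ALIASES = {
    "documenttitle": "documentTitle",
    "title": "documentTitle",
    "titulo": "documentTitle",
    "título": "documentTitle",
    "recordid": "recordId",
    "id": "recordId",
}

# Assumed minimum for a usable citation.
REQUIRED_FIELDS = {"documentTitle", "recordId"}


def normalize_status(raw: str) -> Optional[str]:
    """Return the canonical status, or None if the label is unknown."""
    return STATUS_SYNONYMS.get(raw.strip().lower())


def normalize_citation(raw: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Rename aliased fields; return None if a required field is missing."""
    fixed: Dict[str, Any] = {}
    for key, value in raw.items():
        canonical = FIELD_ALIASES.get(str(key).strip().lower())
        if canonical is not None:
            # Keep the first value seen for each canonical field.
            fixed.setdefault(canonical, value)
    if not REQUIRED_FIELDS.issubset(fixed):
        return None
    return fixed


def salvage_citations(
    raw_items: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Keep every citation that normalizes; return the rest for logging."""
    kept, rejected = [], []
    for item in raw_items:
        fixed = normalize_citation(item)
        if fixed is None:
            rejected.append(item)
        else:
            kept.append(fixed)
    return kept, rejected
```

Unknown labels return None rather than a guess, so the caller can still reject output that is truly unusable. Bad citations are returned separately instead of being dropped silently, which keeps the salvage step auditable.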
The main principle: Let the LLM narrate, but do not let it own structured outcomes.
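Applied to plans, this principle suggests a small interpreter between the planner and the executor, covering the Degenerate Plans and Unpaired Tasks cases. The sketch below is one possible shape, under assumptions: the step names ("find_records", "draft_report", "save_report"), the prerequisite table, and the PlanResult type are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical rule table: each key step needs the value step to appear
# earlier in the plan.
PREREQUISITES = {"save_report": "draft_report"}


@dataclass
class PlanResult:
    ok: bool
    steps: List[str]
    message: str


def interpret_plan(steps: List[str]) -> PlanResult:
    """Validate a model-produced plan before anything is executed."""
    # Drop empty or whitespace-only entries the model may have emitted.
    cleaned = [s.strip() for s in steps if s and s.strip()]

    # Degenerate plan: report the failure instead of inventing an answer.
    if not cleaned:
        return PlanResult(
            ok=False,
            steps=[],
            message="The planner returned no steps, so no answer was produced.",
        )

    # Unpaired task: insert a missing prerequisite right before its step.
    repaired: List[str] = []
    for step in cleaned:
        needed = PREREQUISITES.get(step)
        if needed is not None and needed not in repaired:
            repaired.append(needed)
        repaired.append(step)

    if repaired == cleaned:
        message = "Plan accepted."
    else:
        message = "Plan repaired: missing steps were inserted."
    return PlanResult(ok=True, steps=repaired, message=message)
```

The message field lets the UI say what actually happened, so an empty plan surfaces as a planning failure rather than as a fabricated response.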
The model should handle intent, planning, and language. The code must handle entity resolution, chart data, and report assembly.
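For entity resolution, that split might look like the sketch below, which addresses the Loop Errors and False Ambiguity cases. It is an illustration under assumptions: the company registry, its names and IDs, and the resolve_company function are invented for the example. The model's ambiguity flag is treated as a hint only, and a stored user choice is reused instead of asking again.

```python
from typing import Any, Dict, List, Optional

# Hypothetical in-memory registry standing in for the real record store.
COMPANIES = [
    {"id": "C-001", "name": "Minera Andina S.A."},
    {"id": "C-002", "name": "Minera Andina Norte S.A.C."},
    {"id": "C-003", "name": "Pesquera del Sur S.A."},
]


def find_matches(query: str, companies: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Prefer exact name matches; fall back to case-insensitive substrings."""
    needle = query.strip().lower()
    exact = [c for c in companies if c["name"].lower() == needle]
    if exact:
        return exact
    return [c for c in companies if needle in c["name"].lower()]


def resolve_company(
    query: str,
    companies: List[Dict[str, str]],
    model_says_ambiguous: bool = False,
    user_choice: Optional[str] = None,
) -> Dict[str, Any]:
    """Resolve a company name from data; the model's flag is only a hint."""
    # A choice the user already made wins, so the question is never repeated.
    if user_choice is not None:
        chosen = [c for c in companies if c["id"] == user_choice]
        if chosen:
            return {"status": "resolved", "company": chosen[0]}

    matches = find_matches(query, companies)
    if len(matches) == 1:
        # The data shows one candidate, so a model claim of ambiguity is
        # recorded but does not stop the workflow.
        return {
            "status": "resolved",
            "company": matches[0],
            "hint_overridden": model_says_ambiguous,
        }
    if not matches:
        return {"status": "not_found", "candidates": []}
    return {"status": "ambiguous", "candidates": matches}
```

Only the "ambiguous" status should trigger a clarification question, and the user's answer can be passed back as user_choice on the next turn.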
A system becomes trustworthy when you trace every conclusion back to a record. Use the model for the story, but use your code for the truth.
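One way to enforce that traceability is a check that runs before a report is assembled. The following is a hedged sketch: the claim shape (a "text" plus a list of "record_ids") and the function name are assumptions for illustration, not the project's data model.

```python
from typing import Any, Dict, List, Tuple


def split_by_provenance(
    claims: List[Dict[str, Any]],
    records: Dict[str, Any],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate claims backed by known records from claims that are not."""
    traceable, untraceable = [], []
    for claim in claims:
        ids = claim.get("record_ids", [])
        # A claim counts only if it cites at least one record and every
        # cited ID exists in the store.
        if ids and all(record_id in records for record_id in ids):
            traceable.append(claim)
        else:
            untraceable.append(claim)
    return traceable, untraceable
```

Claims in the second list can be dropped or flagged for review, so the model's narration never reaches the report without a record behind it.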
Source: https://dev.to/ginollerena/six-bugs-only-a-live-model-could-teach-us-57k5
