6 Bugs Only a Live Model Could Teach Us

AI-assisted draft.

GyaanSetu Editorial · 6 hours ago · 2 min read

Offline tests are necessary. They are not enough.

I built AgentOps Debugger to track environmental compliance in Peru. It uses Qwen-plus on Qwen Cloud to find records and write reports.

I designed the system to be offline-first. My 315 tests ran without any network calls, and all of them passed. But when I switched to the live model on Alibaba Cloud, the system broke.

The code was fine. The model output was the problem.

Here are the six lessons from real-world model failures:

• Label Mismatch: The schema expected "completed" or "failed." The model sent "success" or "done." The parser rejected useful answers because of a single word. Fix: Use tolerant preprocessors that normalize synonyms before the parser sees them.

• Degenerate Plans: The planner sometimes returned an empty plan. The app tried to turn this silence into a normal response, which produced fake answers. Fix: Add a plan interpreter. If the plan is empty, tell the user the system failed to plan instead of inventing an answer.

• Schema Drift: The model changed field names, for example "documentTitle" to "title." It also mixed English and Spanish labels. Fix: Use alias mapping and salvage the valid parts. If one citation is bad, keep the other four.

• Unpaired Tasks: The model asked to save a report before it had drafted one. The logic was safe, but the user experience was broken. Fix: Have the code detect missing steps and insert them automatically.

• Loop Errors: The model kept asking the same clarification questions even after the user had answered. Fix: Move entity resolution from the model to the code. Once a user provides data, the system handles the rest deterministically.

• False Ambiguity: The model claimed a company name was ambiguous when it was not. This stopped the workflow. Fix: Let the model suggest ambiguity, but let the data decide whether it is real.
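
The Label Mismatch and Schema Drift fixes in the list above could be sketched roughly as follows. This is a minimal illustration, not the code from AgentOps Debugger: the synonym table, the alias table (including the Spanish keys "titulo" and "enlace"), the "url" field, and all function names are assumptions made for the example.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical synonym table: labels a live model might emit, mapped to
# the two values the schema accepts.
STATUS_SYNONYMS = {
    "completed": "completed",
    "success": "completed",
    "done": "completed",
    "failed": "failed",
}

# Hypothetical alias table: drifted or Spanish field names mapped to the
# canonical ones. Only "documentTitle" and "title" come from the article.
FIELD_ALIASES = {
    "documentTitle": "documentTitle",
    "title": "documentTitle",
    "titulo": "documentTitle",
    "url": "url",
    "enlace": "url",
}

REQUIRED_FIELDS = {"documentTitle", "url"}


def normalize_status(raw: str) -> Optional[str]:
    """Return the canonical status, or None if the label is unknown."""
    return STATUS_SYNONYMS.get(raw.strip().lower())


def normalize_citation(raw: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Rename aliased fields; return None if a required field is missing."""
    out: Dict[str, Any] = {}
    for key, value in raw.items():
        canonical = FIELD_ALIASES.get(key)
        if canonical is not None:
            out[canonical] = value
    return out if REQUIRED_FIELDS <= out.keys() else None


def salvage_citations(raw_list: List[Any]) -> Tuple[List[Dict[str, Any]], int]:
    """Keep every citation that can be repaired; count the ones dropped."""
    kept: List[Dict[str, Any]] = []
    dropped = 0
    for item in raw_list:
        citation = normalize_citation(item) if isinstance(item, dict) else None
        if citation is None:
            dropped += 1
        else:
            kept.append(citation)
    return kept, dropped
```

Returning the dropped count alongside the salvaged citations lets the caller report partial results honestly instead of rejecting the whole response.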

The main principle: Let the LLM narrate, but do not let it own structured outcomes.
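
One way to keep structured outcomes in code is a small plan interpreter that covers the Degenerate Plans and Unpaired Tasks cases. The sketch below is an assumption about how that might look: the step names ("find_records", "draft_report", "save_report"), the `PlanResult` type, and the failure message are hypothetical and not taken from the original project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanResult:
    ok: bool
    steps: List[str]
    message: str


def interpret_plan(steps: Optional[List[str]]) -> PlanResult:
    """Validate and repair a model-produced plan before executing it."""
    # Degenerate plan: report the failure instead of inventing a response.
    if not steps:
        return PlanResult(
            ok=False,
            steps=[],
            message="The system could not produce a plan for this request.",
        )

    repaired: List[str] = []
    drafted = False
    for step in steps:
        if step == "draft_report":
            drafted = True
        elif step == "save_report" and not drafted:
            # Unpaired task: insert the missing draft step before the save.
            repaired.append("draft_report")
            drafted = True
        repaired.append(step)
    return PlanResult(ok=True, steps=repaired, message="")
```

For example, `interpret_plan(["find_records", "save_report"])` yields a plan with "draft_report" inserted before "save_report", while an empty or missing plan produces an explicit failure the UI can show to the user.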

The model should handle intent, planning, and language. The code must handle entity resolution, chart data, and report assembly.
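
As an illustration of code-owned entity resolution, the sketch below treats the model's ambiguity claim as a hint only and lets the records decide. The matching rule (case-insensitive substring, with exact matches taking priority), the function name, and the company names are all hypothetical; a real system would likely need stricter matching.

```python
from typing import Any, Dict, Iterable


def resolve_company(
    query: str,
    records: Iterable[str],
    model_says_ambiguous: bool = False,
) -> Dict[str, Any]:
    """Resolve a company name against known records, deterministically."""
    needle = query.strip().lower()
    names = sorted(set(records))

    # An exact match wins outright; otherwise fall back to substring matches.
    exact = [n for n in names if n.lower() == needle]
    matches = exact if exact else [n for n in names if needle in n.lower()]

    if len(matches) == 1:
        return {
            "status": "resolved",
            "company": matches[0],
            # True when the model claimed ambiguity but the data disagreed.
            "model_hint_overridden": model_says_ambiguous,
        }
    if not matches:
        return {"status": "not_found", "candidates": []}
    # Real ambiguity: ask the user once, with concrete candidates.
    return {"status": "ambiguous", "candidates": matches}


# Hypothetical records for illustration only.
RECORDS = ["Minera Andes SAC", "Minera Andes Norte SAC", "Pesquera Sur SA"]
```

With these records, `resolve_company("pesquera", RECORDS, model_says_ambiguous=True)` resolves to a single company despite the model's claim, while `resolve_company("minera andes", RECORDS)` returns both candidates so the user is asked only when the data is actually ambiguous.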

A system becomes trustworthy when you can trace every conclusion back to a record. Use the model for the story, but use your code for the truth.

Source: https://dev.to/ginollerena/six-bugs-only-a-live-model-could-teach-us-57k5
