𝗧𝗵𝗲 𝗡𝘂𝗹𝗹 𝗜𝗻𝗽𝘂𝘁 𝗧𝗵𝗮𝘁 𝗕𝗿𝗼𝗸𝗲 𝗠𝘆 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗔𝗴𝗲𝗻𝘁
The demo ran perfectly for three weeks. Every test input worked. Every output went to the right place. I thought the system was reliable.
Then a supplier sent an email with an empty subject line.
The agent expected a string to extract an order reference. Instead, it received a null value. It did not crash. That would have been better. It generated a fake order reference that looked real. The downstream system processed it. Nobody noticed for four hours.
Demos use inputs you expect. Production uses inputs you do not expect.
I run the agent operation at aienterprise.dk. I saw the full trace. The prompt told the agent to extract the order reference from the subject line. This works if the subject line exists.
If the subject line is missing, a large language model fills the gap. It invents something that looks correct. This is not random noise. It is structured noise. It is dangerous because it looks right. You can catch a failure. You cannot easily catch a confident, wrong answer.
I did not retrain the model. I did not change the prompt. I added a guard before the model call.
Now, a simple check runs first. It asks: is the subject field present and non-empty? If the answer is no, the message goes to a hold queue for a human. The agent never sees the bad input.
This guard is twelve lines of code. It is the most important thing I built this year.
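A minimal sketch of what such a guard might look like in TypeScript. The post does not show the actual twelve lines, so the InboundEmail shape, the Routed type, and the guardSubject name are hypothetical. Treating a whitespace-only subject as empty is also an assumption.

```typescript
// Hypothetical message shape; the real email payload is not shown in the post.
interface InboundEmail {
  id: string;
  subject?: string | null;
  body: string;
}

// Either the message is safe to hand to the agent, or it is held for a human.
type Routed =
  | { route: "agent"; email: InboundEmail & { subject: string } }
  | { route: "hold"; email: InboundEmail; reason: string };

// Runs before any model call: the agent only sees messages with a real subject.
function guardSubject(email: InboundEmail): Routed {
  // Assumption: a subject made only of whitespace counts as empty.
  const subject = email.subject?.trim();
  if (!subject) {
    return { route: "hold", email, reason: "missing or empty subject" };
  }
  return { route: "agent", email: { ...email, subject } };
}
```

The caller would send anything routed to "hold" to the human queue and pass only "agent" results on to the model, so the model is never asked to guess.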
The pattern is simple. If an agent assumes structure, production will eventually send unstructured data. The fix is not a smarter model. The fix is a boundary. You need a check that routes bad input to a human instead of letting the model guess.
Reliability is the only feature. A demo shows an agent can do a task. Production shows an agent does the task at 3 AM on bad input. Only the second part matters to your customers.
My agent now processes 200 operations per day without issues. The hold queue triggers twice a week. A human reviews the odd data. I learn what production looks like.
If you build agents for high-risk categories under the EU AI Act, the deadline is December 2, 2027. This includes employment, biometrics, and education. A system that guesses on bad input will fail an audit. This guard is a compliance minimum.
Reliability is not a feature you add later.
The null input that broke my production agent, and what fixed it
Imagine you have an AI agent built for production. It works well and handles thousands of requests. Then, suddenly, the system goes down. When you check the logs, you find an error that looks almost silly: TypeError: cannot read property 'data' of null.
This is what happened to me. And the culprit was null.
The Incident
I had built an AI agent using an LLM (large language model) to process data and return structured JSON. Everything looked fine during testing. My prompts were solid, and the schemas I had written were correct.
But once the agent was in production, it started failing intermittently. Every time it failed, it returned the same null error.
The Root Cause
The problem was not in my core code but in the LLM's behavior. Although I had told the LLM to return only JSON, there were times, especially under load or with complex instructions, when it returned null instead of a JSON object.
My code took an optimistic approach: it assumed everything would go right. I assumed every response from the LLM would have the same expected structure.
// The problematic code
const response = await callLLM(prompt);
const data = response.data; // This is where it broke whenever response was null
console.log(data.user_id);
If response was null, the attempt to read response.data threw a TypeError and took my agent down completely.
The Fix
To fix this, I had to stop relying on the LLM being perfect and put a protective layer in place instead. I did two main things:
1. Schema Validation (Pydantic/Zod)
Instead of taking the response as plain truth, I started passing it through a validation layer. If you use Python, Pydantic is a good choice. If you use TypeScript, Zod works very well.
This ensures that even if the LLM returns something unintelligible, your code will not break.
// Fix using Zod (TypeScript)
// callLLM is the article's own LLM helper and is assumed to be defined elsewhere.
import { z } from 'zod';

const ResponseSchema = z.object({
  user_id: z.string(),
  action: z.string(),
});

async function getAgentResponse(prompt: string) {
  const response = await callLLM(prompt);
  // Check whether the response is null or missing
  if (!response) {
    throw new Error("LLM returned an empty (null) response");
  }
  // Validate the shape of the data
  const result = ResponseSchema.safeParse(response);
  if (!result.success) {
    console.error("Data does not match the expected shape:", result.error);
    return null;
  }
  return result.data;
}
2. Defensive Programming
I started using defensive programming techniques. This means writing code that knows things can go wrong. Instead of trusting that response would exist, I added null checks up front.
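As a rough illustration of such a null check, here is a small TypeScript sketch. The LLMResponse shape is hypothetical, modeled only on the data and user_id fields used in the snippets above, and readUserId is an illustrative name rather than the article's own code.

```typescript
// Hypothetical response shape, based on the fields used in the article's snippets.
interface LLMResponse {
  data?: { user_id?: string } | null;
}

// Read user_id without assuming that any level of the response exists.
// Returns null instead of throwing when something is missing.
function readUserId(response: LLMResponse | null | undefined): string | null {
  // Optional chaining short-circuits to undefined on null or undefined.
  const userId = response?.data?.user_id;
  return typeof userId === "string" && userId.length > 0 ? userId : null;
}
```

Returning null here forces the caller to decide what to do with a missing value, instead of crashing in the middle of a property access.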
The Lesson I Learned
Building AI systems is different from building conventional software. In conventional software, your logic rests on deterministic rules that stay stable. In AI systems, the logic rests on probabilistic outputs: the LLM produces answers according to probabilities, and those probabilities can shift.
Key lesson: never trust a response from an LLM. Always validate, always check for null, and always be ready for unexpected results.