𝗬𝗼𝘂𝗿 𝗔𝗴𝗲𝗻𝘁 𝗖𝗵𝗲𝗰𝗸𝗲𝗱 𝗘𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴. 𝗜𝘁 𝗪𝗮𝘀 𝗦𝘁𝗶𝗹𝗹 𝗪𝗿𝗼𝗻𝗴.

I run a multi-agent workflow. One model designs. One writes code. One reviews it. I approve the final result.

Recently, three failures passed through this pipeline. Each agent did its job perfectly. The system was consistent, yet it was consistently wrong.

This is not an intelligence problem. It is a boundary problem. An agent does exactly what you ask within the context you provide. It will not discover new things to verify on its own.

Here are three real-world failures and how to fix them:

  1. Success hiding failure: An ETL pipeline pulled data from an API. The API session expired. Instead of an error code, the API returned an HTTP 200 with an error message inside the JSON. The agent checked for an error code, found none, and assumed the data was valid.
  • The Fix: Use semantic validation. Do not just check whether a call succeeded. Check whether the returned data matches the expected structure and row count.
  2. Missing artifacts: A code generator produced C files for a chip. The reviewer confirmed the code was correct. However, the generator never created the required widget table file. The reviewer checked the files that existed but did not check for files that were missing.
  • The Fix: Verify output completeness. Always list the required files first. Confirm every file exists and is not empty before moving to the next step.
  3. False technical claims: An SDK folder claimed to be for a RISC-V chip, but the header comments said it was for a CSKY processor. The agent trusted the folder name and the comments. It ignored the actual machine instructions that proved the claim was wrong.
  • The Fix: Use ground-truth verification. If a file makes a claim, test that claim with a command. Do not trust comments or directory names. Trust the raw data.
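
The first fix, semantic validation, might look like this in Python. This is a minimal sketch, not the original pipeline's code: the "error" and "rows" keys, the function name, and the required field names are all assumptions made for illustration.

```python
import json


class SemanticValidationError(Exception):
    """Raised when a response looks successful but its payload is not valid data."""


def validate_payload(status_code, body_text, required_fields, min_rows=1):
    """Return the rows from a response only if the payload itself looks like data."""
    if status_code != 200:
        raise SemanticValidationError(f"unexpected status {status_code}")
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError as exc:
        raise SemanticValidationError(f"body is not JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise SemanticValidationError("payload is not a JSON object")
    # An HTTP 200 can still carry an error object (hypothetical "error" key).
    if "error" in payload:
        raise SemanticValidationError(f"error inside 200 response: {payload['error']}")
    # Hypothetical "rows" key: check structure and row count, not just success.
    rows = payload.get("rows")
    if not isinstance(rows, list) or len(rows) < min_rows:
        raise SemanticValidationError(f"expected at least {min_rows} row(s)")
    for index, row in enumerate(rows):
        if not isinstance(row, dict):
            raise SemanticValidationError(f"row {index} is not an object")
        missing = set(required_fields) - set(row)
        if missing:
            raise SemanticValidationError(f"row {index} is missing {sorted(missing)}")
    return rows
```

With this check, an expired-session body such as {"error": "session expired"} is rejected even though the status code is 200.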

Agents will verify what you tell them to verify. They will not ask, "What else could be wrong?"
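
One way to tell an agent what must exist, as in the missing-artifacts failure, is an explicit manifest that is checked before the next step runs. The sketch below assumes a flat output directory; the function name and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def missing_or_empty(output_dir, required_files):
    """Return (name, reason) pairs for required files that are absent or zero bytes."""
    root = Path(output_dir)
    problems = []
    for name in required_files:
        path = root / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif path.stat().st_size == 0:
            problems.append((name, "empty"))
    return problems
```

A reviewer step could call missing_or_empty("generated", ["main.c", "widget_table.c"]) and refuse to continue unless the result is an empty list. The point is that the manifest is written before generation, so a file that was never created still shows up as a problem.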

You must design the boundaries. You must build verification checkpoints at the edges of your workflow.
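
For the third failure, one possible checkpoint reads the architecture straight from a compiled binary instead of from names or comments. This sketch assumes the SDK ships ELF objects or libraries, which the post does not state; it reads the e_machine field of the ELF header, roughly what readelf -h reports as "Machine". The function names are hypothetical.

```python
import struct

# e_machine values from the ELF machine registry.
EM_NAMES = {243: "RISC-V", 252: "C-SKY"}


def elf_machine(header: bytes) -> str:
    """Return the architecture named in an ELF header's e_machine field."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        raise ValueError("unknown ELF byte order")
    # e_machine is a 2-byte field at offset 18, after e_ident and e_type.
    (machine,) = struct.unpack_from(fmt, header, 18)
    return EM_NAMES.get(machine, f"unknown ({machine})")


def check_claimed_arch(path, claimed):
    """Raise if the binary's real architecture differs from the claimed one."""
    with open(path, "rb") as f:
        actual = elf_machine(f.read(20))
    if actual != claimed:
        raise RuntimeError(f"{path}: claimed {claimed}, header says {actual}")
    return actual
```

The claim (a folder name, a comment) goes in as the claimed argument, and the binary itself decides whether the claim holds.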

Source: https://dev.to/antonio_zhu_e726fd856cd86/your-agent-checked-everything-it-was-still-wrong-18kd
