𝗟𝗲𝘀𝘀𝗼𝗻𝘀 𝗳𝗿𝗼𝗺 𝗮 𝟭𝟬𝟵-𝗮𝗴𝗲𝗻𝘁 𝗰𝗼𝗱𝗲 𝗮𝘂𝗱𝗶𝘁 𝘄𝗼𝗿𝗸𝗳𝗹𝗼𝘄

📅2 days ago⏱2 min read

I spent 9.3M tokens on a 109-agent code audit. Most of that money was wasted.

I used a swarm of AI agents to audit a 5,000-line codebase. The pipeline used mappers, finding lenses, deduplication, adversarial verification, a ranking panel, and synthesis.

The results were good. I found 32 verified issues. But the process was inefficient.

Here is why I wasted money:

Verification cost too much. 86 agents worked on verification, and they caught only 2 bad findings out of 34. I paid to re-read the code 86 times for a 6% hit rate.
Mapping was redundant. The finding agents read the code anyway, so the mapping phase paid for the same reading twice.
Finding lenses overlapped. 30% of findings were duplicates.
JSON formatting was bloated. Pretty-printed JSON increased prompt size by 40%.
Cache reads dominated. 77% of tokens went to reading the same files over and over.
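
A quick way to see the pretty-printing overhead is to serialize the same data both ways. The sketch below uses Python's standard json module on two made-up findings; the field names (file, line, severity, title) are hypothetical, not the audit's real schema. It counts characters rather than tokens, so the ratio is only a rough proxy for the 40% prompt-size figure, but the padding comes from the same place: indentation, newlines, and the spaces after "," and ":".

```python
import json

# Hypothetical findings; field names are illustrative, not the audit's real schema.
findings = [
    {"file": "src/auth.py", "line": 42, "severity": "high",
     "title": "Token compared with == instead of a constant-time check"},
    {"file": "src/db.py", "line": 17, "severity": "medium",
     "title": "Connection not closed on error path"},
]

# indent=2 adds newlines and leading spaces on every nested line.
pretty = json.dumps(findings, indent=2)
# separators=(",", ":") drops the spaces json.dumps inserts by default.
# Whitespace inside string values is untouched, so the data is unchanged.
compact = json.dumps(findings, separators=(",", ":"))

# Both strings decode to identical data; only the padding differs.
assert json.loads(pretty) == json.loads(compact)

overhead = (len(pretty) - len(compact)) / len(compact)
print(f"pretty: {len(pretty)} chars, compact: {len(compact)} chars, "
      f"overhead: {overhead:.0%}")
```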

I verified everything before I ranked anything. That was a mistake. I paid premium prices to fact-check findings that I eventually deleted.

How to fix your agent workflows:

Rank before you verify. Find issues, deduplicate them, rank them, and only verify the top 15. You get the same result with 70% fewer agents.
Match paranoia to stakes. Use one agent for internal audits. Use a full panel only for high-stakes claims.
Batch verification by file. If 34 findings live in 10 files, have one agent check all the findings in a file in a single pass. That is 10 calls, not 34 separate calls that keep re-reading the same files.
Skip mappers for small repos. One agent can read the whole codebase.
Limit your lenses. Use at most six lenses, and give each one clear boundaries so they do not overlap.
Compact your JSON. Serialize without indentation or spaces between tokens. Pretty-printing is just expensive padding.
Use cheap models for chores. Save frontier models for reasoning, and hand deduplication and evidence checking to cheaper models.
Set a token budget. Have your orchestrator check the budget before every new step.
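
The rank-then-verify order and per-file batching can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's pipeline: Finding, dedupe, and plan_verification are hypothetical names, the score field stands in for whatever the ranking panel produces, and deduplication here is a naive exact match on file, line, and normalized title. The default top_k=15 mirrors the cutoff suggested in the list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    title: str
    score: float  # hypothetical ranking score: higher means more important


def dedupe(findings):
    """Naive dedup: one finding per (file, line, normalized title), keeping the best score."""
    best = {}
    for f in findings:
        key = (f.file, f.line, f.title.strip().lower())
        if key not in best or f.score > best[key].score:
            best[key] = f
    return list(best.values())


def plan_verification(findings, top_k=15):
    """Dedupe, rank, keep the top_k, then group the survivors by file.

    Each group is one verification task, so a file is read once per batch
    rather than once per finding.
    """
    ranked = sorted(dedupe(findings), key=lambda f: f.score, reverse=True)
    batches = defaultdict(list)
    for f in ranked[:top_k]:
        batches[f.file].append(f)
    return dict(batches)


raw = [
    Finding("src/auth.py", 42, "Timing-unsafe token compare", 0.9),
    Finding("src/auth.py", 42, "timing-unsafe token compare ", 0.7),  # same issue from another lens
    Finding("src/auth.py", 88, "Missing rate limit", 0.6),
    Finding("src/db.py", 17, "Connection leaked on error path", 0.8),
    Finding("src/util.py", 5, "Unused import", 0.1),
]
batches = plan_verification(raw, top_k=3)
for path, group in batches.items():
    print(path, [f.line for f in group])
```

Here five raw findings collapse to four after deduplication, the top three survive ranking, and those three become two verification batches, one for src/auth.py and one for src/db.py. The low-ranked finding is never verified at all, which is where the savings come from.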
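
A budget check can live in the orchestrator as a small guard around each step. In this hypothetical sketch, each step supplies a token estimate up front and reports its actual usage afterwards; TokenBudget, run_step, and BudgetExceeded are illustrative names, and how you estimate a step's cost (for example, prompt length times the number of agents) is left as an assumption.

```python
from typing import Any, Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when a step's estimated cost would push the run past its budget."""


class TokenBudget:
    """Hypothetical orchestrator-side token budget, checked before every step."""

    def __init__(self, limit: int):
        self.limit = limit
        self.spent = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.spent

    def run_step(self, name: str, estimate: int,
                 step: Callable[[], Tuple[Any, int]]) -> Any:
        """Refuse the step if its estimate exceeds what is left; otherwise run it
        and record the tokens it actually used.

        `step` is a stand-in for an agent call that returns (result, tokens_used).
        """
        if estimate > self.remaining:
            raise BudgetExceeded(
                f"{name}: estimated {estimate} tokens, {self.remaining} remaining")
        result, used = step()
        self.spent += used
        return result


budget = TokenBudget(limit=100_000)
findings = budget.run_step("find", 60_000, lambda: (["finding-1", "finding-2"], 55_000))
try:
    # The estimate exceeds what is left, so this step is refused before it runs.
    budget.run_step("verify-everything", 80_000, lambda: ([], 80_000))
except BudgetExceeded as err:
    print("skipped:", err)
```

Checking the estimate before launching a step, rather than tallying costs afterwards, is what stops a runaway phase: the expensive step is refused instead of billed.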

What worked well:

Structured output schemas. I had zero parse failures across 109 agents.
Output caps. Limiting findings per agent kept the funnel small.
Resumable runs. I could stop and restart the process without losing progress.
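
The three things that worked can be combined in one small loop. The source does not say how its schemas were enforced, so this Python sketch validates each agent's JSON on the orchestrator side using only the standard library. REQUIRED_FIELDS, MAX_FINDINGS_PER_AGENT, parse_agent_output, run_resumable, and the checkpoint file format are all hypothetical. The checkpoint is rewritten after every agent, so a restart skips agents that already finished.

```python
import json
import os
import tempfile

# Hypothetical per-agent output contract (not the author's actual schema).
REQUIRED_FIELDS = {"file": str, "line": int, "title": str}
# Hypothetical output cap; the source does not state its value.
MAX_FINDINGS_PER_AGENT = 5


def parse_agent_output(raw):
    """Parse one agent's JSON reply, check required fields, then apply the cap."""
    data = json.loads(raw)
    findings = data.get("findings")
    if not isinstance(findings, list):
        raise ValueError("reply has no 'findings' list")
    for f in findings:
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(f.get(field), expected_type):
                raise ValueError(f"bad or missing {field!r} in {f!r}")
    return findings[:MAX_FINDINGS_PER_AGENT]


def run_resumable(agent_ids, call_agent, checkpoint_path):
    """Run each agent once, checkpointing after every agent so a restart resumes."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)
    for agent_id in agent_ids:
        if agent_id in done:
            continue  # already finished in an earlier run
        done[agent_id] = parse_agent_output(call_agent(agent_id))
        # Write a temp file, then swap it in with os.replace, so an
        # interrupted write does not corrupt the existing checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(done, fh, separators=(",", ":"))
        os.replace(tmp_path, checkpoint_path)
    return done


calls = []


def fake_agent(agent_id):
    """Stand-in for a real agent call: returns 8 findings, more than the cap."""
    calls.append(agent_id)
    findings = [{"file": "src/app.py", "line": i, "title": f"{agent_id} issue {i}"}
                for i in range(8)]
    return json.dumps({"findings": findings})


checkpoint = os.path.join(tempfile.mkdtemp(), "audit_checkpoint.json")
run_resumable(["lens-a", "lens-b"], fake_agent, checkpoint)  # first run
results = run_resumable(["lens-a", "lens-b", "lens-c"], fake_agent, checkpoint)  # resumed run
```

On the second run only lens-c is called, because lens-a and lens-b are already in the checkpoint, and every agent's output is trimmed to the cap before it enters the funnel.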

The lesson is simple: Fan out to find, but converge before you verify. Breadth is for discovery. Rigor is for the survivors.

Source: https://dev.to/ayoubzulfiqar/lessons-from-a-109-agent-code-audit-workflow-4a5m
