Lessons from a 109-agent code audit workflow
I spent 9.3M tokens on a 109-agent code audit. Most of that spend was wasted.
I used a swarm of AI agents to audit a 5,000-line codebase. The pipeline used mappers, finding lenses, deduplication, adversarial verification, a ranking panel, and synthesis.
The results were good. I found 32 verified issues. But the process was inefficient.
Here is why I wasted money:
- Verification cost too much. 86 of the 109 agents worked on verification, and they caught only 2 errors. I paid to re-read the code 86 times to overturn about 6% of the findings (2 of 34).
- Mapping was redundant. The finding agents read the code anyway. The mapping phase was a double tax.
- Finding lenses overlapped. 30% of findings were duplicates.
- JSON formatting was bloated. Using pretty-print in JSON increased prompt size by 40%.
- Cache reads dominated. 77% of tokens went to reading the same files over and over.
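To make the JSON point concrete, here is a minimal sketch that measures the gap between pretty-printed and compact output. The finding records and field names are made up for illustration, and character counts only approximate token counts, so the saving it prints will not match the 40% figure above.

```python
import json

# Hypothetical finding records; the field names are illustrative only.
findings = [
    {"id": i, "file": f"src/module_{i % 3}.py", "severity": "high", "summary": "unchecked input"}
    for i in range(20)
]

# Pretty-printed form: newlines and indentation on every field.
pretty = json.dumps(findings, indent=2)

# Compact form: no spaces after separators, no newlines.
compact = json.dumps(findings, separators=(",", ":"))

# Both forms decode to the same data; only the whitespace differs.
assert json.loads(pretty) == json.loads(compact)

saved = 1 - len(compact) / len(pretty)
print(f"pretty: {len(pretty)} chars, compact: {len(compact)} chars, saved: {saved:.0%}")
```

The exact saving depends on how deeply nested the payload is. Deeper nesting means more indentation per line and a bigger gap.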
I verified everything before I ranked anything. That was a mistake. I paid premium prices to fact-check findings that I later discarded.
How to fix your agent workflows:
- Rank before you verify. Find issues, deduplicate them, rank them, and verify only the top 15. You get the same result with 70% fewer agents.
- Match paranoia to stakes. Use one agent for internal audits. Use a full panel only for high-stakes claims.
- Batch verification by file. If 34 findings live in 10 files, make one agent check all of them at once. Do not make 34 separate calls to the same file.
- Skip mappers for small repos. One agent can read the whole codebase.
- Limit your lenses. Use six lenses maximum. Give each lens clear boundaries so they do not overlap.
- Compact your JSON. Remove all whitespace. Pretty-printing is just expensive padding.
- Use cheap models for chores. Reserve frontier models for reasoning, and hand deduplication and evidence checking to cheaper models.
- Set a token budget. Have your orchestrator check the budget before every new step.
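Three of the fixes above (rank before you verify, batch verification by file, and check a token budget before every step) can be sketched together. This is a rough illustration, not the pipeline from the audit: the `Finding` fields, the `score` value, and the flat `cost_per_batch` estimate are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    id: int
    file: str
    score: float  # hypothetical ranking score, higher means more important


def top_k(findings, k):
    """Keep only the k highest-scoring findings (rank before you verify)."""
    return sorted(findings, key=lambda f: f.score, reverse=True)[:k]


def batch_by_file(findings):
    """Group findings so each file goes to a verifier once."""
    batches = defaultdict(list)
    for finding in findings:
        batches[finding.file].append(finding)
    return dict(batches)


def plan_verification(findings, k, budget_tokens, cost_per_batch):
    """Return the batches that fit the budget, checking before each new batch."""
    planned, spent = {}, 0
    for file, group in batch_by_file(top_k(findings, k)).items():
        if spent + cost_per_batch > budget_tokens:
            break  # stop before overspending
        planned[file] = group
        spent += cost_per_batch
    return planned, spent


# 34 findings spread over 10 files, as in the example above.
findings = [Finding(i, f"src/file_{i % 10}.py", score=float(i)) for i in range(34)]
planned, spent = plan_verification(
    findings, k=15, budget_tokens=1_000_000, cost_per_batch=5_000
)
checked = sum(len(group) for group in planned.values())
print(f"{checked} findings verified in {len(planned)} verifier calls, {spent} tokens planned")
```

A real orchestrator would estimate cost per batch from file size rather than use a flat number, and would read actual usage back from the model API after each call.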
What worked well:
- Structured output schemas. I had zero parse failures across 109 agents.
- Output caps. Limiting findings per agent kept the funnel small.
- Resumable runs. I could stop and restart the process without losing progress.
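Resumable runs are the easiest of these to reproduce. Here is a minimal sketch, assuming each pipeline step returns JSON-serializable output and that a single checkpoint file is enough. The step names and the `run_resumable` helper are hypothetical, not taken from the original workflow.

```python
import json
import os
import tempfile


def run_resumable(steps, checkpoint_path):
    """Run named steps in order, skipping any already recorded in the checkpoint."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)
    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier run
        done[name] = fn()
        # Write to a temp file, then rename, so a crash cannot leave a
        # half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(done, fh)
        os.replace(tmp_path, checkpoint_path)
    return done


calls = []  # records which steps actually executed


def make_step(name):
    def step():
        calls.append(name)
        return f"{name} done"
    return step


steps = [(name, make_step(name)) for name in ("find", "dedupe", "rank")]

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    first = run_resumable(steps[:2], path)  # simulate a run that stops after two steps
    second = run_resumable(steps, path)     # restart: only "rank" executes

print(calls)
```

On the second call, "find" and "dedupe" are loaded from the checkpoint instead of being re-run, so no tokens would be spent on them again.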
The lesson is simple: Fan out to find, but converge before you verify. Breadth is for discovery. Rigor is for the survivors.
Source: https://dev.to/ayoubzulfiqar/lessons-from-a-109-agent-code-audit-workflow-4a5m
Optional learning community: https://t.me/GyaanSetuAi