Codex Fixing Codex: A Consensus Loop
I built an agent loop that does more than suggest code. It writes code, reviews it, and merges its own pull requests.
To test it, I pointed the loop at a fork of the Codex CLI and let the agents try to fix the software themselves. This is purely an experiment. The fork has no users and no stars. The point is the mechanism, not a product.
Here is how the loop works:
- Intake: An upstream bug becomes an issue in the fork. The loop only picks small, mechanical bugs it can finish.
- Solvers Argue: Multiple agents propose different fixes. One solver wants the smallest change. Another wants clean structure. A third wants to delete code instead of adding it. They disagree.
- Judge Arbitrates: A judge reads the debate. If solvers disagree, the judge sends them back for more rounds. The judge also records why it rejected certain ideas.
- Implement and Merge: Once the solvers reach consensus, the loop writes the patch, runs the tests, and opens a PR. If the tests pass, the loop merges the PR itself.
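The debate and arbitration steps above can be sketched in a few lines. This is a toy model under stated assumptions, not the loop's actual code: the names `Proposal`, `Verdict`, `run_consensus`, `make_solver`, and `unanimous_judge` are all hypothetical, the solvers are plain functions with a fixed preference list instead of LLM agents, and the judge simply rejects the least-supported idea each round. It leaves out the implement, test, and merge steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Proposal:
    solver: str    # which solver produced this proposal
    approach: str  # short label for the proposed fix


@dataclass
class Verdict:
    chosen: Optional[str]  # winning approach, or None if there is no consensus yet
    rejections: Dict[str, str] = field(default_factory=dict)  # approach -> reason


Solver = Callable[[str, Dict[str, str]], Proposal]
Judge = Callable[[List[Proposal]], Verdict]


def run_consensus(
    issue: str, solvers: List[Solver], judge: Judge, max_rounds: int = 3
) -> Tuple[Optional[str], int, Dict[str, str]]:
    """Return (chosen approach or None, rounds used, rejection log)."""
    rejections: Dict[str, str] = {}
    for round_no in range(1, max_rounds + 1):
        # Every solver sees the rejection log so far and proposes again.
        proposals = [solve(issue, rejections) for solve in solvers]
        verdict = judge(proposals)
        rejections.update(verdict.rejections)  # keep the judge's reasons
        if verdict.chosen is not None:
            return verdict.chosen, round_no, rejections
    return None, max_rounds, rejections  # no consensus: do not patch


def make_solver(name: str, preferences: List[str]) -> Solver:
    """Toy solver: proposes its first preference that has not been rejected."""
    def solve(issue: str, rejections: Dict[str, str]) -> Proposal:
        for approach in preferences:
            if approach not in rejections:
                return Proposal(name, approach)
        return Proposal(name, preferences[-1])
    return solve


def unanimous_judge(proposals: List[Proposal]) -> Verdict:
    """Toy judge: accept only unanimity, else reject the least-supported idea."""
    approaches = [p.approach for p in proposals]
    if len(set(approaches)) == 1:
        return Verdict(chosen=approaches[0])
    # dict.fromkeys keeps first-seen order, so ties break deterministically.
    loser = min(dict.fromkeys(approaches), key=approaches.count)
    return Verdict(chosen=None, rejections={loser: "least support this round"})


solvers = [
    make_solver("minimal", ["one-line guard", "extract helper"]),
    make_solver("structural", ["extract helper"]),
    make_solver("deleter", ["delete dead branch", "extract helper"]),
]
chosen, rounds, log = run_consensus("hypothetical bug", solvers, unanimous_judge)
print(chosen, rounds, sorted(log))
```

In this toy run the three solvers start with three different ideas and converge in the third round. The rejection log keeps the reasons for the two discarded approaches, which mirrors the judge recording why it turned ideas down.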
You can see this in action in issue #34. The agents debated a concurrency bug. They went through three rounds of arbitration before reaching a decision. The loop produced a real fix and a regression test without a human typing a single line of code.
One interesting result came from PR #16. The loop could not reproduce a reported bug. Instead of inventing a fix, it added a test to lock in the current behavior and stopped. A loop that knows when not to patch is more useful than one that always produces a diff.
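That "know when not to patch" rule can be written as a small gate. This is a sketch under assumptions, not what PR #16 actually contains: `Action`, `decide`, and the two example reproductions are hypothetical, and a reproduction is modeled as a function that raises `AssertionError` when the reported bug shows up.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    PATCH = "patch"                  # bug reproduced, so a fix is warranted
    LOCK_BEHAVIOR = "lock_behavior"  # not reproduced: keep the test, write no patch


def decide(repro: Callable[[], None]) -> Action:
    """Run the reproduction and patch only if it actually fails."""
    try:
        repro()
    except AssertionError:
        return Action.PATCH
    return Action.LOCK_BEHAVIOR


def repro_roundtrip() -> None:
    # Hypothetical report: "non-ASCII text is corrupted on round trip".
    text = "h\u00e9llo"
    assert text.encode("utf-8").decode("utf-8") == text


def repro_byte_length() -> None:
    # Hypothetical report that does reproduce: code assumes one byte per character.
    assert len("\u00e9".encode("utf-8")) == 1


print(decide(repro_roundtrip).value)
print(decide(repro_byte_length).value)
```

When the reproduction passes, it is kept as a regression test that pins down the current behavior, and no diff is produced.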
The loop has merged about 16 PRs so far. It takes on small tasks such as UTF-8 handling and command fixes. It does not maintain a whole codebase, but it closes small, bounded bugs from start to finish.
Humans still set the rules and review the work. We still check every PR. The code is automatic, but the attention is human.
You can see the entire process on GitHub. Look at issue #34 and PR #37 to see the debate.
Optional learning community: https://t.me/GyaanSetuAi