𝗠𝗼𝗱𝗲𝗹 𝗦𝗵𝗼𝘄𝗱𝗼𝘄𝗻: 𝗟𝗼𝗰𝗮𝗹 𝘃𝘀. 𝗖𝗹𝗼𝘂𝗱 𝗖𝗼𝗱𝗶𝗻𝗴
Five local models. One cloud model. One real coding task.
The results are clear. Local models are not ready for agentic coding tasks on consumer hardware.
I tested five local models against Claude Sonnet 4. The goal was to build a tag manager for a blog admin panel. The models had to write code, pass builds, take screenshots, and push commits.
The Results:
• Sonnet 4 (Cloud): Complete. 4 commits. 10 minutes. Zero human help.
• Qwen3-Coder 30B (Local): Partial. 1 commit. Worked but messy.
• Qwen 3.6 35B (Local): Failed. Passed the build but never committed.
• Gemma 4 12B (Local): Failed. Got stuck in a loop.
• Hermes 4 14B (Local): Failed. Repeated the same error 13 times.
• Devstral 24B (Local): Total failure. Could not use tools.
The Efficiency Gap
The difference is massive. Sonnet 4 finished the task using 19K tokens. The local models burned between 1 million and 4 million tokens. That is roughly a 50x to 200x gap in efficiency.
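For anyone who wants to check the math, here is a quick sketch of the ratio in Python. It takes the token counts quoted above at face value. They are rounded figures, so treat the output as a rough range. The constant and function names are mine, for illustration only.

```python
# Token counts as quoted in the post (rounded figures).
CLOUD_TOKENS = 19_000          # Sonnet 4
LOCAL_TOKENS_LOW = 1_000_000   # lower end reported for the local models
LOCAL_TOKENS_HIGH = 4_000_000  # upper end reported for the local models


def token_ratio(local_tokens: int, cloud_tokens: int) -> float:
    """Return how many times as many tokens the local run used."""
    return local_tokens / cloud_tokens


low = token_ratio(LOCAL_TOKENS_LOW, CLOUD_TOKENS)
high = token_ratio(LOCAL_TOKENS_HIGH, CLOUD_TOKENS)
print(f"Local runs used roughly {low:.0f}x to {high:.0f}x as many tokens")
# prints: Local runs used roughly 53x to 211x as many tokens
```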
Local models are not just slower. They struggle with reasoning. I saw four main issues:
- Degenerate loops: Models repeat the same wrong code or text dozens of times.
- Directory amnesia: Models forget where they are in the file system.
- Poor prioritization: Models focus on minor tasks instead of finishing the main goal.
- No self-diagnosis: Models try the same failing fix instead of reading documentation.
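The first and last issues are the kind of thing an agent harness could catch early. Below is a minimal sketch, not the setup used in this test: a hypothetical `LoopGuard` that counts how often the same error text comes back and raises a flag once it crosses a threshold. The class name, the `max_repeats` default, and the sample error strings are all assumptions for illustration.

```python
from collections import Counter


class LoopGuard:
    """Flag an agent run when the same error text keeps recurring."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._seen = Counter()

    def record(self, error_text: str) -> bool:
        """Record one error; return True once it has been seen max_repeats times."""
        # Normalize whitespace and case so trivially different copies match.
        key = " ".join(error_text.split()).lower()
        self._seen[key] += 1
        return self._seen[key] >= self.max_repeats


guard = LoopGuard(max_repeats=3)
errors = [
    "Module not found: ./tags",
    "module not found:  ./tags",
    "Module not found: ./tags",
]
flags = [guard.record(e) for e in errors]
print(flags)  # [False, False, True]
```

When the flag trips, a harness could stop the run or tell the model to try a different approach, such as reading the docs, instead of letting it burn tokens on the same fix.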
The Takeaway
Local models can write code that looks good. They fail at the last mile. Being an agent requires more than code generation. It requires managing state, fixing errors, and knowing when to ship.
Qwen3-Coder 30B is the only local model worth watching. It actually pushed working code to a branch. For a model running on a single consumer GPU, that is progress.