Frontier-Quality Coding at Cheap-Tier Cost

You can get frontier-quality coding scores at a fraction of the cost.

We built a system that handles most tasks with a cheap local model and sends only the hard problems to a frontier model. It works because of the structure around the model, not just the model's size.

How the architecture works:

  • Two channels: A capability channel (the cheap local model) and a structure channel (verification gates).
  • Verification: Guards decide whether the local model's answer is trustworthy.
  • Escalation: If an answer fails the guards, the system escalates the request to a frontier model.
  • Cache: A cache layer prevents re-solving exact repeats.
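
The four pieces above can be sketched as one control loop. This is a minimal illustration, not the published implementation: the `Cascade` class, the `Guard` signature, and the toy `compiles` guard are all hypothetical names, and the real guards are assumed to be far stricter than a syntax check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical; the source does not publish its API.
Model = Callable[[str], str]
Guard = Callable[[str, str], bool]  # (prompt, answer) -> is the answer trustworthy?


@dataclass
class Cascade:
    local_model: Model
    frontier_model: Model
    guards: List[Guard]
    cache: Dict[str, str] = field(default_factory=dict)

    def solve(self, prompt: str) -> str:
        # Cache: an exact repeat is returned without re-solving.
        if prompt in self.cache:
            return self.cache[prompt]
        # Capability channel: try the cheap local model first.
        answer = self.local_model(prompt)
        # Structure channel: every guard must accept the answer.
        if not all(guard(prompt, answer) for guard in self.guards):
            # Escalation: hand the request to the frontier model.
            answer = self.frontier_model(prompt)
        self.cache[prompt] = answer
        return answer


def compiles(prompt: str, answer: str) -> bool:
    """Toy guard (an assumption): accept only answers that parse as Python."""
    try:
        compile(answer, "<answer>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    cascade = Cascade(
        local_model=lambda p: "def add(a, b) return a + b",  # broken on purpose
        frontier_model=lambda p: "def add(a, b):\n    return a + b",
        guards=[compiles],
    )
    print(cascade.solve("write add"))  # the guard rejects the local answer
```

In this sketch the frontier answer is trusted without a second guard pass. Whether the real system re-verifies escalated answers is not stated in the source.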

The results from our HumanEval+ tests:

  • Full cascade: 94.5% correct on the plus tests.
  • Local model alone: 84.8% correct on the plus tests.
  • The structure channel adds about 9.7 points of accuracy (94.5 - 84.8).

We tested the importance of the structure through an ablation study:

  • Full system: 100% correct.
  • Removed verification: 75% correct.
  • Removed guards: 50% correct.

Removing the guards cuts correctness in half, from 100% to 50%. This indicates that the structure, not the model alone, carries the reliability.
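
An ablation like this can be run with a small harness that scores each system variant on the same task set. The sketch below is an assumption about the shape of such a harness, not the authors' runner: `accuracy`, `ablate`, and the `Task` type are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical types: a solver maps a prompt to an answer,
# and each task pairs a prompt with a checker for that answer.
Solver = Callable[[str], str]
Task = Tuple[str, Callable[[str], bool]]


def accuracy(solver: Solver, tasks: Iterable[Task]) -> float:
    """Fraction of tasks whose checker accepts the solver's answer."""
    tasks = list(tasks)
    passed = sum(1 for prompt, check in tasks if check(solver(prompt)))
    return passed / len(tasks)


def ablate(variants: Dict[str, Solver], tasks: Iterable[Task]) -> Dict[str, float]:
    """Score every variant (full system, no verification, no guards) on the same tasks."""
    tasks = list(tasks)  # materialize once so each variant sees identical tasks
    return {name: accuracy(solver, tasks) for name, solver in variants.items()}
```

Passing the full system and each stripped-down variant as entries in `variants` gives one accuracy figure per configuration, which is the form of the comparison in the list above.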

The cost benefits:

  • Blended cost: $0.00201 per request.
  • Frontier cost: $0.017 per request.
  • Our system is about 8x cheaper than sending every request to a frontier model ($0.017 / $0.00201 ≈ 8.5).
  • 91% of requests are served by the local model.
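
The arithmetic behind these figures is easy to check. The snippet below recomputes the savings ratio from the two published costs, then backs out what the non-frontier part of the pipeline would cost under one labeled assumption: that every request pays the local cost and the 9% not served locally also pay the frontier cost. The source does not state its cost model, so treat the second number as illustrative.

```python
BLENDED_COST = 0.00201  # USD per request, from the list above
FRONTIER_COST = 0.017   # USD per request, from the list above
LOCAL_SHARE = 0.91      # fraction of requests served by the local model


def savings_ratio(frontier: float, blended: float) -> float:
    """How many times cheaper the blended system is than frontier-only."""
    return frontier / blended


def implied_local_cost(blended: float, frontier: float, local_share: float) -> float:
    """ASSUMPTION: blended = local_cost + (1 - local_share) * frontier.

    Every request pays the local cost; only escalated requests add the
    frontier cost. This cost model is a guess, not stated in the source.
    """
    return blended - (1.0 - local_share) * frontier


if __name__ == "__main__":
    print(f"savings: {savings_ratio(FRONTIER_COST, BLENDED_COST):.2f}x")
    print(f"implied local cost: ${implied_local_cost(BLENDED_COST, FRONTIER_COST, LOCAL_SHARE):.5f}")
```

The ratio comes out near 8.46. Under the assumed cost model, escalations account for about $0.00153 of the blended cost, leaving roughly $0.00048 per request for the local model and everything else.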

A note on long context:

Our compaction layer uses 165 tokens where raw context uses 28,000, a reduction of roughly 170x. We hit an infrastructure limit at 208k tokens, but that limit is a configuration setting, not a model failure.

What we have not proven yet:

We do not have official long-horizon benchmark numbers. We have built the runners for RULER and SWE-bench, but we have not run them in a clean sandbox. We are not claiming official results for long-horizon performance yet.

Summary of our claim:

Our system matches frontier coding scores while serving most requests from a cheap local model. This cuts the cost per request by about 8x. The reliability comes from the structure channel.

Source: https://dev.to/tom_jones_230c4659491adcd/frontier-quality-coding-at-cheap-tier-cost-what-we-built-and-how-we-measured-it-3g2j
