My Agent Reported 12. The Real Number Was 13.

I am building a coding agent that runs locally. It uses Claude for planning and local models for code generation. Recently, I handed the agent a simple task: counting specific log entries.

The agent reported 12. I was tired of manual bookkeeping, so I almost accepted it. Then I ran the count myself in my terminal. The real number was 13.

The agent missed one entry because that entry had an irregular shape. The agent was not hallucinating. It was just "almost right." This is the most dangerous kind of error: it looks plausible enough to trust.

Even worse, the final summary metric looked correct. The rounding and grouping steps hid the mistake. If I had only looked at the final report, I would have seen no error. But the raw count was wrong. Once your raw measurement is wrong, every report built on it inherits that error.
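To see how rounding and grouping can hide an off-by-one, here is a minimal Python sketch. Only the 12 and 13 come from the real incident; the `summarize` helper, the 10-wide buckets, and the total of 400 entries are hypothetical choices for illustration, not how my agent actually builds its report.

```python
def summarize(count, total, width=10):
    """Reduce a raw count to the coarse form a summary report might show.

    Returns a bucket label such as "10-19" and a whole-number percentage.
    The bucket width and the rounding are illustrative assumptions.
    """
    low = (count // width) * width
    bucket = f"{low}-{low + width - 1}"
    percent = round(100 * count / total)
    return bucket, percent


agent_count = 12    # what the agent reported
checked_count = 13  # what the terminal check found
total = 400         # hypothetical size of the whole log

print(summarize(agent_count, total))    # ('10-19', 3)
print(summarize(checked_count, total))  # ('10-19', 3)
print(agent_count == checked_count)     # False
```

The two summaries are identical even though the raw counts differ, which is why the raw measurement, not the report, is the thing to check.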

I learned a hard lesson about trust and measurement.

If you let the system that does the work also judge the work, you have a problem. You have made the examinee the examiner. A probabilistic model should never be your sole source of truth.

I am now following two new rules:

  • A human must witness the automation first. Before I trust a self-measuring system, I run a deterministic count myself. I watch the numbers come out in the terminal. I only relax this rule once the machine and the human match perfectly over many runs.

  • Pin measurements to observable units. I make sure the agent counts exactly what a human can see. If the population being counted is loosely defined, the numbers will drift. If it is tightly defined, the agent's count and mine can actually be compared.
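The two rules above can be sketched as a small deterministic witness in Python. It assumes a hypothetical log format (`YYYY-MM-DD HH:MM:SS LEVEL message`) and treats one line containing the keyword as one entry, the same unit a human sees with a plain text search. The `witness_count` and `check_agent` names are illustrative, not part of any real tool. Lines that contain the keyword but do not match the expected shape are reported separately, since an irregular entry is exactly what my agent skipped.

```python
import re

# Hypothetical expected prefix of a well-formed entry, e.g.
# "2024-05-01 09:00:00 ERROR disk full"
TIMESTAMP = r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "


def witness_count(lines, keyword="ERROR"):
    """Deterministically count lines that contain the keyword.

    One matching line is one entry. Returns (total, irregular), where
    irregular holds matching lines that lack the expected timestamp shape.
    """
    strict = re.compile(TIMESTAMP + re.escape(keyword) + r"\b")
    total = 0
    irregular = []
    for line in lines:
        if keyword in line:
            total += 1
            if not strict.match(line):
                irregular.append(line)
    return total, irregular


def check_agent(agent_count, lines, keyword="ERROR"):
    """Compare an agent-reported count against the deterministic witness."""
    total, irregular = witness_count(lines, keyword)
    if agent_count != total:
        print(f"MISMATCH: agent={agent_count} witness={total}")
        for line in irregular:
            print(f"  irregular entry: {line!r}")
    return agent_count == total


if __name__ == "__main__":
    sample = [
        "2024-05-01 09:00:00 ERROR disk full",
        "2024-05-01 09:05:12 INFO heartbeat",
        "2024-05-01 09:07:30 ERROR timeout",
        "   ERROR retry failed (no timestamp)",
    ]
    # An agent that only counted well-formed lines would report 2 here.
    print(check_agent(2, sample))
```

Counting on the loose rule (keyword present) while flagging strict-shape misses keeps the population pinned to what is visible, and it surfaces irregular entries instead of silently dropping them.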

This approach is slower. It does not scale forever. But it is how you build a foundation of trust.

You can let AI write code. You can let AI run analysis. But for the numbers that matter, a deterministic process must be the final witness.

How do you draw the line? When do you decide a number is important enough to check by hand?

Source: https://dev.to/josephyeo/my-agent-reported-12-the-real-number-was-13-5864
