We Stopped Trusting Models. Then We Stopped Trusting Our Own Numbers.
I stopped chasing better AI models. I thought a stronger model would fix my system. It did not. The problem was not the model. The problem was the system.
Then I realized something worse. I could not trust my own measurements either.
I saw three different failures:
- A test suite that passed while measuring the wrong environment.
- A gate that blocked work but gave wrong statistics.
- An agent that reported the wrong count.
Each failure looked like success until I looked closer. My tools for verification were lying to me.
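The third failure is the easiest to illustrate. Here is a minimal sketch, assuming the agent hands back both a claimed count and the items it counted. The function name and the report shape are hypothetical and not taken from the original system.

```python
def check_reported_count(reported: int, items: list) -> dict:
    """Recount independently instead of trusting the agent's own number."""
    actual = len(items)  # deterministic: the same list always gives the same count
    return {"reported": reported, "actual": actual, "ok": reported == actual}


# Hypothetical agent output: it claims 5 changed files but lists only 3.
result = check_reported_count(5, ["a.py", "b.py", "c.py"])
print(result)
```

The point is not the arithmetic. The number you act on should come from a recount you can repeat, not from the reporter.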
My first instinct was to ban all uncertainty. I wanted to remove every probabilistic element. I wanted everything to be deterministic.
That was a mistake.
If you remove all uncertainty, you remove the value of the AI. The AI is meant to propose ideas and explore fixes. You cannot get that from a rigid rule.
The solution is not to ban uncertainty. The solution is to place it correctly.
A system needs two different seats:
- The Proposing Seat. This seat explores and suggests. It needs nondeterminism. If a model suggests a wrong fix, the cost is low because it has not decided anything yet.
- The Judging Seat. This seat decides if a test passes or if a rule is met. This seat must be deterministic. It must be reproducible and checkable.
The failures in my system happened because I put the wrong things in the judging seat. I let uncertain processes make final decisions.
The rule is simple:
- Let the uncertain parts explore.
- Let deterministic parts judge.
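The split between the two seats can be sketched in a few lines. This is an illustration under stated assumptions, not the original system: `propose_candidates` is a random stand-in for a model, and `judge` is a toy rule (the candidate must square to 49). All names are hypothetical.

```python
import random
from typing import Callable, List, Optional


def propose_candidates(rng: random.Random, n: int) -> List[int]:
    """Proposing seat: a stand-in for a model. It guesses, and may be wrong."""
    return [rng.randint(0, 20) for _ in range(n)]


def judge(candidate: int) -> bool:
    """Judging seat: a pure function. Same input, same verdict, every time."""
    return candidate * candidate == 49


def first_accepted(candidates: List[int],
                   judge_fn: Callable[[int], bool]) -> Optional[int]:
    """Return the first candidate the judge accepts, or None if none pass."""
    for candidate in candidates:
        if judge_fn(candidate):
            return candidate
    return None


# The proposer may vary from run to run; the verdict on any given candidate does not.
guesses = propose_candidates(random.Random(), 50)
print(first_accepted(guesses, judge))
```

A bad guess costs one loop iteration. Nothing is accepted until the deterministic check says so, and anyone can re-run that check on the same candidate and get the same verdict.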
Do not try to make the whole system certain. Instead, make sure your judges are solid. A deterministic judge that is wrong is more dangerous than a probabilistic one, because it returns the same wrong answer every time. That consistency looks like correctness, so the error becomes a steady one you eventually stop questioning.
Every layer you rely on for trust must itself be measured before you rely on it. That measurement should rest on something deterministic that you can observe directly.
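One way to measure a judge is to run it against a handful of cases whose verdicts you have checked by hand, including at least one that must fail. This is a minimal sketch. The judges and the case list are hypothetical examples, not the checks from the original system.

```python
from typing import Callable, List, Tuple


def audit_judge(judge_fn: Callable[[list], bool],
                known_cases: List[Tuple[list, bool]]) -> List[Tuple[list, bool]]:
    """Return every case where the judge disagrees with the hand-verified verdict."""
    return [(inp, expected) for inp, expected in known_cases
            if judge_fn(inp) != expected]


def always_pass(_xs: list) -> bool:
    """A broken judge: deterministic, reproducible, and wrong."""
    return True


def is_sorted(xs: list) -> bool:
    """A judge that actually checks ascending order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


# Hand-verified cases. The second one is known to be bad on purpose.
KNOWN_CASES = [([1, 2, 3], True), ([3, 1, 2], False), ([], True)]

print(audit_judge(always_pass, KNOWN_CASES))  # non-empty: the judge is exposed
print(audit_judge(is_sorted, KNOWN_CASES))    # empty: the judge agrees with every case
```

The known-bad case does the real work here. A judge that has never been seen to fail has not been measured.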
How do you draw the line between proposing and judging in your AI systems? Where do you insist on determinism?
Source: https://dev.to/josephyeo/we-stopped-trusting-models-then-we-stopped-trusting-our-own-numbers-1611
