আমরা মডেলের ওপর বিশ্বাস করা বন্ধ করে দিয়েছি। তারপর আমরা আমাদের নিজেদের সংখ্যার ওপর বিশ্বাস করাও বন্ধ করে দিয়েছি।

Translated for your language. Read the original.

AI-assisted draft.

GyaanSetu Editorial২ সপ্তাহ আগে2min read

আমরা মডেলের ওপর বিশ্বাস করা বন্ধ করে দিয়েছি। তারপর আমরা আমাদের নিজেদের সংখ্যার ওপর বিশ্বাস করাও বন্ধ করে দিয়েছি।

We Stopped Trusting Models. Then We Stopped Trusting Our Own Numbers.

I stopped chasing better AI models. I thought a stronger model would fix my system. It did not. The problem was not the model. The problem was the system.

Then I realized something worse. I could not trust my own measurements either.

I saw three different failures:

A test suite that passed while measuring the wrong environment.
A gate that blocked work but gave wrong statistics.
An agent that reported the wrong count.

Each failure looked like success until I looked closer. My tools for verification were lying to me.

My first instinct was to ban all uncertainty. I wanted to remove every probabilistic element. I wanted everything to be deterministic.

That was a mistake.

If you remove all uncertainty, you remove the value of the AI. The AI is meant to propose ideas and explore fixes. You cannot get that from a rigid rule.

The solution is not to ban uncertainty. The solution is to place it correctly.

A system needs two different seats:

The Proposing Seat This seat explores and suggests. It needs nondeterminism. If a model suggests a wrong fix, the cost is low because it has not decided anything yet.
The Judging Seat This seat decides if a test passes or if a rule is met. This seat must be deterministic. It must be reproducible and checkable.

The failures in my system happened because I put the wrong things in the judging seat. I let uncertain processes make final decisions.

The rule is simple:

Let the uncertain parts explore.
Let deterministic parts judge.

Do not try to make the whole system certain. Instead, make sure your judges are solid. A deterministic judge that is wrong is more dangerous than a probabilistic one. A wrong judge creates a steady error you eventually stop questioning.

Every layer that gives you trust must be measured first. That measurement should rely on something deterministic you can witness.

How do you draw the line between proposing and judging in your AI systems? Where do you insist on determinism?

Source: https://dev.to/josephyeo/we-stopped-trusting-models-then-we-stopped-trusting-our-own-numbers-1611

Optional learning community: https://t.me/GyaanSetuAi

Continue reading

𝗔𝗜 𝗗𝗼𝗲𝘀 𝗡𝗼𝘁 𝗥𝗲𝗽𝗹𝗮𝗰𝗲 𝗝𝘂𝗱𝗴𝗺𝗲𝗻𝘁

𝗧𝗵𝗲 𝗧𝗲𝗹𝗹 𝗪𝗲 𝗧𝗿𝗮𝗶𝗻𝗲𝗱 𝗢𝘂𝘁

𝗦𝘁𝗼𝗽 𝗧𝗲𝗹𝗹𝗶𝗻𝗴 𝗬𝗼𝘂𝗿 𝗔𝗜 𝘁𝗼 𝗯𝗲 𝗰𝗮𝗿𝗲𝗳𝘂𝗹

আমি একটি এআই উপদেষ্টা বোর্ড তৈরি করেছি

আপনার ইভালসও (Evals) অনির্ভরযোগ্য: এমন একটি পাস রেট বিশ্বাস করা বন্ধ করুন যা আপনি পুনরায় তৈরি করতে পারেন না