What a Fast LLM Taught Me About Assumptions
I ran a cheap, fast LLM on a complex task for an hour. It did not fail.
Most people expect weak models to fail on long tasks: they drift or give up halfway. This model stayed on track because I gave it a list of deliverables.
I thought these deliverables helped with correctness. I was wrong.
A study shows that deliverables do not make a model more correct. They make its work more verifiable. The model documents what it did and leaves evidence for you to check.
There are two types of errors in software:
- Execution errors: A misplaced comma or a missed edge case. You fix these with tests and linting.
- Assumption errors: Placing a boundary in the wrong spot. This is much harder to fix.
Process helps with execution errors. It does not solve assumption errors. If you and the model share the same blind spot, your review will fail too.
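A minimal sketch of a shared blind spot, using a made-up business rule: suppose the requirement says "orders over 100 ship free" and both the implementer and the test author read "over" as "100 or more". The function name, the threshold, and the rule itself are hypothetical, chosen only for illustration.

```python
def qualifies_for_free_shipping(order_total: float) -> bool:
    # Assumption baked in by the implementer: "over 100" includes exactly 100.
    # If the business meant strictly greater than 100, this is wrong.
    return order_total >= 100


def test_free_shipping_boundary() -> bool:
    # The test encodes the same reading of the rule, so it cannot expose it.
    return (
        qualifies_for_free_shipping(100) is True
        and qualifies_for_free_shipping(99.99) is False
    )


print(test_free_shipping_boundary())  # True
```

The suite is green and the linter is quiet, yet the boundary may still be in the wrong place. Only someone who holds a different reading of the requirement can catch it.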
AI changes the math on these errors.
In the past, a human made mistakes slowly. This gave you time to notice. Now, an AI makes mistakes fast. A model can build three hours of perfect code on top of one wrong assumption before anyone catches it.
The more capable a model looks, the more you trust it. You let it run longer. You stop checking as often. This is a trap. A wrong assumption does not flash a warning light. It looks like progress until it is too late.
The industry tries to fix this with more process. We add more specs and more plans. This is just more overhead. It is an execution tool for an assumption problem.
We need to stop measuring how often a model is right. We need to measure how long a wrong assumption survives before we catch it.
In production, we call this MTTD: Mean Time To Detect.
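As a rough sketch, the metric can be computed from a log of when each wrong assumption entered the work and when someone caught it. The `mean_time_to_detect` helper and the sample timestamps are hypothetical, and the sketch assumes you can date the moment an assumption was introduced, which in practice is often an estimate.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_time_to_detect(events: list[tuple[datetime, datetime]]) -> timedelta:
    """Average lag between introducing a wrong assumption and detecting it.

    Each event is an (introduced_at, detected_at) pair.
    """
    if not events:
        raise ValueError("need at least one detected assumption")
    lags = [(detected - introduced).total_seconds() for introduced, detected in events]
    if any(lag < 0 for lag in lags):
        raise ValueError("detected_at must not precede introduced_at")
    return timedelta(seconds=mean(lags))


# Hypothetical session log: three wrong assumptions and when each was caught.
start = datetime(2025, 1, 1, 9, 0)
log = [
    (start, start + timedelta(minutes=10)),  # caught at the first checkpoint
    (start, start + timedelta(hours=3)),     # survived the whole run
    (start, start + timedelta(minutes=50)),  # caught in review
]

print(mean_time_to_detect(log))  # 1:20:00
```

Tracking this number across sessions shows whether checkpoints are actually shortening the life of a wrong assumption, independent of how often the model is right.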
We cannot stop every error. We can only make the errors cheaper to fix. You do this by catching them early.
The goal is not just to find a smarter model. The goal is to decide where you still need to be the one in control.
Source: https://dev.to/g_correa/what-a-fast-llm-taught-me-about-assumptions-oe
