AGI: Are We There Yet?
We are not at AGI yet.
A year ago, I asked if we had reached Artificial General Intelligence. At that time, OpenAI's o3 model hit a major milestone on the ARC-AGI-1 benchmark. It showed a real jump in reasoning.
But I argued then that this was a pit stop, not the destination.
I was right.
The story today is not about AGI arriving. The story is more interesting. We have moved past simple chatbots. We are now in the era of frontier reasoning and agent systems.
Here is the current state of the field:
- Models are much better at reasoning and coding.
- They use tools and process long contexts more effectively.
- They can handle multimodal inputs like images and audio.
- They are more economically useful than ever before.
But they still lack human-like generality.
The benchmarks tell the true story. While old tests like MMLU are saturated, new tests show the gaps.
- ARC-AGI-1 was a breakthrough for reasoning.
- ARC-AGI-2 shows that novelty and composition are still very hard.
- ARC-AGI-3 moves into interactive environments where models struggle to adapt.
We are also seeing a shift in how models scale. It is no longer just about more data. Scaling now happens through:
- Pretraining scale.
- Post-training and reinforcement learning.
- Inference-time reasoning and tool use.
A model that can pause, run code, and revise a plan is different from a model that just predicts the next word. This is the rise of agentic systems.
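To make that difference concrete, here is a toy sketch of the propose, run, revise loop. It is an illustration only, not how any real agent framework works. Every name in it (`AgentState`, `propose`, `run_tool`, `revise`, `agent_loop`) is hypothetical, and the "model" is a stub that bisects a range instead of calling an LLM. The task is deliberately trivial: find the integer whose square equals a target.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """What the agent remembers between steps (hypothetical structure)."""
    low: int
    high: int
    history: list = field(default_factory=list)


def propose(state: AgentState) -> int:
    """Stand-in for a model call: pick the midpoint of the remaining range."""
    return (state.low + state.high) // 2


def run_tool(guess: int, target: int) -> str:
    """Stand-in for code execution: check the guess and report feedback."""
    square = guess * guess
    if square == target:
        return "ok"
    return "too_low" if square < target else "too_high"


def revise(state: AgentState, guess: int, feedback: str) -> None:
    """Update the plan (here, the search range) based on tool feedback."""
    if feedback == "too_low":
        state.low = guess + 1
    else:
        state.high = guess - 1


def agent_loop(target: int, max_steps: int = 20):
    """Propose, run the tool, revise; stop on success or when the budget runs out."""
    state = AgentState(low=0, high=target)
    for _ in range(max_steps):
        guess = propose(state)
        feedback = run_tool(guess, target)
        state.history.append((guess, feedback))
        if feedback == "ok":
            return guess, state.history
        revise(state, guess, feedback)
    return None, state.history  # budget exhausted without a solution


if __name__ == "__main__":
    answer, steps = agent_loop(144)
    print(answer, len(steps))
```

A pure next-word predictor gets one shot at the answer. The loop gets feedback after every attempt and a step budget that stops it when the task has no solution, which is the case for a target like 10.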
However, a major gap remains: reliability.
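METR research tracks a model's time horizon: the length of a task, measured in the time a human would need, that the model can complete at a fixed success rate. That horizon is growing. It doubles every few months. But a 50-minute task horizon is not a full workday. It is not a week of autonomous research.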
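The gap between 50 minutes and a workday is easy to put in numbers. The sketch below counts how many doublings separate a 50-minute horizon from longer targets. The 8-hour workday, the 40-hour week, and the 6-month doubling period are my assumptions for illustration; the 6-month value is a placeholder for "every few months", not a figure from METR. It also assumes the trend simply continues, which nobody can promise.

```python
import math

CURRENT_HORIZON_MIN = 50          # the 50-minute horizon mentioned above
WORKDAY_MIN = 8 * 60              # assumption: an 8-hour workday
WORK_WEEK_MIN = 40 * 60           # assumption: a 40-hour working week
ASSUMED_DOUBLING_MONTHS = 6.0     # assumption: placeholder doubling period


def doublings_needed(current_min: float, target_min: float) -> float:
    """Number of doublings to grow from current_min to target_min."""
    return math.log2(target_min / current_min)


def months_until(current_min: float, target_min: float,
                 doubling_months: float) -> float:
    """Naive extrapolation: doublings times the assumed doubling period."""
    return doublings_needed(current_min, target_min) * doubling_months


if __name__ == "__main__":
    for label, target in [("workday", WORKDAY_MIN), ("work week", WORK_WEEK_MIN)]:
        n = doublings_needed(CURRENT_HORIZON_MIN, target)
        months = months_until(CURRENT_HORIZON_MIN, target, ASSUMED_DOUBLING_MONTHS)
        print(f"{label}: {n:.2f} doublings, about {months:.0f} months")
```

Under these assumptions a workday is a little over three doublings away and a working week is between five and six. Change `ASSUMED_DOUBLING_MONTHS` and the calendar estimate moves with it, which is why the doubling rate matters more than today's number.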
We have moved from "models that answer" to "models that reason with tools."
We are building highly capable systems. But these systems are often broad yet brittle. They can solve graduate-level math but fail at simple, novel puzzles.
The honest position is this:
We are not at AGI. But we are much closer to something economically disruptive than most people expected.
We are building general-purpose reasoning systems. They look shockingly intelligent, yet they still fail in ways that show they lack true human adaptability.
The milestone was real. The hype was too much. The real work is now about building robustness and autonomy.
Source: https://dev.to/ernestohs/agi-are-we-there-yet-a-follow-up-1471
