Sakana AI Launches Fugu to Orchestrate Multi-LLM Intelligence
Tokyo-based Sakana AI has unveiled Fugu, a sophisticated multi-LLM orchestrator designed to coordinate a pool of specialized models to solve complex tasks. By acting as a single intelligent layer, Fugu aims to rival the performance of industry leaders like Anthropic while offering a strategic hedge against vendor lock-in.
A Unified Interface for a Swappable Agent Pool
Fugu is not just another standalone large language model; it is a language model specifically trained to manage an "agent pool." To the end-user, the system functions as a single entity through an OpenAI-compatible API. Internally, however, Fugu performs a complex cycle of selection, delegation, execution, checking, and synthesis. Depending on the complexity of a prompt, Fugu may solve the problem solo or dynamically recruit a "team" of specialized models—including copies of itself—to tackle the workload.
Sakana AI is offering two distinct versions to meet different professional needs:
- Fugu Base: Optimized for low latency and everyday tasks such as chatbot interactions and standard code reviews.
- Fugu Ultra: Engineered for maximum reasoning quality, targeting high-stakes workflows like scientific paper reproduction, cybersecurity analysis, and patent searches.
Outperforming Frontier Models in Benchmarks
The performance metrics for Fugu Ultra are striking, placing it in direct competition with Anthropic’s highly anticipated Fable 5 and Mythos Preview. Notably, Fugu Ultra achieves these scores using a pool that does not include Anthropic’s models, suggesting even higher ceilings if those agents were integrated.
In rigorous testing, Fugu Ultra demonstrated superior capabilities across several key technical benchmarks:
- SWE Bench Pro: Fugu Ultra scored 73.7, significantly outperforming GPT 5.5 (58.6) and Gemini 3.1 Pro (54.2).
- LiveCodeBench: Fugu Ultra reached 93.2, surpassing Opus 4.8 (87.8) and GPT 5.5 (85.3).
- Humanity's Last Exam: The model achieved a 50.0, edging out Opus 4.8 (49.8).
- GPQA-D: Fugu Ultra matched the high standard of 95.5.
Early beta testers have reported massive efficiency gains in specialized fields. One developer noted that during code reviews, Fugu Ultra identified over 20 bugs, whereas GPT-5.5 flagged only approximately three.
Mitigating the Risks of AI Vendor Lock-in
Beyond pure performance, Sakana AI is positioning Fugu as a critical tool for digital sovereignty. In an era where export controls and regulatory shifts can suddenly restrict access to specific models (such as Anthropic's recent restrictions), relying on a single provider represents a material vulnerability for finance, governance, and critical infrastructure.
Because Fugu utilizes a swappable agent pool, organizations can reroute their workflows to different providers if one API goes dark. While not a total solution for "AI sovereignty"—as a widespread industry-wide restriction could still limit the pool—it provides a vital layer of resilience for enterprises looking to diversify their AI dependencies.
Key Takeaways
- Dynamic Orchestration: Fugu functions as a single API that internally manages a team of specialized models to solve multi-step, complex problems.
- Benchmark Dominance: Fugu Ultra competes directly with Anthropic’s Fable 5 and Mythos, showing significant leads in coding (SWE Bench Pro) and reasoning benchmarks.
- Strategic Resilience: The swappable model pool allows users to mitigate the risks of vendor lock-in and regulatory disruptions by diversifying AI providers.