Sakana Fugu Ultra BEATS Fable 5 & GPT-5.5? (Fully Tested)

2026-06-23 · Source: WorldofAI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, long

Summary

Sakana AI, a new Japanese AI lab, introduced Fugu Ultra, a multi-agent orchestration system exposed through a single model API. Initial claims suggested that Fugu Ultra matched or surpassed frontier models such as Fable 5, Mythos, and GPT-5.5 on benchmarks including LiveCodeBench and Terminal-Bench. Extensive testing, however, shows that Fugu Ultra is not a single model: it coordinates multiple existing AI models, decomposing tasks, routing subtasks, and critiquing, verifying, and synthesizing the results. This approach produces impressive scores on specific problem-solving tasks, but it adds latency and cost and performs inconsistently on long-horizon or generic tasks, as its SWE-Bench Pro results show. Real-world evaluations (a live trading desk, a Crossy Road game, a black hole simulation, a flight simulator, and a blindfold chess test) showed that Fugu Ultra can produce polished results, though often at higher cost or with specific issues. Overall, this places it closer to models like GLM 5.2 or Chinchilla 5.2 than to Fable 5.

Key takeaway

AI scientists evaluating model capabilities and machine learning engineers designing complex systems should understand that Fugu Ultra's impressive benchmark scores come from multi-agent orchestration, not from a single frontier model. It can deliver polished results on specific tasks such as web development or simulations, but its higher cost and latency make it less practical than native frontier models for generic, long-horizon applications. When considering orchestration for your projects, evaluate total system performance, including cost and speed.

Key insights

Fugu Ultra is an AI orchestration system that leverages multiple models to achieve impressive, but often costly and slow, benchmark results.

Principles

Orchestration systems can outperform standalone models on specific tasks.
Task decomposition, routing, and verification can improve system-level performance.
Benchmark scores may reflect system performance, not underlying model intelligence.
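One way a system can score higher than the model inside it is repeated attempts plus verification. The sketch below assumes independent attempts and a perfect verifier, neither of which real systems have, so treat the result as an optimistic upper bound rather than a model of Fugu Ultra; `system_pass_rate` is a hypothetical helper and the probabilities are illustrative only.

```python
def system_pass_rate(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries is correct.

    Assumes a perfect verifier: the system counts a task as solved whenever
    any single attempt is correct. Real verifiers are imperfect, so this is
    an optimistic upper bound.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return 1.0 - (1.0 - p_single) ** attempts


# Same underlying model, different system scores (illustrative numbers only).
single = system_pass_rate(0.4, 1)        # 0.4
orchestrated = system_pass_rate(0.4, 3)  # about 0.784
```

The underlying model is unchanged in both cases; only the system around it differs. Three attempts also mean roughly three times the model calls, which is the cost and latency trade-off described above.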

Method

A coordinator decomposes tasks into subtasks, routes them to suitable models, critiques and verifies outputs, then synthesizes results.
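The coordinator loop described in the Method line can be sketched as a small pipeline. This is a toy under stated assumptions, not Sakana AI's implementation (the summary does not describe Fugu Ultra's internals): the "models" are plain Python functions, decomposition splits the task on lines, routing is a dictionary lookup, and critique/verification is a caller-supplied predicate. `Subtask`, `decompose`, and `orchestrate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in: a "model" is any function from prompt to answer.
Model = Callable[[str], str]


@dataclass
class Subtask:
    kind: str    # routing key, e.g. "code" or "text"
    prompt: str


def decompose(task: str) -> List[Subtask]:
    """Toy decomposition: one subtask per non-empty line, written as 'kind: prompt'."""
    subtasks = []
    for line in task.splitlines():
        line = line.strip()
        if not line:
            continue
        kind, sep, prompt = line.partition(":")
        if not sep:  # no prefix: treat the whole line as a plain text subtask
            kind, prompt = "text", line
        subtasks.append(Subtask(kind.strip(), prompt.strip()))
    return subtasks


def orchestrate(
    task: str,
    workers: Dict[str, Model],
    fallback: Model,
    verify: Callable[[Subtask, str], bool],
    max_retries: int = 1,
) -> str:
    results = []
    for sub in decompose(task):
        model = workers.get(sub.kind, fallback)  # route by subtask kind
        answer = model(sub.prompt)
        retries = 0
        # Critique/verify; on failure, retry with the fallback model.
        # After max_retries the last answer is kept even if it still fails.
        while not verify(sub, answer) and retries < max_retries:
            answer = fallback(sub.prompt)
            retries += 1
        results.append(f"[{sub.kind}] {answer}")
    return "\n".join(results)  # synthesize


# Usage with stub models: the "code" worker returns a placeholder the verifier rejects.
def fallback_model(prompt: str) -> str:
    return f"fixed: {prompt}"


def no_placeholders(sub: Subtask, answer: str) -> bool:
    return "TODO" not in answer


workers = {"code": lambda p: "TODO", "text": lambda p: p.upper()}
result = orchestrate("text: summarize results\ncode: write parser",
                     workers, fallback_model, no_placeholders)
print(result)
# [text] SUMMARIZE RESULTS
# [code] fixed: write parser
```

Routing by dictionary lookup keeps this coordinator cheap; in a real orchestration system the decomposition, routing, and critique steps could themselves be model calls.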

In practice

Consider orchestration for complex, multi-step problem-solving.
Evaluate total cost and latency for orchestrated AI systems.
Use GLM 5.2 for cost-efficient front-end web development.
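To compare an orchestrated system with a single model on total cost and latency, one simple accounting treats the pipeline as sequential stages whose calls may run in parallel: every call is billed, but wall-clock time only adds up across stages. The token counts, per-million-token prices, and latencies here are made-up placeholders, not measurements from the video, and `Call`, `total_cost`, and `total_latency` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Call:
    input_tokens: int
    output_tokens: int
    usd_per_m_in: float   # placeholder price per million input tokens
    usd_per_m_out: float  # placeholder price per million output tokens
    latency_s: float

    def cost(self) -> float:
        return (self.input_tokens * self.usd_per_m_in
                + self.output_tokens * self.usd_per_m_out) / 1_000_000


Stage = List[Call]  # calls within one stage run in parallel


def total_cost(stages: List[Stage]) -> float:
    # Every call is billed, whether it ran in parallel or not.
    return sum(call.cost() for stage in stages for call in stage)


def total_latency(stages: List[Stage]) -> float:
    # Stages run one after another; a stage takes as long as its slowest call.
    return sum(max(call.latency_s for call in stage) for stage in stages if stage)


single_model = [[Call(10_000, 2_000, 1.0, 5.0, 20.0)]]
orchestrated = [
    [Call(2_000, 500, 1.0, 5.0, 5.0)],                                           # decompose
    [Call(10_000, 2_000, 1.0, 5.0, 20.0), Call(10_000, 2_000, 1.0, 5.0, 25.0)],  # parallel workers
    [Call(4_000, 300, 1.0, 5.0, 6.0)],                                           # critique/verify
    [Call(5_000, 1_000, 1.0, 5.0, 8.0)],                                         # synthesize
]

for name, stages in [("single", single_model), ("orchestrated", orchestrated)]:
    print(f"{name}: ${total_cost(stages):.4f}, {total_latency(stages):.0f} s")
# single: $0.0200, 20 s
# orchestrated: $0.0600, 44 s
```

In this illustrative setup the orchestrated run costs three times as much and takes more than twice as long; whether that is worth it depends on how much the extra decomposition and verification improve results on your task.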

Topics

Sakana AI
Fugu Ultra
AI Orchestration
Multi-Agent Systems
Benchmark Evaluation
Frontier Models

Best for: AI Architect, AI Engineer, AI Product Manager, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by WorldofAI.