ATLAS: Agentic Test-time Learning-to-Allocate Scaling
Summary
ATLAS is an agentic test-time scaling framework in which a Large Language Model (LLM) orchestrator manages the entire control loop for improving LLM reasoning. Unlike fixed-workflow baselines, ATLAS gives the orchestrator a single "explore" action for dispatching independent solvers; the orchestrator then gathers the evidence, decides when to stop, and synthesizes the final answer. Evaluated with a Claude Sonnet 4.6 backbone, ATLAS achieved 56.00% on HLE-Verified, 82.29% on LiveCodeBench, 85.75% on GPQA-Diamond, and 23.71% on BabyVision while using significantly fewer API calls than fixed-workflow baselines. An extension, ATLAS-MM, adds solver choice and raises HLE-Verified to 60.00% and LiveCodeBench to 85.63%. Ablations indicate that the orchestrator's direct synthesis is crucial for these gains.
Key takeaway
For machine learning engineers balancing LLM inference cost against accuracy, agentic test-time scaling such as ATLAS can deliver higher accuracy on complex reasoning tasks, including code generation and scientific QA, with substantially fewer API calls than traditional fixed-workflow methods. Consider evaluating an LLM orchestrator that dynamically manages solver allocation and synthesis, especially in multi-model setups.
Key insights
ATLAS enables LLM orchestrators to dynamically manage test-time scaling for improved reasoning and efficiency.
Principles
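- An LLM orchestrator that controls the scaling loop can outperform fixed test-time scaling workflows.
- Direct synthesis of the final answer by the orchestrator is critical to the gains, according to the ablations.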
Method
An LLM orchestrator dispatches independent solvers via an "explore" action, then manages evidence, stopping, and final answer synthesis.
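The control loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Decision`, `RunResult`, `run_orchestrator`, and `toy_orchestrator`, and the `max_explores` budget, are all hypothetical. In a real system `orchestrate` and `solve` would be LLM calls, and the orchestrator would synthesize an answer from the evidence; the toy policy here only uses agreement between two solver outputs as a stand-in for that judgment.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Decision:
    action: str                   # "explore" or "answer" (hypothetical action names)
    answer: Optional[str] = None  # set only when action == "answer"


@dataclass
class RunResult:
    answer: str
    solver_calls: int
    evidence: List[str] = field(default_factory=list)


def run_orchestrator(
    question: str,
    orchestrate: Callable[[str, List[str], bool], Decision],
    solve: Callable[[str], str],
    max_explores: int = 8,  # assumed safety budget, not taken from the source
) -> RunResult:
    """Let the orchestrator decide, step by step, whether to explore or answer."""
    evidence: List[str] = []
    for _ in range(max_explores):
        decision = orchestrate(question, evidence, False)
        if decision.action == "answer":
            return RunResult(decision.answer or "", len(evidence), evidence)
        # "explore": dispatch an independent solver that sees only the question.
        evidence.append(solve(question))
    # Budget exhausted: force the orchestrator to produce a final answer.
    final = orchestrate(question, evidence, True)
    return RunResult(final.answer or "", len(evidence), evidence)


def toy_orchestrator(question: str, evidence: List[str], must_answer: bool) -> Decision:
    """Stand-in policy: answer once two solver outputs agree, or when forced."""
    if evidence:
        top, count = Counter(evidence).most_common(1)[0]
        if count >= 2 or must_answer:
            return Decision("answer", top)
    elif must_answer:
        return Decision("answer", "")
    return Decision("explore")


# Usage with a canned solver that returns a fixed sequence of outputs.
canned = iter(["4", "5", "4", "4"])
result = run_orchestrator("What is 2 + 2?", toy_orchestrator, lambda q: next(canned))
print(result.answer, result.solver_calls)
```

Because stopping is a decision rather than a fixed sample count, the number of solver calls varies per question, which is where the reduction in API calls relative to a fixed workflow would come from.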
In practice
- Apply to scientific QA, code generation, and multimodal reasoning tasks.
- Consider multi-model extensions for further gains.
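For the multi-model case, one way to picture "solver choice" is an explore action that names which solver to call. The sketch below assumes a registry of solver callables keyed by name and logs each call so cost can be compared per model. The function names, the registry layout, and the solver names `fast` and `strong` are hypothetical; the source states only that ATLAS-MM adds solver choice.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Solver = Callable[[str], str]


def explore(
    solvers: Dict[str, Solver],
    solver_name: str,
    question: str,
    log: List[Tuple[str, str]],
) -> str:
    """Run the solver chosen by the orchestrator and record the call."""
    if solver_name not in solvers:
        raise KeyError(f"unknown solver: {solver_name}")
    output = solvers[solver_name](question)
    log.append((solver_name, output))
    return output


def calls_per_solver(log: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count calls by solver name, e.g. to weight them by per-model price."""
    return dict(Counter(name for name, _ in log))


# Usage with placeholder solvers standing in for two different models.
solvers: Dict[str, Solver] = {
    "fast": lambda q: "42",
    "strong": lambda q: "43",
}
call_log: List[Tuple[str, str]] = []
explore(solvers, "fast", "toy question", call_log)
explore(solvers, "fast", "toy question", call_log)
explore(solvers, "strong", "toy question", call_log)
print(calls_per_solver(call_log))
```

Keeping the call log separate from the solvers makes it straightforward to report both accuracy and API-call counts when comparing against a fixed-workflow baseline.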
Topics
- Large Language Models
- Test-time Scaling
- Agentic AI
- LLM Orchestration
- Code Generation
- Multimodal Reasoning
Best for: Research Scientist, NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.