ATLAS: Agentic Test-time Learning-to-Allocate Scaling

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

ATLAS is an agentic test-time scaling framework in which a Large Language Model (LLM) orchestrator manages the entire control loop for improving LLM reasoning. Unlike fixed-workflow baselines, ATLAS gives the orchestrator a single "explore" action for dispatching independent solvers; the orchestrator then gathers the resulting evidence, decides when to stop, and synthesizes the final answer. Evaluated with a Claude Sonnet 4.6 backbone, ATLAS achieved 56.00% on HLE-Verified, 82.29% on LiveCodeBench, 85.75% on GPQA-Diamond, and 23.71% on BabyVision, while using significantly fewer API calls than the fixed-workflow baselines. An extension, ATLAS-MM, which also lets the orchestrator choose the solver, further raised HLE-Verified to 60.00% and LiveCodeBench to 85.63%. Ablations indicate that the orchestrator's direct synthesis of the final answer is crucial to these gains.

Key takeaway

For machine learning engineers optimizing LLM inference cost and performance, agentic test-time scaling such as ATLAS is worth evaluating. In the reported experiments it reached higher accuracy on complex reasoning tasks, such as code generation and scientific QA, while making substantially fewer API calls than traditional fixed-workflow methods. Consider integrating an LLM orchestrator that dynamically manages solver allocation and answer synthesis, especially in multi-model setups.

Key insights

ATLAS enables LLM orchestrators to dynamically manage test-time scaling for improved reasoning and efficiency.

Principles

Method

An LLM orchestrator dispatches independent solvers via an "explore" action, then collects the evidence they return, decides when to stop, and synthesizes the final answer itself.
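The control loop described above can be sketched as follows. This is a minimal, hypothetical illustration, not the ATLAS implementation: `RuleBasedOrchestrator`, `run_control_loop`, the agreement threshold, and the explore budget are all assumptions made for the example. A simple majority-vote rule stands in for the LLM orchestrator that, in ATLAS, decides when to stop and writes the final answer, and each explore step here dispatches a single stub solver call.

```python
import itertools
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical solver interface: question in, candidate answer out.
Solver = Callable[[str], str]


class RuleBasedOrchestrator:
    """Hypothetical stand-in for the LLM orchestrator.

    Simple rules decide when to stop and how to synthesize, so the loop
    runs without any model API. In ATLAS an LLM makes these decisions.
    """

    def __init__(self, agree_needed: int = 3, max_explores: int = 8) -> None:
        self.agree_needed = agree_needed  # stop once this many answers agree
        self.max_explores = max_explores  # hard cap on solver dispatches

    def should_stop(self, evidence: List[str]) -> bool:
        if len(evidence) >= self.max_explores:
            return True
        if not evidence:
            return False
        _, top_count = Counter(evidence).most_common(1)[0]
        return top_count >= self.agree_needed

    def synthesize(self, evidence: List[str]) -> Optional[str]:
        # Majority vote; an LLM orchestrator would instead read the solvers'
        # full outputs and write the final answer directly.
        if not evidence:
            return None
        return Counter(evidence).most_common(1)[0][0]


def run_control_loop(question: str, solver: Solver,
                     orchestrator: RuleBasedOrchestrator) -> Tuple[Optional[str], int]:
    """Return the synthesized answer and the number of solver calls made."""
    evidence: List[str] = []
    while not orchestrator.should_stop(evidence):
        # "explore": dispatch one independent solver call and record its answer.
        evidence.append(solver(question))
    return orchestrator.synthesize(evidence), len(evidence)


if __name__ == "__main__":
    # Deterministic stub solver that cycles through canned answers.
    canned = itertools.cycle(["42", "41", "42", "42", "40"])
    answer, calls = run_control_loop("toy question", lambda q: next(canned),
                                     RuleBasedOrchestrator())
    print(answer, calls)  # 42 4
```

The stopping rule is what saves calls in this toy: the loop ends after four dispatches instead of spending the full budget of eight. Replacing the rule-based stopping and voting with LLM calls is where an orchestrator like ATLAS's would differ, and the ablations summarized above suggest that the orchestrator's direct synthesis, rather than a fixed aggregation rule such as this vote, matters for the reported gains.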

Best for: Research Scientist, NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.