ATLAS: Agentic Test-time Learning-to-Allocate Scaling

2026-06-01 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

ATLAS is an agentic test-time scaling framework in which a Large Language Model (LLM) orchestrator runs the entire control loop used to improve reasoning. Unlike fixed-workflow baselines, ATLAS gives the orchestrator a single "explore" action for dispatching independent solvers; the orchestrator itself gathers the evidence, decides when to stop, and synthesizes the final answer. Evaluated with a Claude Sonnet 4.6 backbone, ATLAS achieved 56.00% on HLE-Verified, 82.29% on LiveCodeBench, 85.75% on GPQA-Diamond, and 23.71% on BabyVision, while using significantly fewer API calls than fixed-workflow baselines. An extension, ATLAS-MM, adds solver choice and raises HLE-Verified to 60.00% and LiveCodeBench to 85.63%. Ablations confirm that direct synthesis by the orchestrator is crucial for these gains.

Key takeaway

For machine learning engineers balancing LLM inference cost against accuracy, agentic test-time scaling in the style of ATLAS is reported to deliver higher accuracy on complex reasoning tasks, such as code generation and scientific QA, with substantially fewer API calls than traditional fixed-workflow methods. Consider evaluating an LLM orchestrator that manages solver allocation and synthesis dynamically, especially in multi-model setups.

Key insights

ATLAS enables LLM orchestrators to dynamically manage test-time scaling for improved reasoning and efficiency.

Principles

LLM orchestration can surpass fixed scaling workflows.
Direct synthesis by the orchestrator is critical.

Method

An LLM orchestrator dispatches independent solvers via an "explore" action, then manages evidence, stopping, and final answer synthesis.
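
The control loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Trace`, `orchestrate`, `should_explore`, `synthesize`, and `max_explores` are hypothetical, and the stopping and synthesis decisions that ATLAS assigns to the orchestrator LLM are replaced here by plain functions so the example runs without any API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trace:
    """Evidence gathered so far for one question (hypothetical structure)."""
    question: str
    evidence: List[str] = field(default_factory=list)


def orchestrate(
    question: str,
    solver: Callable[[str], str],
    should_explore: Callable[[Trace], bool],
    synthesize: Callable[[Trace], str],
    max_explores: int = 8,
) -> Tuple[str, int]:
    """Run the loop; return the final answer and the number of solver calls."""
    trace = Trace(question)
    # Each iteration is one "explore": a single independent solver call.
    while len(trace.evidence) < max_explores and should_explore(trace):
        trace.evidence.append(solver(question))
    # The orchestrator writes the final answer from the collected evidence.
    return synthesize(trace), len(trace.evidence)


# Toy stand-ins (assumptions): a real orchestrator would be an LLM making
# these decisions, and a real solver would be an independent LLM call.
def make_scripted_solver(answers: List[str]) -> Callable[[str], str]:
    remaining = iter(answers)
    return lambda question: next(remaining)


def explore_until_agreement(trace: Trace) -> bool:
    """Keep exploring until some answer has been seen at least twice."""
    counts = Counter(trace.evidence)
    return not counts or counts.most_common(1)[0][1] < 2


def pick_most_common(trace: Trace) -> str:
    return Counter(trace.evidence).most_common(1)[0][0]


answer, calls = orchestrate(
    "What is 6 * 7?",
    make_scripted_solver(["41", "42", "42", "42"]),
    explore_until_agreement,
    pick_most_common,
)
print(answer, calls)  # prints "42 3": the fourth scripted answer is never requested
```

The point of the sketch is the shape of the loop: the number of solver calls is decided per question rather than fixed in advance, which is where any saving in API calls would come from. The majority-style `pick_most_common` is only a placeholder; the source attributes the gains to the orchestrator synthesizing the answer directly.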

In practice

Apply to scientific QA, code generation, and multimodal reasoning tasks.
Consider multi-model extensions for further gains.
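
As a rough illustration of what "solver choice" could look like in a multi-model setup, the sketch below lets the orchestrator pick which solver to call on each explore step. It is an assumption-laden toy, not ATLAS-MM itself: `orchestrate_multi`, `choose_solver`, the solver names `fast` and `careful`, and the convention that returning `None` means "stop" are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Solver = Callable[[str], str]
Evidence = List[Tuple[str, str]]  # (solver name, candidate answer)


def orchestrate_multi(
    question: str,
    solvers: Dict[str, Solver],
    choose_solver: Callable[[str, Evidence], Optional[str]],
    synthesize: Callable[[str, Evidence], str],
    max_explores: int = 8,
) -> Tuple[str, Evidence]:
    """Explore with a per-step choice of solver; return the answer and the evidence."""
    evidence: Evidence = []
    while len(evidence) < max_explores:
        name = choose_solver(question, evidence)  # None signals "stop exploring"
        if name is None:
            break
        evidence.append((name, solvers[name](question)))
    return synthesize(question, evidence), evidence


# Toy stand-ins (assumptions): real solvers would be different backbone models.
toy_solvers: Dict[str, Solver] = {
    "fast": lambda q: "maybe 42",
    "careful": lambda q: "42",
}


def escalate_once(question: str, evidence: Evidence) -> Optional[str]:
    """Try the cheap solver first, escalate to the careful one, then stop."""
    plan = ["fast", "careful"]
    return plan[len(evidence)] if len(evidence) < len(plan) else None


def trust_last(question: str, evidence: Evidence) -> str:
    return evidence[-1][1]


mm_answer, mm_evidence = orchestrate_multi(
    "What is 6 * 7?", toy_solvers, escalate_once, trust_last
)
print(mm_answer, [name for name, _ in mm_evidence])  # prints: 42 ['fast', 'careful']
```

Keeping the solver name alongside each answer lets the synthesis step weigh evidence by its origin, which is one plausible reason a multi-model extension might help.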

Topics

Large Language Models
Test-time Scaling
Agentic AI
LLM Orchestration
Code Generation
Multimodal Reasoning

Best for: Research Scientist, NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.