Researchers automated LLM reasoning strategy design and cut token usage by 69.5%

2026-05-28 · Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, medium

Summary

Researchers from Meta, Google, and several universities have introduced AutoTTS, a framework that automates the discovery of test-time scaling (TTS) strategies for large language models. TTS strategies improve LLM performance by allocating extra compute during inference, but they have historically been handcrafted, which leads to suboptimal trade-offs between accuracy and cost. AutoTTS reframes strategy design as an algorithmic search problem: an explorer LLM iteratively designs and refines TTS "controllers" within an offline replay environment. The discovered strategies reduce token consumption by up to 69.5% without sacrificing accuracy, as demonstrated on Qwen3 models (0.6B to 8B) and a distilled DeepSeek-R1 8B model across benchmarks including AIME24, AIME25, HMMT25, and GPQA-Diamond. The entire discovery process cost only $39.90 and took 160 minutes.

Key takeaway

For MLOps engineers deploying LLMs in production, AutoTTS offers a way to cut inference cost without giving up accuracy. The framework automates the discovery of test-time scaling strategies and, in the reported experiments, reduced token consumption by up to 69.5% without sacrificing accuracy. Consider integrating the AutoTTS framework or its Confidence Momentum Controller to tailor reasoning strategies to your proprietary models; the reported discovery run cost $39.90 and took 160 minutes.

Key insights

AutoTTS automates LLM test-time scaling strategy design, cutting token usage by up to 69.5% without sacrificing accuracy.

Principles

Treat strategy design as an algorithmic search problem.
Use an explorer LLM to iteratively refine controllers.
Evaluate strategies against pre-collected reasoning trajectories.

Method

AutoTTS uses an explorer LLM to design and refine TTS controllers within an offline replay environment, evaluating proposed strategies against pre-collected reasoning trajectories to optimize for accuracy and cost.
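The offline replay idea can be illustrated with a minimal Python sketch. It is not the AutoTTS implementation: the `Sample` and `Problem` types, the vote-margin controller, the toy data, and the fixed candidate list standing in for the explorer LLM are all assumptions made for illustration. The point it shows is that once trajectories are collected, any stopping rule can be scored for accuracy and token cost without calling the model again.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    answer: str  # final answer extracted from one pre-collected trajectory
    tokens: int  # tokens that trajectory consumed


@dataclass(frozen=True)
class Problem:
    samples: Sequence[Sample]  # trajectories in the order they were sampled
    gold: str                  # reference answer


# Hypothetical controller interface: given the answers seen so far,
# return True to stop sampling.
Controller = Callable[[List[str]], bool]


def majority(answers: List[str]) -> str:
    return Counter(answers).most_common(1)[0][0]


def replay(controller: Controller, problem: Problem) -> Tuple[bool, int]:
    """Replay stored trajectories until the controller stops; no model calls."""
    seen: List[str] = []
    tokens = 0
    for sample in problem.samples:
        seen.append(sample.answer)
        tokens += sample.tokens
        if controller(seen):
            break
    return majority(seen) == problem.gold, tokens


def evaluate(controller: Controller, problems: Sequence[Problem]) -> Tuple[float, float]:
    """Return (accuracy, mean tokens per problem) for one controller."""
    results = [replay(controller, p) for p in problems]
    accuracy = sum(ok for ok, _ in results) / len(results)
    mean_tokens = sum(t for _, t in results) / len(results)
    return accuracy, mean_tokens


def never_stop(seen: List[str]) -> bool:
    """Baseline: consume every stored sample, then majority-vote."""
    return False


def margin_controller(margin: int) -> Controller:
    """Stop once the leading answer is `margin` votes ahead of the runner-up."""
    def controller(seen: List[str]) -> bool:
        top = Counter(seen).most_common(2)
        runner_up = top[1][1] if len(top) > 1 else 0
        return top[0][1] - runner_up >= margin
    return controller


def search(candidates, problems, baseline):
    """Pick the cheapest candidate that matches the baseline's accuracy.

    In AutoTTS an explorer LLM proposes the candidates; here a fixed list
    stands in for it.
    """
    base_acc, _ = evaluate(baseline, problems)
    scored = [(name, *evaluate(c, problems)) for name, c in candidates]
    viable = [s for s in scored if s[1] >= base_acc]
    return min(viable, key=lambda s: s[2]) if viable else None


# Toy replay set (made-up data, 100 tokens per trajectory).
PROBLEMS = [
    Problem([Sample(a, 100) for a in ["42", "42", "42", "17"]], gold="42"),
    Problem([Sample(a, 100) for a in ["7", "9", "9", "9"]], gold="9"),
]
CANDIDATES = [(f"margin={m}", margin_controller(m)) for m in (1, 2, 3)]

if __name__ == "__main__":
    print("baseline:", evaluate(never_stop, PROBLEMS))
    print("selected:", search(CANDIDATES, PROBLEMS, never_stop))
```

On this toy data, a margin of 1 stops too early and loses accuracy on the second problem, while a margin of 2 keeps baseline accuracy at a lower mean token count. That is the kind of trade-off the search is meant to surface.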

In practice

Implement AutoTTS for dynamic compute optimization.
Use the Confidence Momentum Controller as a drop-in replacement.
Tailor reasoning strategies to proprietary models.
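As a rough illustration of what a drop-in controller might look like in a sampling loop, the Python sketch below stops early based on a smoothed trend in answer agreement. The summary does not describe the internals of the Confidence Momentum Controller, so this is not that controller: the class name `ConfidenceTrendController`, the agreement-share definition of confidence, and the `threshold`, `beta`, and `min_samples` parameters are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List


class ConfidenceTrendController:
    """Hypothetical early-stopping rule (NOT the published controller).

    Confidence is the share of samples agreeing with the leading answer.
    Momentum is an exponential moving average of the change in confidence.
    Sampling stops when confidence is high and not trending downward.
    """

    def __init__(self, threshold: float = 0.7, beta: float = 0.5, min_samples: int = 2):
        self.threshold = threshold
        self.beta = beta
        self.min_samples = min_samples
        self.momentum = 0.0
        self.prev_conf = None

    def should_stop(self, answers: List[str]) -> bool:
        conf = Counter(answers).most_common(1)[0][1] / len(answers)
        delta = 0.0 if self.prev_conf is None else conf - self.prev_conf
        self.momentum = self.beta * self.momentum + (1 - self.beta) * delta
        self.prev_conf = conf
        return (
            len(answers) >= self.min_samples
            and conf >= self.threshold
            and self.momentum >= 0.0
        )


def samples_used(stream: Iterable[str], controller: ConfidenceTrendController) -> int:
    """Consume answers one at a time until the controller says stop."""
    seen: List[str] = []
    for answer in stream:
        seen.append(answer)
        if controller.should_stop(seen):
            break
    return len(seen)


if __name__ == "__main__":
    # Unanimous answers: stops as soon as min_samples is reached.
    print(samples_used(["5", "5", "5", "5"], ConfidenceTrendController()))
    # Early disagreement: keeps sampling until agreement recovers.
    print(samples_used(["9", "7", "9", "9", "9"], ConfidenceTrendController()))
```

A controller shaped like this only needs the answers seen so far, so it can be evaluated offline against stored trajectories before it is placed in front of a production model.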

Topics

Large Language Models
Test-Time Scaling
Automated Strategy Design
Inference Optimization
Token Consumption
MLOps

Code references

zhengkid/AutoTTS

Best for: AI Engineer, NLP Engineer, CTO, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.