TTT-Discover optimizes GPU kernels 2x faster than human experts — by training during inference

2026-02-05 · Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation · Depth: Advanced, medium

Summary

Researchers from Stanford, Nvidia, and Together AI have introduced TTT-Discover, a technique that lets a model keep training during inference and that produced GPU kernels running up to 2x faster than those written by human experts. Unlike "frozen" models that rely on static parameters, TTT-Discover treats each test problem as an environment to be mastered: it updates the model's weights in real time using data the model generates itself, including failures and partial successes. This lets the model specialize on one specific challenge instead of relying on general problem-solving ability. The method combines an "entropic objective," which prioritizes high-reward outcomes, with a PUCT tree-search algorithm inspired by AlphaZero, which explores solution paths and generates the data the model trains on. A single discovery run can cost around $500, which makes the method economically viable mainly for high-value, low-frequency problems such as optimizing critical data pipelines or drug design, where even small improvements yield significant ROI. TTT-Discover works with open-weights models like gpt-oss-120b and can be run in private VPCs. Adopting it requires either existing reinforcement learning infrastructure or a managed option such as the Tinker API.

Key takeaway

For CTOs and AI scientists evaluating advanced optimization techniques, TTT-Discover is a compelling option for "million-dollar problems" that come with a verifiable scalar signal. Consider it for high-impact, low-frequency challenges such as critical infrastructure optimization or complex scientific discovery, where the roughly $500 per-problem cost is justified by substantial ROI. Before committing, confirm that your infrastructure supports reinforcement learning, or plan to use a tool such as the Tinker API.
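The cost argument comes down to simple arithmetic. The sketch below shows the break-even calculation; the $1,000,000 annual workload cost is a hypothetical figure chosen for illustration, and only the roughly $500 run cost comes from the article.

```python
def break_even_improvement(run_cost, annual_workload_cost):
    """Fractional improvement at which one discovery run pays for itself in a year."""
    return run_cost / annual_workload_cost

def net_annual_saving(run_cost, annual_workload_cost, improvement):
    """Saving after subtracting the run cost, for a given fractional improvement."""
    return annual_workload_cost * improvement - run_cost

# Hypothetical workload costing $1,000,000 per year; run cost from the article.
threshold = break_even_improvement(500, 1_000_000)
saving_at_1_percent = net_annual_saving(500, 1_000_000, 0.01)
```

Under these assumed numbers a run pays for itself at an improvement of 0.05%, which is why the method suits expensive, rarely re-solved problems and not cheap, high-volume ones.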

Key insights

TTT-Discover lets AI models train during inference on a single hard problem; on GPU kernel optimization, it produced kernels up to 2x faster than those written by human experts.

Principles

Treat test problems as environments to master.
Prioritize high-reward outcomes with an entropic objective.
Use continuous (scalar) reward signals rather than pass/fail checks, so that partial progress can guide optimization.
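The article does not give the formula for the entropic objective. The sketch below uses one standard form, the exponential-utility objective (1/β) · log mean(exp(β · r)), as an assumption; the temperature `beta` and both function names are illustrative. It shows how such an objective shifts training weight toward the highest-reward attempts instead of the average attempt.

```python
import math

def entropic_weights(rewards, beta):
    """Per-sample weights proportional to exp(beta * reward), normalized to sum to 1."""
    m = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp(beta * (r - m)) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]

def entropic_objective(rewards, beta):
    """(1 / beta) * log(mean(exp(beta * r))), computed stably.

    Approaches the mean reward as beta -> 0 and the max reward as beta grows.
    """
    m = max(rewards)
    mean_exp = sum(math.exp(beta * (r - m)) for r in rewards) / len(rewards)
    return m + math.log(mean_exp) / beta
```

With a small `beta` the objective behaves like ordinary expected reward; with a large `beta` nearly all of the weight lands on the best sample, which fits a setting where only the single best solution matters.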

Method

TTT-Discover updates model weights during inference using an entropic objective and PUCT tree search, learning from failures and successes to solve specific, high-value problems.
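For readers unfamiliar with PUCT, the sketch below shows the AlphaZero-style selection rule: score = Q + c · P · sqrt(N_parent) / (1 + N_child). The `Node` class, the `c_puct` value, and the mean-value backup are assumptions for illustration; the exact variant TTT-Discover uses may differ.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float              # P: prior probability assigned to this candidate
    visits: int = 0           # N: how often this node has been explored
    value_sum: float = 0.0    # sum of rewards backed up through this node
    children: dict = field(default_factory=dict)

    @property
    def q(self):
        """Mean backed-up reward; 0.0 for an unvisited node."""
        return self.value_sum / self.visits if self.visits else 0.0

def puct_score(parent, child, c_puct=1.5):
    """Exploitation term (Q) plus a prior-weighted exploration bonus."""
    bonus = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q + bonus

def select_child(parent, c_puct=1.5):
    """Return the (key, child) pair with the highest PUCT score."""
    return max(parent.children.items(),
               key=lambda kv: puct_score(parent, kv[1], c_puct))

def backup(path, reward):
    """Propagate a reward to every node on the path from the root to a leaf."""
    for node in path:
        node.visits += 1
        node.value_sum += reward
```

The exploration bonus shrinks as a child accumulates visits, so the search keeps revisiting promising partial solutions while still trying candidates it has rarely explored.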

In practice

Optimize GPU kernels for matrix multiplication.
Improve supply chain routing and logistics.
Accelerate drug and material discovery.

Topics

TTT-Discover
GPU Kernel Optimization
Reinforcement Learning
Test-Time Training
Algorithmic Discovery

Code references

test-time-training/discover

Best for: AI Scientist, Research Scientist, CTO, AI Researcher, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.