Together AI at ICML 2026: frontier research across the full stack
Summary
Together AI announced eight research papers accepted at ICML 2026 in Seoul, spanning multiple layers of the AI stack. On the agent side, DSGym is a framework with 1,000+ tasks across 10+ domains for evaluating and training data science agents, and ThunderAgent achieves 1.5x to 3.6x higher agent throughput. TTT-Discover produces leading discoveries in fields such as mathematics and GPU kernels using open 120B models, at approximately $500 per problem. For model shaping, RARO enables RL-grade reasoning without verifiers, achieving a 25% win rate, while V1 improves answer correctness by up to 10% through unified generation and self-verification. Among the algorithmic optimizations, Aurora provides a 1.5x day-0 speedup and an additional 1.25x improvement for speculative decoding. The systems optimizations include Untied Ulysses, which enables 5M-token context training on a single 8xH100 node with 87.5% less attention memory, and OEA, which reduces Mixture-of-Experts decode latency by up to 39% without retraining.
Key takeaway
MLOps engineers optimizing model performance and deployment should consider adopting advances from across the stack rather than from a single layer. ThunderAgent offers up to 3.6x higher agent throughput, and OEA offers up to 39% lower MoE decode latency without retraining. DSGym provides a way to standardize data science agent evaluation and training. Together, these techniques push frontier capabilities while improving efficiency and resource utilization.
Key insights
Frontier AI progress requires full-stack research, from agents to GPU kernels, with gains at each layer feeding the next.
Principles
- AI development benefits from full-stack integration.
- Honest evaluation drives agent improvement.
- Online learning adapts models to live conditions.
Method
Advancing AI involves unifying evaluation APIs, applying reinforcement learning at test time, and optimizing inference engines for agent workflows and sparse models.
In practice
- Use DSGym for standardized data science agent evaluation.
- Adopt ThunderAgent for 1.5x to 3.6x higher agent throughput.
- Deploy OEA for up to 39% lower MoE decode latency.
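
The headline figures above mix "x" multipliers with percentage reductions, which makes them hard to compare at a glance. The following is a minimal sketch of the arithmetic for putting them on one scale. The helper name `reduction_to_factor` is hypothetical and not part of any of the projects mentioned, and the inputs are the reported upper bounds, not measurements on any particular workload.

```python
def reduction_to_factor(reduction_pct: float) -> float:
    """Convert a percentage reduction (in latency or memory) to an x-factor.

    A reduction of r% leaves (1 - r/100) of the original cost, so the
    equivalent improvement factor is 1 / (1 - r/100).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Reported upper bounds from the summary (assumed best case, not typical case).
oea_factor = reduction_to_factor(39.0)      # up to 39% lower MoE decode latency
ulysses_factor = reduction_to_factor(87.5)  # 87.5% less attention memory

print(f"OEA: up to {oea_factor:.2f}x faster decode steps")
print(f"Untied Ulysses: {ulysses_factor:.1f}x smaller attention memory footprint")
```

Read this way, a 39% latency reduction corresponds to decode steps running roughly 1.64x faster at best, and an 87.5% memory reduction corresponds to one eighth of the original attention memory. End-to-end gains depend on how much of a workload is actually spent in the optimized path.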
Topics
- AI Agents
- Model Evaluation
- Inference Optimization
- Speculative Decoding
- Mixture-of-Experts
- Context Parallelism
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Together AI | The AI Native Cloud - Together.ai.