Together AI at ICML 2026: frontier research across the full stack

· Source: Together AI | The AI Native Cloud - Together.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, long

Summary

Together AI announced eight research papers accepted at ICML 2026 in Seoul, spanning the AI stack from agents to systems optimizations. Key contributions include DSGym, a framework with 1,000+ tasks across 10+ domains for evaluating and training data science agents, and ThunderAgent, which achieves 1.5x to 3.6x higher agent throughput. TTT-Discover produces leading results in fields such as mathematics and GPU kernels using open 120B models, at approximately $500 per problem. For model shaping, RARO enables RL-grade reasoning without verifiers, achieving a 25% win rate, while V1 improves answer correctness by up to 10% by unifying generation and self-verification. Among the algorithmic optimizations, Aurora provides a 1.5x day-0 speedup and an additional 1.25x improvement for speculative decoding. Systems optimizations include Untied Ulysses, which enables 5M-token context training on a single 8xH100 node with 87.5% less attention memory, and OEA, which reduces Mixture-of-Experts decode latency by up to 39% without retraining.
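The summary mixes percentage reductions with "x" speedup factors, which makes the figures hard to compare directly. The sketch below is plain arithmetic on the reported numbers, not anything from the papers themselves. The helper names are hypothetical, and two steps are assumptions: that Aurora's "additional 1.25x" multiplies the 1.5x day-0 speedup, and that a latency reduction can be read as an equivalent speed factor.

```python
def remaining_fraction(percent_reduction: float) -> float:
    """Fraction of the baseline that is left after a percentage reduction."""
    return 1.0 - percent_reduction / 100.0


def reduction_to_factor(percent_reduction: float) -> float:
    """Express a percentage reduction as an equivalent 'x' factor."""
    return 1.0 / remaining_fraction(percent_reduction)


def compound(*factors: float) -> float:
    """Multiply speedup factors, assuming they stack multiplicatively."""
    result = 1.0
    for factor in factors:
        result *= factor
    return result


# Untied Ulysses: 87.5% less attention memory means 1/8 of the baseline remains.
attention_memory_factor = reduction_to_factor(87.5)

# OEA: up to 39% lower decode latency, read as a best-case speed factor.
moe_decode_factor = reduction_to_factor(39.0)

# Aurora: 1.5x day-0 speedup and an additional 1.25x (assumed multiplicative).
aurora_combined = compound(1.5, 1.25)

print(f"attention memory: {attention_memory_factor:.2f}x smaller")
print(f"MoE decode: up to {moe_decode_factor:.2f}x faster")
print(f"Aurora combined (if multiplicative): {aurora_combined:.3f}x")
```

Under these assumptions, 87.5% less memory corresponds to an 8x reduction, 39% lower latency to roughly 1.64x, and the two Aurora figures to 1.875x combined. If the Aurora gains are not multiplicative, the combined figure does not apply.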

Key takeaway

For MLOps engineers optimizing model performance and deployment, the main lesson is that gains are available at several layers of the stack, not only in the model. You can adopt ThunderAgent for up to 3.6x higher agent throughput, or OEA for up to 39% lower MoE decode latency without retraining. Frameworks such as DSGym can standardize how you evaluate and train data science agents. Together, these advances let you push frontier capabilities while improving efficiency and resource utilization.

Key insights

Frontier AI progress requires full-stack research, from agents to GPU kernels, with gains at each layer feeding the next.

Method

Advancing AI involves unifying evaluation APIs, applying reinforcement learning at test time, and optimizing inference engines for agent workflows and sparse models.

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Together AI | The AI Native Cloud - Together.ai.