Summary
Thinking Machines Lab has introduced "Interaction Models," a new class of AI models that handle human-AI interaction natively rather than through external scaffolding. Announced on May 11, 2026, these models are designed to continuously process audio, video, and text, enabling real-time collaboration. `TML-Interaction-Small`, a 276B-parameter mixture-of-experts (MoE) model with 12B active parameters, demonstrates qualitatively new interaction capabilities and achieves state-of-the-art combined performance on intelligence and responsiveness. It outperforms existing models on interaction-quality benchmarks such as FD-bench v1.5 (77.8% vs. 46.8% for GPT-realtime-2.0 minimal) and shows strong intelligence on Audio MultiChallenge (43.4%). The approach uses a multi-stream, micro-turn design that processes input and output in 200 ms chunks, and pairs it with an asynchronous background model for deeper reasoning, tool use, and longer-horizon tasks.
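To make the multi-stream, micro-turn idea concrete, here is a minimal Python sketch of how timestamped input and output streams could be bucketed into aligned 200 ms windows. The stream names, the `(timestamp_ms, payload)` event format, and the `bucket_streams` helper are hypothetical illustrations, not Thinking Machines Lab's implementation or data format.

```python
CHUNK_MS = 200  # micro-turn length from the summary


def bucket_streams(streams, duration_ms):
    """Group timestamped events from several named streams into aligned micro-turns.

    streams maps a stream name to a list of (timestamp_ms, payload) pairs
    (a hypothetical format). Returns one dict per 200 ms window, mapping each
    stream name to the payloads that fall inside that window.
    """
    n_turns = -(-duration_ms // CHUNK_MS)  # ceiling division
    turns = [{name: [] for name in streams} for _ in range(n_turns)]
    for name, events in streams.items():
        for t_ms, payload in events:
            idx = t_ms // CHUNK_MS
            if 0 <= idx < n_turns:  # drop events outside the session
                turns[idx][name].append(payload)
    return turns


# Hypothetical one-second session: the user speaks, then the model starts replying.
streams = {
    "user_text": [(50, "can"), (180, "you"), (420, "help")],
    "model_text": [(610, "sure")],
}
for i, turn in enumerate(bucket_streams(streams, 1000)):
    print(i * CHUNK_MS, turn)
```

Each window holds whatever every stream produced during the same 200 ms, so the model sees user input and its own output side by side instead of waiting for a completed user turn.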
Key takeaway
Research scientists developing human-AI collaboration systems should explore building native interaction capabilities into their models. This approach, exemplified by Thinking Machines Lab's Interaction Models, offers superior real-time responsiveness and multimodal collaboration compared with traditional turn-based or harness-dependent systems. Consider adopting a micro-turn architecture and a split interaction/background model design to achieve both high intelligence and a seamless, continuous user experience, addressing the "collaboration bottleneck" in current AI interfaces.
Key insights
Native interaction models enable real-time, multimodal human-AI collaboration by integrating interactivity directly into the AI's core architecture.
Principles
- Interactivity must scale with intelligence.
- Continuous processing improves collaboration.
- Separate interaction and background models.
Method
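Interaction models use a multi-stream, micro-turn design that processes input and output in 200 ms chunks. They use encoder-free early fusion for multimodal data: rather than relying on separate pretrained encoders, the components that embed each modality are co-trained with the transformer.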
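A rough NumPy sketch of encoder-free early fusion, assuming it works as described above: each modality is mapped straight into the transformer's width by a light projection, and the resulting tokens share one sequence. The sizes, the linear projections, and the `fuse` helper are illustrative assumptions; in a real model these projection weights would be trained together with the transformer, whereas here they are only randomly initialised.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
D_MODEL = 64
VOCAB = 1000
AUDIO_FRAME = 640      # 40 ms of 16 kHz audio per frame (assumption)
PATCH = 16 * 16 * 3    # one flattened 16x16 RGB image patch (assumption)

# "Encoder-free": each modality gets only a linear map into the model width.
# In training these would be parameters of the same network as the transformer.
W_audio = rng.normal(0, 0.02, (AUDIO_FRAME, D_MODEL))
W_patch = rng.normal(0, 0.02, (PATCH, D_MODEL))
E_text = rng.normal(0, 0.02, (VOCAB, D_MODEL))


def fuse(audio_frames, image_patches, text_ids):
    """Project every modality into one token sequence (early fusion)."""
    audio_tok = audio_frames @ W_audio    # (n_audio, D_MODEL)
    image_tok = image_patches @ W_patch   # (n_patches, D_MODEL)
    text_tok = E_text[text_ids]           # (n_text, D_MODEL)
    return np.concatenate([audio_tok, image_tok, text_tok], axis=0)


# One 200 ms micro-turn: 5 audio frames (5 x 640 = 3200 samples), 4 patches, 3 tokens.
seq = fuse(rng.normal(size=(5, AUDIO_FRAME)),
           rng.normal(size=(4, PATCH)),
           np.array([1, 42, 7]))
print(seq.shape)  # (12, 64)
```

The transformer would then attend over this single sequence, so cross-modal interactions are learned from the first layer rather than after separate encoders have summarised each modality.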
In practice
- Implement time-aligned micro-turns for real-time responsiveness.
- Delegate complex tasks to an asynchronous background model.
- Use streaming sessions so that inference on small chunks stays efficient.
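The split between a responsive interaction loop and an asynchronous background model can be sketched with `asyncio`. This is a toy, assuming the background model is any slow coroutine (simulated here with `asyncio.sleep`); `interaction_loop`, `background_model`, and the `"delegated"`/`"listening"` placeholders are hypothetical and do not reflect the source's actual interface.

```python
import asyncio

CHUNK_S = 0.2  # one micro-turn, per the 200 ms figure in the summary


async def background_model(request, results):
    """Hypothetical slow worker standing in for deeper reasoning or tool use."""
    await asyncio.sleep(0.5)  # pretend this spans several micro-turns
    await results.put(f"answer to {request!r}")


async def interaction_loop(n_turns, request):
    """Emit one output per micro-turn while the background task runs."""
    results = asyncio.Queue()
    task = asyncio.create_task(background_model(request, results))
    transcript = []
    for turn in range(n_turns):
        try:
            # Non-blocking check: fold in the background result if it has arrived.
            transcript.append(results.get_nowait())
        except asyncio.QueueEmpty:
            transcript.append("delegated" if turn == 0 else "listening")
        await asyncio.sleep(CHUNK_S)
    await task  # make sure the background task has finished before returning
    return transcript


print(asyncio.run(interaction_loop(5, "summarise the doc")))
```

The interaction loop never awaits the background result directly; it polls the queue once per micro-turn, so it keeps responding on schedule and folds the answer in whenever it arrives.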
Topics
- Interaction Models
- Multimodal AI Collaboration
- On-Policy Distillation
- Low-Rank Adaptation
- Manifold Optimization
Code references
- sgl-project/sglang
- thinking-machines-lab/tinker-cookbook
- deepseek-ai/DeepSeek-V3.2-Exp
- huggingface/peft
- NVlabs/edm2
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Thinking Machines Lab - Connectionism.