Thinking Machines shows off preview of near-realtime AI voice and video conversation with new 'interaction models'
Summary
Thinking Machines, an AI startup founded by former OpenAI CTO Mira Murati and researcher John Schulman, has announced a research preview of "interaction models," a new class of natively multimodal systems. These models build interactivity into the architecture itself rather than relying on traditional turn-based exchanges. The core innovation is a "full-duplex" architecture that processes 200ms chunks of input and output simultaneously across text, imagery, audio, and video, which enables real-time responses, backchanneling, and proactive interjections. The preview introduces TML-Interaction-Small, a 276-billion-parameter Mixture-of-Experts (MoE) model built as a dual system: an Interaction Model handles immediate exchanges while a Background Model handles asynchronous reasoning. On benchmarks such as FD-bench, TML-Interaction-Small is reported to achieve a turn-taking latency of 0.40 seconds and an interaction quality score of 77.8, significantly outperforming competitors such as Gemini-3.1-flash-live and GPT-realtime-2.0.
Key takeaway
For CTOs and VPs of Engineering evaluating next-generation AI integration, Thinking Machines' interaction models represent a significant step beyond current turn-based systems. Their ability to handle simultaneous multimodal input and output, demonstrated by a reported 0.40-second turn-taking latency, could transform enterprise applications that require natural, real-time human-AI collaboration. Monitor the upcoming limited research preview for potential pilot programs, especially for use cases in customer service, industrial monitoring, or any scenario where sub-second responsiveness and proactive AI engagement are critical.
Key insights
Thinking Machines' interaction models enable real-time, full-duplex human-AI communication by processing multimodal inputs and outputs simultaneously.
Principles
- Interactivity as a first-class citizen
- Multi-stream, micro-turn processing
- Dual model system for real-time and background tasks
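The dual-model principle can be illustrated with a small asyncio sketch. This is not Thinking Machines' implementation: `background_model`, `interaction_loop`, the "?" prefix that marks a slow request, and the timings are all hypothetical, chosen only to show a fast loop that keeps responding while slower reasoning runs asynchronously.

```python
import asyncio
from typing import List


async def background_model(query: str) -> str:
    """Hypothetical stand-in for slow, asynchronous reasoning."""
    await asyncio.sleep(0.05)  # simulated thinking time (assumption)
    return f"result for {query!r}"


async def interaction_loop(chunks: List[str], tick: float = 0.01) -> List[str]:
    """Respond on every micro-turn; surface background results once they finish.

    `tick` stands in for the 200ms micro-turn and is shortened for the demo.
    """
    transcript: List[str] = []
    pending: List[asyncio.Task] = []
    for chunk in chunks:
        if chunk.startswith("?"):
            # Hand slow work to the background model without blocking the loop.
            pending.append(asyncio.create_task(background_model(chunk[1:])))
            transcript.append("ack: working on it")
        else:
            transcript.append(f"ack: {chunk}")
        # Weave in any background results that have completed so far.
        for task in [t for t in pending if t.done()]:
            transcript.append(task.result())
            pending.remove(task)
        await asyncio.sleep(tick)
    # After the input ends, wait for whatever is still running.
    for task in pending:
        transcript.append(await task)
    return transcript


def run_demo() -> List[str]:
    return asyncio.run(interaction_loop(["hello", "?weather", "still here"]))
```

In this toy run the loop acknowledges "still here" before the background result arrives, which is the behavior the dual design is meant to allow: the conversation continues while the slower reasoning completes.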
Method
The system uses a multi-stream, micro-turn design that processes 200ms chunks of input and output simultaneously. It employs encoder-free early fusion: instead of passing each modality through a separate encoder, audio (as dMel) and 40x40 image patches go through a lightweight embedding layer, and all components are co-trained within the transformer.
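To make the chunk and patch sizes concrete, here is a minimal sketch of how a stream might be cut into 200ms audio chunks and 40x40 image patches before embedding. The 16 kHz sample rate, the zero-padding of the last chunk, the dropping of partial edge tiles, and the function names `chunk_audio` and `patchify` are assumptions for illustration. The article does not describe the actual preprocessing, and this sketch does not compute dMel features.

```python
from typing import List, Sequence

CHUNK_MS = 200        # micro-turn length, from the article
PATCH = 40            # image patch side in pixels, from the article
SAMPLE_RATE = 16_000  # ASSUMPTION: the article does not state a sample rate


def chunk_audio(samples: Sequence[float], sample_rate: int = SAMPLE_RATE,
                chunk_ms: int = CHUNK_MS) -> List[List[float]]:
    """Split a mono signal into consecutive fixed-length chunks.

    The final chunk is zero-padded to full length (an assumption).
    """
    size = sample_rate * chunk_ms // 1000
    chunks: List[List[float]] = []
    for start in range(0, len(samples), size):
        chunk = list(samples[start:start + size])
        chunk.extend([0.0] * (size - len(chunk)))
        chunks.append(chunk)
    return chunks


def patchify(image: Sequence[Sequence[float]],
             patch: int = PATCH) -> List[List[float]]:
    """Cut a 2D grayscale image into non-overlapping patch x patch tiles.

    Each tile is flattened row-major; partial edge tiles are dropped
    (an assumption).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    patches: List[List[float]] = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            flat = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(flat)
    return patches
```

Under the assumed 16 kHz rate, each 200ms chunk holds 3,200 samples, and each 40x40 patch flattens to 1,600 values. In an early-fusion setup these flat vectors would be what a lightweight embedding layer maps into the transformer's token space.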
In practice
- Real-time safety monitoring in manufacturing
- Proactive customer service with natural latency
- Time-aware agents for industrial maintenance
Topics
- Thinking Machines
- Interaction Models
- Full-Duplex AI
- Multimodal AI
- Low-Latency AI
Best for: Machine Learning Engineer, CTO, VP of Engineering/Data, AI Scientist, AI Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.