Thinking Machines drops a new, highly responsive model designed for humanlike interactions in real time
Summary
Thinking Machines Lab Inc., an AI research startup, has unveiled a research preview of its "interaction models," a new class of multimodal AI systems designed for real-time, humanlike interactions. The models avoid the pauses typical of AI communication by supporting "full-duplex" communication, in which the AI listens, sees, and talks simultaneously. At the core is an architecture that processes inputs and outputs in 200-millisecond chunks, so the model can react to visual or auditory cues as they arrive. A "dual-model" design pairs TML-Interaction-Small, a 276-billion-parameter mixture-of-experts model for rapid dialogue, with an asynchronous background agent for complex reasoning and web searches. The system achieved a turn-taking latency of less than 0.4 seconds on the FD-bench benchmark, outperforming Google's Gemini-3.1-flash-live (0.57 seconds) and GPT-realtime-2.0 (1.18 seconds).
Key takeaway
For CTOs and VPs of Engineering evaluating AI integration for high-stakes or customer-facing applications, Thinking Machines' interaction models offer a path to real-time, humanlike AI. Investigate this full-duplex architecture if you need to reduce interaction latency and enable more natural, collaborative AI experiences, especially where immediate multimodal responsiveness is critical for operational efficiency or safety.
Key insights
New AI architecture enables full-duplex, real-time multimodal interaction by processing data in 200-millisecond micro-turns.
Principles
- AI must adapt to human interaction patterns.
- Low latency is critical for humanlike collaboration.
- Dual-model architecture balances speed and deep reasoning.
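As a rough illustration of the dual-model principle, the sketch below keeps a fast interaction loop responsive while a slower background task runs concurrently. It is a minimal sketch under stated assumptions, not Thinking Machines' implementation: `background_agent`, `interaction_loop`, and the placeholder strings are hypothetical names, and the short tick is chosen only so the demo finishes quickly (the article describes 200 ms micro-turns).

```python
import asyncio
from typing import List


async def background_agent(query: str) -> str:
    # Hypothetical stand-in for slow reasoning or a web search.
    await asyncio.sleep(0.05)
    return f"result for {query!r}"


async def interaction_loop(query: str, tick_s: float = 0.01,
                           max_ticks: int = 50) -> List[str]:
    # Start the slow work without blocking the fast loop.
    task = asyncio.create_task(background_agent(query))
    transcript: List[str] = []
    for _ in range(max_ticks):
        if task.done():
            # Fold the background result into the conversation once ready.
            transcript.append(task.result())
            break
        # Otherwise keep the dialogue going on every tick.
        transcript.append("(fast model keeps responding)")
        await asyncio.sleep(tick_s)
    else:
        # Give up on the background work if it never finished in time.
        task.cancel()
    return transcript


def run_demo(query: str = "weather") -> List[str]:
    return asyncio.run(interaction_loop(query))
```

Calling `run_demo()` returns a list with one or more filler entries followed by the background result. The design choice being illustrated is only the separation of concerns: the loop never waits on the slow task.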
Method
The system uses a multistream, micro-turn-based design that processes inputs and outputs in 200 ms chunks. It employs "encoder-free early fusion": raw signals are fed directly into a lightweight embedding layer within the transformer.
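The micro-turn idea can be sketched as slicing an input stream into 200 ms chunks and producing one output per chunk. This is a simplified illustration, assuming a mono audio stream held as a list of samples; `samples_per_chunk`, `micro_turns`, `run_full_duplex`, and `step_fn` are hypothetical names, and `step_fn` merely stands in for the model.

```python
from typing import Any, Callable, Iterator, List

CHUNK_MS = 200  # micro-turn length reported in the summary


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Number of audio samples that fit in one micro-turn."""
    return sample_rate_hz * chunk_ms // 1000


def micro_turns(samples: List[float],
                sample_rate_hz: int) -> Iterator[List[float]]:
    """Split a sample stream into consecutive chunks (the last may be short)."""
    step = samples_per_chunk(sample_rate_hz)
    for start in range(0, len(samples), step):
        yield samples[start:start + step]


def run_full_duplex(samples: List[float], sample_rate_hz: int,
                    step_fn: Callable[[int, List[float]], Any]) -> List[Any]:
    """Call step_fn once per micro-turn and collect its outputs.

    step_fn is a hypothetical stand-in for the model: it receives the turn
    index and the input chunk, and returns whatever would be emitted during
    that same window (for example speech, or None for silence).
    """
    return [step_fn(i, chunk)
            for i, chunk in enumerate(micro_turns(samples, sample_rate_hz))]
```

For example, one second of 16 kHz audio yields five micro-turns of 3,200 samples each.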
In practice
- Monitor video feeds for real-time safety alerts.
- Enhance customer service calls with natural conversation flow.
- Manage time-sensitive requests without explicit timestamps.
Topics
- Interaction Models
- Full-Duplex AI
- Dual-Model Architecture
- TML-Interaction-Small
- Real-time AI
Best for: Machine Learning Engineer, CTO, VP of Engineering/Data, AI Engineer, AI Product Manager, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by AI – SiliconANGLE.