Mira Murati’s Thinking Machines Lab Introduces Interaction Models: A Native Multimodal Architecture for Real-Time Human-AI Collaboration

2026-05-13 · Source: Machine Learning ML & Generative AI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Mira Murati's Thinking Machines Lab has unveiled a research preview of TML-Interaction-Small, a 276B-parameter mixture-of-experts (MoE) model with 12B active parameters, designed for real-time human-AI collaboration. The model uses a native multimodal architecture that processes 200ms chunks of audio, video, and text simultaneously through a multi-stream, time-aligned micro-turn system. Unlike conventional real-time systems, which often rely on bolted-on voice-activity detection, TML-Interaction-Small builds interactivity directly into its weights and so needs no external turn-detection scaffolding. It supports full-duplex interaction and asynchronous background reasoning that shares the full conversation context. The model scored 77.8 on FD-bench v1.5, compared with 47.8 for GPT-realtime-2.0, and reached 32.4 mIoU on Charades for visual proactivity, where GPT-realtime-2.0 scored 0.

Key takeaway

AI engineers developing real-time conversational AI should investigate architectures that co-train multimodal inputs and embed interaction directly into the model's weights. On the reported benchmarks, this approach, exemplified by TML-Interaction-Small, outperforms systems that rely on external turn detection in both full-duplex interaction and visual proactivity, and it could enable more natural and efficient human-AI collaboration.

Key insights

Native multimodal architectures can enable true real-time human-AI collaboration by integrating interactivity into model weights.

Principles

Co-train modalities from scratch.
Integrate interactivity into model weights.

Method

The TML-Interaction-Small model uses a multi-stream, time-aligned micro-turn architecture to process 200ms chunks of audio (dMel), video (40x40 hMLP patches), and text simultaneously.
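
To make the time-aligned micro-turn idea concrete, here is a minimal sketch of how three timestamped streams could be bucketed into 200ms windows before being handed to a model. It illustrates the alignment step only and is not Thinking Machines Lab's implementation: the MicroTurn class, the align_micro_turns function, and the 16 kHz sample rate in the usage note are assumptions made for illustration, and the dMel and hMLP patch encodings named above are not reproduced.

```python
from dataclasses import dataclass, field

CHUNK_MS = 200  # micro-turn length reported in the summary


@dataclass
class MicroTurn:
    """Hypothetical container for everything that arrived in one 200 ms window."""
    index: int
    audio: list = field(default_factory=list)  # raw audio samples
    video: list = field(default_factory=list)  # video frames
    text: list = field(default_factory=list)   # text tokens


def align_micro_turns(audio_samples, sample_rate_hz, video_frames, text_events):
    """Group three streams into consecutive 200 ms micro-turns.

    audio_samples: flat list of samples starting at t = 0
    video_frames:  list of (timestamp_ms, frame) pairs, timestamps >= 0
    text_events:   list of (timestamp_ms, token) pairs, timestamps >= 0
    """
    samples_per_chunk = sample_rate_hz * CHUNK_MS // 1000

    # Enough turns to cover the audio (ceiling division) and every timestamp.
    n_turns = -(-len(audio_samples) // samples_per_chunk)
    for t_ms, _ in list(video_frames) + list(text_events):
        n_turns = max(n_turns, t_ms // CHUNK_MS + 1)

    turns = [MicroTurn(index=i) for i in range(n_turns)]

    # Audio is sliced by sample position; the last slice may be shorter.
    for turn in turns:
        start = turn.index * samples_per_chunk
        turn.audio = audio_samples[start:start + samples_per_chunk]

    # Video and text are placed by timestamp, so all modalities in a turn
    # cover the same 200 ms span.
    for t_ms, frame in video_frames:
        turns[t_ms // CHUNK_MS].video.append(frame)
    for t_ms, token in text_events:
        turns[t_ms // CHUNK_MS].text.append(token)
    return turns
```

Calling align_micro_turns with half a second of 16 kHz audio, a few frame timestamps, and one text token yields three micro-turns, each holding whatever arrived in its window. In a native multimodal model each turn would then be encoded and fed to the network as one step, which is the part this sketch leaves out.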

In practice

Implement full-duplex interaction.
Run asynchronous background reasoning.
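
As a rough application-level analogy for these two practices, the sketch below runs a listening task, a speaking task, and a slower background task concurrently over one shared context using Python's asyncio. This is not how TML-Interaction-Small works internally, since the summary says its interactivity lives in the model weights. The function names (listen, speak, background_reasoning, run_session) and the list-based context are hypothetical and only show the concurrency shape.

```python
import asyncio


async def listen(incoming, context):
    """Keep consuming user input, even while the model is speaking."""
    for chunk in incoming:
        context.append(("user", chunk))
        await asyncio.sleep(0)  # yield so other tasks can interleave


async def speak(outgoing, context, spoken):
    """Emit model output without waiting for the user to finish."""
    for chunk in outgoing:
        context.append(("model", chunk))
        spoken.append(chunk)
        await asyncio.sleep(0)


async def background_reasoning(context, notes):
    """Hypothetical slow task: reads the shared context without blocking I/O."""
    await asyncio.sleep(0.01)  # stand-in for longer-running reasoning
    notes.append(f"saw {len(context)} events")


async def run_session(incoming, outgoing):
    context, spoken, notes = [], [], []  # one context shared by all tasks
    await asyncio.gather(
        listen(incoming, context),
        speak(outgoing, context, spoken),
        background_reasoning(context, notes),
    )
    return context, spoken, notes


if __name__ == "__main__":
    ctx, said, notes = asyncio.run(run_session(["u1", "u2"], ["m1", "m2"]))
    print(ctx, said, notes)
```

Because listening and speaking are separate tasks, neither waits for the other to finish a turn, and the background task sees the same context they write to. A production system would replace the lists with real audio and video streams and add cancellation for interruptions.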

Topics

Interaction Models
TML-Interaction-Small
Multimodal AI
Real-time AI
Full-duplex Interaction

Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.