Deepgram speech-to-text and voice models now available natively on Together AI

· Source: Together AI | The AI Native Cloud - Together.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Robotics & Autonomous Systems · Depth: Intermediate, medium

Summary

Deepgram's production speech-to-text (STT) and text-to-speech (TTS) models, including Nova-3, Nova-3 Multilingual, Flux, and Aura-2, are now natively available on Together AI's Dedicated Model Inference platform as of April 2, 2026. This integration allows teams to deploy a complete real-time voice agent pipeline, combining Deepgram's transcription and synthesis with any Large Language Model (LLM) from Together AI's catalog on a single production surface. Key models include Flux, designed for conversational STT with 250ms end-of-turn detection to manage interruptions and turn-taking, and Nova-3, which offers production transcription for complex real-world audio with vocabulary customization. Aura-2 provides enterprise-grade TTS for clear and consistent voice agents. The platform offers dedicated GPU capacity, a 9% uptime SLA, SOC 2 Type II, HIPAA-ready support, and data residency options, streamlining operations for use cases like contact centers, healthcare, and financial services.

Key takeaway

For MLOps Engineers building real-time voice agents, integrating Deepgram's STT and TTS models on Together AI simplifies your production stack. You can now run transcription, LLM reasoning, and synthesis on a single platform, significantly reducing latency and operational fragility often caused by multi-vendor setups. This unified approach, with features like 250ms end-of-turn detection and vocabulary customization, helps you deliver more natural and reliable conversational experiences, especially in regulated environments requiring SOC 2 Type II or HIPAA compliance.

Key insights

Real-time voice agents require integrated STT, LLM, and TTS on a single platform to minimize latency and operational complexity.

Principles

Method

Deploy Deepgram STT/TTS models (Flux, Nova-3, Aura-2) alongside LLMs on Together AI's Dedicated Model Inference for a unified voice pipeline.

In practice

Topics

Best for: Machine Learning Engineer, CTO, VP of Engineering/Data, AI Engineer, MLOps Engineer, NLP Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Together AI | The AI Native Cloud - Together.ai.