Deepgram speech-to-text and voice models now available natively on Together AI
Summary
Deepgram's production speech-to-text (STT) and text-to-speech (TTS) models, including Nova-3, Nova-3 Multilingual, Flux, and Aura-2, are natively available on Together AI's Dedicated Model Inference platform as of April 2, 2026. The integration lets teams deploy a complete real-time voice agent pipeline, combining Deepgram's transcription and synthesis with any large language model (LLM) from Together AI's catalog on a single production surface. Flux is designed for conversational STT, with 250ms end-of-turn detection to manage interruptions and turn-taking. Nova-3 handles production transcription of complex real-world audio and supports vocabulary customization. Aura-2 provides enterprise-grade TTS for clear, consistent voice agent output. The platform offers dedicated GPU capacity, a 99.9% uptime SLA, SOC 2 Type II, HIPAA-ready support, and data residency options, which streamlines operations for use cases such as contact centers, healthcare, and financial services.
Key takeaway
For MLOps engineers building real-time voice agents, running Deepgram's STT and TTS models on Together AI simplifies the production stack. Transcription, LLM reasoning, and synthesis can run on a single platform, which reduces the latency and operational fragility that multi-vendor setups often introduce. Features such as 250ms end-of-turn detection and vocabulary customization help deliver more natural and reliable conversational experiences, especially in regulated environments that require SOC 2 Type II or HIPAA-ready support.
Key insights
Real-time voice agents require integrated STT, LLM, and TTS on a single platform to minimize latency and operational complexity.
Principles
- Conversational STT needs turn detection, not just transcription.
- Production audio demands robust models for noise and accents.
- Enterprise TTS requires clarity for structured information.
Method
Deploy Deepgram STT/TTS models (Flux, Nova-3, Aura-2) alongside LLMs on Together AI's Dedicated Model Inference for a unified voice pipeline.
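As a rough illustration of what a unified voice pipeline means structurally, the sketch below chains three stages (STT, LLM, TTS) behind small interfaces. Every name here (`VoicePipeline`, `SpeechToText`, `FakeSTT`, and so on) is hypothetical and introduced only for illustration; none of it is the Together AI or Deepgram API. The stand-in stages would be replaced by real clients for the deployed models.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, transcript: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """Hypothetical wrapper: one user turn flows STT -> LLM -> TTS."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio: bytes) -> bytes:
        transcript = self.stt.transcribe(audio)   # e.g. a Flux or Nova-3 client
        answer = self.llm.reply(transcript)       # e.g. any catalog LLM client
        return self.tts.synthesize(answer)        # e.g. an Aura-2 client


# Stand-in stages so the sketch runs without any network access.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeLLM:
    def reply(self, transcript: str) -> str:
        return f"You said: {transcript}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(FakeSTT(), FakeLLM(), FakeTTS())
```

Keeping each stage behind its own interface means a model can be swapped (for example Nova-3 for Flux) without touching the rest of the turn-handling code.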
In practice
- Use Flux for conversational turn-taking in voice agents.
- Customize Nova-3 vocabulary for domain-specific terms.
- Leverage Aura-2 for consistent, clear patient-facing output.
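To make the turn-taking point concrete, here is a minimal sketch of how an agent loop might react to turn events from a conversational STT model. The event names (`start_of_turn`, `end_of_turn`) and the `TurnManager` class are assumptions made for illustration, not Flux's documented event schema; the sketch only shows how turn events could be consumed, not how end-of-turn detection itself works.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    LISTENING = "listening"   # waiting for the user to finish speaking
    SPEAKING = "speaking"     # agent audio is being played back


@dataclass
class TurnManager:
    """Hypothetical turn-taking state machine driven by STT turn events."""
    state: State = State.LISTENING
    actions: list = field(default_factory=list)

    def on_event(self, kind: str, transcript: str = "") -> None:
        if kind == "start_of_turn" and self.state is State.SPEAKING:
            # User interrupted (barge-in): stop agent audio and listen again.
            self.actions.append("stop_playback")
            self.state = State.LISTENING
        elif kind == "end_of_turn" and self.state is State.LISTENING:
            # User finished: hand the transcript to the LLM/TTS stages.
            self.actions.append(f"respond:{transcript}")
            self.state = State.SPEAKING
        # Any other combination is ignored in this sketch.


manager = TurnManager()
manager.on_event("end_of_turn", "book an appointment")
manager.on_event("start_of_turn")  # user interrupts the agent's reply
```

After these two events, `manager.actions` holds a response action followed by `stop_playback`, and the manager is back in the listening state.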
Topics
- Speech-to-Text
- Text-to-Speech
- Real-time Voice Agents
- Together AI
- Deepgram
- Conversational AI
- MLOps Infrastructure
Best for: Machine Learning Engineer, CTO, VP of Engineering/Data, AI Engineer, MLOps Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Together AI | The AI Native Cloud - Together.ai.