Universal-3 Pro Technical Overview

2026-02-03 · Source: AssemblyAI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, medium

Summary

AssemblyAI has released Universal-3 Pro, a speech-to-text model that accepts a text prompt to customize its transcription output. Through prompts, users can influence transcription style, formatting, entity accuracy, speaker attribution, audio event tags, and code-switching. In an out-of-the-box comparison with Universal-2, AssemblyAI's previous production model, Universal-3 Pro repaired broken words, capitalized proper nouns, and corrected sentences whose meaning had been mis-transcribed, all without any prompting. Further demonstrations showed that specific prompts could make the model transcribe disfluencies such as stutters ("it may") and keep verbatim hesitations such as "ums" in the transcript, giving granular control over the output for different use cases. Users select Universal-3 Pro via the `speech_models` parameter and can experiment with the `prompt` parameter in their requests.

Key takeaway

For AI Engineers and NLP Engineers building transcription services, Universal-3 Pro's prompting gives direct control over how transcripts are rendered. Consider integrating the `prompt` parameter to tailor transcripts to domain requirements, such as verbatim capture of hesitations or precise entity recognition, which can make the output more useful and accurate for your application.

Key insights

Universal-3 Pro offers customizable speech-to-text transcription via text prompts, enhancing accuracy and output control.

Principles

Prompting customizes transcription output.
The model is more accurate than Universal-2 out of the box, with no prompt.
Prompts control disfluencies and formatting.

Method

Users provide audio files and a text prompt to Universal-3 Pro, which then generates a customized transcription. The prompt guides the model on aspects like style, entity recognition, and verbatim output.
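
As a rough illustration of this flow, the sketch below builds (but does not send) a transcription request. Only the `speech_models` and `prompt` parameter names come from the article. The endpoint URL, the `audio_url` field, the `authorization` header, the list shape of `speech_models`, and the model identifier `"universal-3-pro"` are assumptions to check against AssemblyAI's current API reference.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the AssemblyAI API reference.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url, prompt=None, api_key="YOUR_API_KEY"):
    """Build (but do not send) a transcription request with an optional prompt."""
    payload = {
        "audio_url": audio_url,                # assumed field name
        "speech_models": ["universal-3-pro"],  # assumed identifier and list shape
    }
    if prompt:
        # The prompt steers style, formatting, and verbatim behaviour.
        payload["prompt"] = prompt
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


req = build_transcript_request(
    "https://example.com/call.mp3",
    prompt="Transcribe verbatim, keeping filler words such as 'um' and stutters.",
)
body = json.loads(req.data)
print(body["speech_models"], "prompt" in body)
```

Passing `req` to `urllib.request.urlopen` with a real API key would submit the job. Leaving `prompt` unset corresponds to the out-of-the-box behaviour described in the summary.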

In practice

Use prompts for specific formatting needs.
Improve entity accuracy with contextual clues.
Add speaker attribution via prompting.
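
One way to keep these three uses manageable is to assemble the prompt from optional parts. The helper below is a hypothetical sketch: the function name and the instruction wording are illustrative assumptions, not phrasing recommended by AssemblyAI, and effective wording should be found by experiment.

```python
def compose_prompt(formatting=None, known_terms=(), label_speakers=False):
    """Join optional instructions into a single prompt string (hypothetical wording)."""
    parts = []
    if formatting:
        parts.append(f"Formatting: {formatting}.")
    if known_terms:
        # Contextual clues intended to help with entity spelling.
        parts.append("Spell these terms exactly: " + ", ".join(known_terms) + ".")
    if label_speakers:
        parts.append("Label each speaker turn.")
    return " ".join(parts)


prompt = compose_prompt(
    formatting="write numbers as digits",
    known_terms=["AssemblyAI", "Universal-3 Pro"],
    label_speakers=True,
)
print(prompt)
```

The resulting string would be sent as the value of the `prompt` parameter.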

Topics

Speech-to-Text
Prompt Engineering
Customized Transcription
Universal-3 Pro
Disfluency Detection

Best for: AI Engineer, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AssemblyAI.