Universal-3 Pro Technical Overview
Summary
AssemblyAI has released Universal-3 Pro, a speech-to-text model that accepts a text prompt to customize its transcription output. Through the prompt, users can influence transcription style, formatting, entity accuracy, speaker attribution, audio event tags, and code-switching. In an out-of-the-box comparison with Universal-2, AssemblyAI's previous production model, Universal-3 Pro corrected broken words, capitalized proper nouns, and fixed sentence meanings without any prompting. Further demonstrations showed that specific prompts could make the model transcribe disfluencies such as stutters ("it may") and keep verbatim hesitations such as "ums" in the transcript, giving granular control over the output for different use cases. Users select Universal-3 Pro via the `speech_models` parameter and can experiment with the `prompt` parameter in their requests.
Key takeaway
For AI engineers and NLP engineers building transcription services, Universal-3 Pro's prompting gives direct control over how transcripts are rendered. Consider using the `prompt` parameter to tailor transcripts to domain requirements, such as verbatim capture of hesitations or more accurate entity recognition, which can make transcripts more useful in downstream applications.
Key insights
Universal-3 Pro offers customizable speech-to-text transcription via text prompts, enhancing accuracy and output control.
Principles
- Prompting customizes transcription output.
- The model improves on Universal-2's accuracy without any prompting.
- Prompts control disfluencies and formatting.
Method
Users provide audio files and a text prompt to Universal-3 Pro, which then generates a customized transcription. The prompt guides the model on aspects like style, entity recognition, and verbatim output.
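The method above can be sketched as a request body. This is a minimal illustration, not AssemblyAI's documented client code: the source names only the `speech_models` and `prompt` parameters, so the `audio_url` field, the model identifier string `universal-3-pro`, and the helper name `build_transcript_request` are assumptions that should be checked against AssemblyAI's API reference before use.

```python
import json


def build_transcript_request(audio_url, prompt=None, speech_models=("universal-3-pro",)):
    """Build a JSON-serializable request body (hypothetical helper).

    Only `speech_models` and `prompt` are named in the source; the
    `audio_url` field and the "universal-3-pro" identifier are assumptions.
    """
    if not audio_url:
        raise ValueError("audio_url is required")
    body = {"audio_url": audio_url, "speech_models": list(speech_models)}
    # Leave the prompt out entirely when none is given, so the model's
    # out-of-the-box behavior applies.
    if prompt:
        body["prompt"] = prompt
    return body


if __name__ == "__main__":
    request = build_transcript_request(
        "https://example.com/meeting.mp3",
        prompt="Transcribe verbatim, keeping hesitations such as 'um'.",
    )
    print(json.dumps(request, indent=2))
```

Keeping the prompt optional makes it easy to compare a prompted transcript against the unprompted baseline for the same audio.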
In practice
- Use prompts for specific formatting needs.
- Improve entity accuracy with contextual clues.
- Add speaker attribution via prompting.
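One way to combine the three practices above is to assemble the prompt from optional pieces. The sketch below is hypothetical: the helper `compose_prompt` and the instruction wording are illustrative assumptions, not phrasing recommended by AssemblyAI, and how well any given wording steers the model would need to be tested on real audio.

```python
def compose_prompt(formatting=None, key_terms=(), label_speakers=False):
    """Join optional instructions into one prompt string (hypothetical helper)."""
    parts = []
    if formatting:
        # Normalize to exactly one trailing period.
        parts.append(formatting.strip().rstrip(".") + ".")
    if key_terms:
        # Contextual clues intended to help with entity spelling (assumed wording).
        parts.append("Names and terms that may appear: " + ", ".join(key_terms) + ".")
    if label_speakers:
        parts.append("Label each speaker turn.")
    return " ".join(parts)


if __name__ == "__main__":
    print(compose_prompt(
        formatting="Format dates as YYYY-MM-DD",
        key_terms=["AssemblyAI", "Universal-3 Pro"],
        label_speakers=True,
    ))
```

The resulting string would be passed as the `prompt` parameter of the transcription request.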
Topics
- Speech-to-Text
- Prompt Engineering
- Customized Transcription
- Universal-3 Pro
- Disfluency Detection
Best for: AI Engineer, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AssemblyAI.