Prompt Engineering Workshop: Universal-3 Pro
Summary
AssemblyAI introduced Universal 3 Pro, a promptable speech-to-text model that lets users customize transcription results with natural language prompts. The model costs an additional $0.05 per hour on top of the $0.21 per hour list price and offers better accuracy and contextual understanding than its predecessor, Universal 2. The workshop demonstrated several prompting techniques: preserving linguistic speech patterns (disfluencies, filler words, hesitations, repetitions, stutters, false starts, colloquialisms), controlling the model's guessing behavior, handling code-switching and mixed languages (English, Spanish, French, German, Italian, Portuguese), and marking unclear audio segments. The session also touched on PII redaction, speaker diarization (currently experimental via prompting), and tools for iterative prompt optimization, such as a command-line tool for dataset simulations and a Prompt Repair Wizard.
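To make the pricing concrete, here is a minimal arithmetic sketch. It assumes the two figures quoted above simply add up to a flat per-hour rate; the constant and function names are illustrative and not part of any AssemblyAI tooling.

```python
from decimal import Decimal

# Figures quoted in the summary; treating them as additive is an assumption.
LIST_PRICE_PER_HOUR = Decimal("0.21")
SURCHARGE_PER_HOUR = Decimal("0.05")


def estimated_cost(audio_hours) -> Decimal:
    """Return the estimated cost in dollars for the given hours of audio."""
    rate = LIST_PRICE_PER_HOUR + SURCHARGE_PER_HOUR
    # Convert via str so float inputs such as 2.5 do not carry binary noise.
    return rate * Decimal(str(audio_hours))
```

Under that assumption, 100 hours of audio would come to about $26.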
Key takeaway
For AI engineers building transcription workflows, Universal 3 Pro's promptability changes how you meet specific output requirements. Experiment with explicit, authoritative prompts to control linguistic patterns, code-switching, and confidence thresholds, which may reduce post-processing. Use the Prompt Repair Wizard and iterative evaluation tools to optimize prompts for your own datasets, and keep in mind that human-labeled ground truth may need to be re-evaluated against this model's output.
Key insights
Universal 3 Pro offers customizable speech-to-text via natural language prompts, enhancing transcription accuracy and control.
Principles
- Authoritative language in prompts yields better model adherence.
- Specificity in prompts improves model interpretation and output.
- The model's context awareness reduces the need to supply explicit domain context in the prompt.
Method
Iterative prompt engineering involves testing various prompt components (e.g., disfluencies, code-switching) and their styles (short, medium, long) against ground truth data to optimize Word Error Rate (WER) or semantic WER.
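The evaluation loop described above can be sketched roughly as follows. This is a minimal illustration, not the workshop's command-line tool: it assumes you have already collected one transcript per audio file for each prompt variant, and it scores them with plain word-level WER (lowercased, split on whitespace). The function names and the variant labels are hypothetical.

```python
from typing import Dict, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed part of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitution)
        prev = curr
    return prev[-1] / len(ref)


def rank_prompt_variants(
    references: List[str],
    transcripts_by_variant: Dict[str, List[str]],
) -> List[Tuple[str, float]]:
    """Return (variant, mean WER) pairs sorted from best to worst."""
    scores = []
    for variant, transcripts in transcripts_by_variant.items():
        if len(transcripts) != len(references):
            raise ValueError(f"variant {variant!r} has a mismatched file count")
        per_file = [
            word_error_rate(ref, hyp)
            for ref, hyp in zip(references, transcripts)
        ]
        scores.append((variant, sum(per_file) / len(per_file)))
    return sorted(scores, key=lambda pair: pair[1])
```

For example, with references `["um i think so", "hello world"]`, a variant that drops the "um" scores a mean WER of 0.125, while a variant that keeps it scores 0.0 and ranks first. A semantic WER would need a different normalization or scoring step, which this sketch does not attempt.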
In practice
- Use "mandatory" or "always" in prompts for stronger model compliance.
- Specify filler words (e.g., "um," "uh") for verbatim transcripts.
- Instruct the model to "preserve original languages" for mixed-language audio.
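As a rough illustration of the three tips above, the sketch below assembles a prompt string from optional components. The exact wording is an assumption made for this example, not prompt text recommended by AssemblyAI, and `build_prompt` is a hypothetical helper; how the resulting string is passed to the model is left out.

```python
from typing import Iterable, Optional


def build_prompt(
    filler_words: Optional[Iterable[str]] = None,
    preserve_languages: bool = False,
) -> str:
    """Combine authoritative, specific instructions into one prompt string."""
    instructions = []
    words = list(filler_words or [])
    if words:
        # Name the filler words explicitly rather than saying "verbatim".
        quoted = ", ".join(f'"{word}"' for word in words)
        instructions.append(
            f"Mandatory: always transcribe filler words such as {quoted} "
            "exactly as spoken."
        )
    if preserve_languages:
        # Illustrative wording for mixed-language audio.
        instructions.append(
            "Mandatory: always preserve original languages; do not translate."
        )
    return " ".join(instructions)
```

Calling `build_prompt(filler_words=["um", "uh"], preserve_languages=True)` yields two sentences, each opening with "Mandatory: always". Keeping the components separate makes it easier to test them one at a time against ground truth.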
Topics
- Prompt Engineering
- Speech-to-Text Models
- Universal 3 Pro
- Multilingual Transcription
- Audio Tagging
Best for: Prompt Engineer, AI Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AssemblyAI.