Prompt Engineering Workshop: Universal-3 Pro

· Source: AssemblyAI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, extended

Summary

AssemblyAI introduced Universal-3 Pro, a promptable speech-to-text model that lets users customize transcription output with natural-language prompts. The model costs an additional $0.05 per hour on top of the $0.21 per hour list price. It offers improved accuracy and contextual understanding compared with its predecessor, Universal-2. The workshop demonstrated several prompting techniques:
- preserving linguistic speech patterns (disfluencies, filler words, hesitations, repetitions, stutters, false starts, colloquialisms);
- controlling how readily the model guesses at uncertain words;
- handling code-switching across mixed languages (English, Spanish, French, German, Italian, Portuguese);
- marking unclear audio segments.

The session also covered PII redaction and speaker diarization, which is currently experimental via prompting. It also presented tools for iterative prompt optimization, including a command-line tool for running simulations over a dataset and a Prompt Repair Wizard.
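As a quick check on the pricing quoted above, here is a minimal cost sketch. It assumes every billed hour incurs both the $0.21 list price and the $0.05 surcharge, giving $0.26 per hour. The constant and function names are illustrative, and the sketch does not cover volume discounts or other billing rules. `Decimal` avoids floating-point rounding on currency.

```python
# Minimal cost sketch using the per-hour rates quoted in the summary.
# Assumption: the $0.05/hour surcharge applies to every billed hour on top of
# the $0.21/hour list price; names here are illustrative, not an AssemblyAI API.
from decimal import Decimal

BASE_RATE_PER_HOUR = Decimal("0.21")         # list price quoted in the summary
PROMPT_SURCHARGE_PER_HOUR = Decimal("0.05")  # additional per-hour cost quoted


def estimate_cost(audio_hours: Decimal) -> Decimal:
    """Return the estimated USD cost, rounded to cents, for the given audio hours."""
    rate = BASE_RATE_PER_HOUR + PROMPT_SURCHARGE_PER_HOUR
    return (audio_hours * rate).quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(estimate_cost(Decimal("1")))
    print(estimate_cost(Decimal("1000")))
```

At these rates, 1,000 hours of audio come to $260.00.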

Key takeaway

For AI engineers building transcription workflows, Universal-3 Pro's promptability changes how you meet specific output requirements. Experiment with explicit, authoritative prompts to control linguistic patterns, code-switching, and confidence thresholds, which can reduce the post-processing you need. Use the Prompt Repair Wizard and iterative evaluation tools to optimize prompts for your own datasets. Keep in mind that human-labeled ground truth may itself need re-evaluation when compared against this model's output.
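One way to keep explicit, authoritative prompts manageable is to build them from separate components, so each instruction can be changed on its own. The sketch below is a hypothetical helper along those lines. The `COMPONENTS` table, the `build_prompt` function, every instruction string, and the `[unclear]` marker are illustrative assumptions, not AssemblyAI's documented prompts or API. The sketch also does not show how the resulting string is sent to the service.

```python
# Hypothetical prompt builder: composes a directive transcription prompt from
# named components, each written in a short, medium, or long style.
# All instruction wording and the "[unclear]" marker are illustrative
# assumptions, not AssemblyAI's documented prompts.
from typing import Iterable

COMPONENTS: dict[str, dict[str, str]] = {
    "disfluencies": {
        "short": "Transcribe filler words and stutters verbatim.",
        "medium": "Keep all filler words, hesitations, repetitions, and stutters verbatim.",
        "long": (
            "Preserve every disfluency exactly as spoken: filler words such as um and uh, "
            "hesitations, repetitions, stutters, and false starts. Do not clean them up."
        ),
    },
    "code_switching": {
        "short": "Transcribe each language as spoken; never translate.",
        "medium": "Speakers may switch languages mid-sentence. Write each word in its spoken language.",
        "long": (
            "Speakers may switch between English, Spanish, French, German, Italian, "
            "and Portuguese mid-sentence. Write every word in the language it was "
            "spoken in. Never translate."
        ),
    },
    "unclear_audio": {
        "short": "Mark inaudible segments as [unclear].",
        "medium": "Write [unclear] for any segment you cannot hear clearly.",
        "long": (
            "If a segment cannot be understood with confidence, write [unclear] "
            "instead of guessing at the words."
        ),
    },
}


def build_prompt(components: Iterable[str], style: str = "short") -> str:
    """Join the selected components' instructions, in one style, into a single prompt.

    Raises KeyError for an unknown component name or style, so typos fail loudly.
    """
    return " ".join(COMPONENTS[name][style] for name in components)


if __name__ == "__main__":
    print(build_prompt(["disfluencies", "unclear_audio"]))
    print(build_prompt(["code_switching"], style="long"))
```

With components and styles kept separate, you can change one piece at a time during evaluation instead of rewriting one monolithic prompt.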

Key insights

Universal-3 Pro offers customizable speech-to-text via natural-language prompts, improving transcription accuracy and giving users more control over the output.

Principles

Method

Iterative prompt engineering means testing individual prompt components (e.g., disfluencies, code-switching), each written in several styles (short, medium, long), against ground-truth transcripts. You then keep the combination that yields the lowest Word Error Rate (WER) or semantic WER.
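The selection step can be sketched as follows. Score each prompt variant's transcript against a reference with standard word-level WER, (substitutions + deletions + insertions) / reference words, and keep the lowest. This is a minimal sketch under stated assumptions. The function names and sample transcripts are hypothetical, normalization is deliberately naive (lowercase and whitespace split), and semantic WER is not computed.

```python
# Minimal WER-based evaluation sketch: score each prompt variant's transcript
# against a ground-truth reference and keep the lowest word error rate.
# WER = (substitutions + deletions + insertions) / number of reference words.
# Normalization is deliberately naive (lowercase + whitespace split); real
# evaluations usually also strip punctuation and normalize numbers.


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def best_variant(reference: str, transcripts: dict[str, str]) -> tuple[str, float]:
    """Return (variant name, WER) for the transcript closest to the reference."""
    scores = {name: word_error_rate(reference, text) for name, text in transcripts.items()}
    winner = min(scores, key=scores.get)
    return winner, scores[winner]


if __name__ == "__main__":
    # Hypothetical ground truth and outputs from two prompt variants.
    reference = "um so we we need the report by friday"
    transcripts = {
        "no_prompt": "so we need the report by friday",
        "disfluencies_short": "um so we we need the report by friday",
    }
    for name, text in transcripts.items():
        print(name, round(word_error_rate(reference, text), 3))
    print(best_variant(reference, transcripts))
```

In this toy example the unprompted transcript drops two of nine reference words (WER 2/9, about 0.22). This also shows why ground truth matters. If the human reference itself omitted disfluencies, the metric would penalize the prompt that correctly preserves them.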

Best for: Prompt Engineer, AI Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AssemblyAI.