Prodigy v1.10: Dependencies, relations, audio, video & more
Summary
Prodigy v1.10, a major update to the machine learning annotation tool, adds features that speed up training data creation for developers and data scientists. The release includes a new interface for manual relation and dependency annotation, along with recipes for tasks such as coreference resolution. It also adds interfaces for annotating audio and video files, with support for speaker diarization and transcription, and with model-in-the-loop workflows built on the pyannote.audio library. The manual image annotation UI has been reworked to support resizing, freehand shapes, and center-based drawing. NER annotation gains character-based highlighting and a recipe for creating data to fine-tune Transformer models like BERT, which keeps annotations compatible with subword tokenization. New recipe callbacks and UI customization options further streamline annotation workflows.
Key takeaway
For data scientists and ML engineers aiming to accelerate training data creation, Prodigy v1.10 provides critical enhancements. You can now efficiently annotate complex dependencies, coreferences, and custom relations with model-in-the-loop assistance. The updated audio, video, and image UIs, alongside Transformer-compatible NER tools, allow you to build high-quality datasets faster. Consider upgrading to leverage these specialized workflows and customizable validation callbacks for improved data consistency.
Key insights
Prodigy v1.10 enhances ML data annotation through specialized UIs, model-in-the-loop automation, and customizable workflows for diverse data types.
Principles
- Automate machine-capable annotation tasks.
- Define patterns to disable irrelevant tokens.
- Ensure data consistency via annotation rules.
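The consistency principle can be illustrated with a small validation function. This is a minimal sketch, not Prodigy's own code: it assumes the annotated task arrives as a dict, that the annotator's decision is stored under "answer", and that selected option IDs for a choice-style task are stored under "accept". In Prodigy v1.10 a function like this would typically be returned from a custom recipe as a validation callback; treat the exact wiring as an assumption and check the custom recipes documentation.

```python
def validate_answer(eg):
    """Reject accepted choice answers that select no option or too many.

    `eg` is assumed to be the annotated task dict. The "answer" and
    "accept" keys are assumptions about the task format.
    """
    if eg.get("answer") != "accept":
        return  # rejected or ignored tasks pass through unchecked
    selected = eg.get("accept", [])
    if not 1 <= len(selected) <= 3:
        # The message is what the annotator would be expected to see.
        raise ValueError("Select at least 1 but not more than 3 options.")
```

Raising an error with a readable message, rather than returning a boolean, lets the rule explain itself to the annotator at the moment the answer is submitted.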
Method
Prodigy's relation annotation method involves defining match patterns to disable irrelevant tokens, using models or patterns to pre-label spans for consistent units, and then manually connecting the relevant tokens or spans.
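The first step of this method, disabling irrelevant tokens, relies on a file of match patterns. The sketch below builds such a file in JSONL form using spaCy-style token attributes. The specific patterns (punctuation, determiners, whitespace) and the file name are hypothetical choices for illustration, and the one-object-per-line layout with a "pattern" key is an assumption about the expected format.

```python
import json

# Hypothetical patterns: each one describes tokens an annotator should not
# be able to select when connecting relations.
DISABLE_PATTERNS = [
    {"pattern": [{"is_punct": True}]},  # punctuation
    {"pattern": [{"pos": "DET"}]},      # determiners such as "the" or "a"
    {"pattern": [{"is_space": True}]},  # whitespace tokens
]


def to_jsonl(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    # The file name is an arbitrary choice for this example.
    with open("disable_patterns.jsonl", "w", encoding="utf8") as f:
        f.write(to_jsonl(DISABLE_PATTERNS))
```

A file like this would then be handed to the relation recipe, most likely through a disable-patterns option; the exact argument name should be confirmed with the recipe's help output before relying on it.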
In practice
- Use dep.correct or coref.manual for NLP relation tasks.
- Integrate pyannote.audio for model-assisted audio annotation.
- Enable the image_manual_from_center setting to draw image bounding boxes outward from their center.
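The last item refers to what appears to be a boolean configuration setting rather than a recipe. As a minimal sketch, assuming settings are kept in a JSON config file and that `image_manual_from_center` accepts `true`, the helper below adds the setting to an existing config dict without modifying the original. The helper name is hypothetical.

```python
import json


def with_center_drawing(config):
    """Return a copy of a config dict with center-based box drawing enabled."""
    updated = dict(config)  # shallow copy so the caller's dict is untouched
    updated["image_manual_from_center"] = True
    return updated


if __name__ == "__main__":
    # Print the resulting JSON so it can be pasted into a config file.
    print(json.dumps(with_center_drawing({}), indent=2))
```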
Topics
- Machine Learning Annotation
- Natural Language Processing
- Audio/Video Annotation
- Image Annotation
- Data Labeling Tools
- Transformer Fine-tuning
Best for: Machine Learning Engineer, Data Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion (Explosion.ai).