Prodigy v1.10: Dependencies, relations, audio, video & more

2020-06-17 · Source: Explosion (Explosion.ai) · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, long

Summary

Prodigy v1.10 is a major update to the machine learning annotation tool, aimed at helping developers and data scientists create training data faster. The release adds a new interface for manually annotating relations and dependencies, along with recipes for tasks such as coreference resolution. New audio and video interfaces support speaker diarization and transcription, and model-in-the-loop audio workflows are powered by the pyannote.audio library. The manual image annotation UI has been reworked to support resizing, freehand shapes, and drawing from the center. NER annotation gains character-based highlighting and a recipe for annotating data to fine-tune Transformer models like BERT, with spans kept compatible with subword tokenization. New recipe callbacks and UI customization options round out the release.

Key takeaway

For data scientists and ML engineers who need to create training data faster, Prodigy v1.10 adds several practical enhancements. You can now annotate dependencies, coreferences, and custom relations with model-in-the-loop assistance. The updated audio, video, and image UIs, together with Transformer-compatible NER tools, help you build high-quality datasets faster. Consider upgrading to use these specialized workflows and the customizable validation callbacks, which help keep annotations consistent.

Key insights

Prodigy v1.10 enhances ML data annotation through specialized UIs, model-in-the-loop automation, and customizable workflows for diverse data types.

Principles

Automate the annotation steps a machine can handle.
Define patterns to disable irrelevant tokens.
Ensure data consistency via annotation rules.
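
The last principle can be illustrated with a small validation callback. This is a minimal sketch, not Prodigy's implementation: the function name validate_answer, the "answer", "tokens" and "relations" fields, and the rules themselves are assumptions chosen for illustration. The idea is simply that a function inspects each submitted answer and raises an error when it breaks a project rule.

```python
def validate_answer(eg):
    """Raise ValueError if an accepted relation task breaks the project rules.

    `eg` is assumed to be a dict with "answer", "tokens" and "relations"
    keys; this layout is hypothetical and only serves the example.
    """
    if eg.get("answer") != "accept":
        return  # assumption: rejected or skipped tasks are not checked

    n_tokens = len(eg.get("tokens", []))
    for rel in eg.get("relations", []):
        # Rule 1: both ends of a relation must point at an existing token.
        for key in ("head", "child"):
            if not 0 <= rel[key] < n_tokens:
                raise ValueError(f"Relation {key} index {rel[key]} is out of range.")
        # Rule 2: a token cannot be related to itself.
        if rel["head"] == rel["child"]:
            raise ValueError("A relation cannot connect a token to itself.")
        # Rule 3: every relation needs a label.
        if not rel.get("label"):
            raise ValueError("Every relation needs a label.")


example = {
    "answer": "accept",
    "tokens": [{"text": "She"}, {"text": "left"}],
    "relations": [{"head": 1, "child": 0, "label": "SUBJ"}],
}
validate_answer(example)  # passes silently because all three rules hold
```

Raising an exception rather than returning a flag keeps the rule and its error message in one place, so the annotator can be shown exactly what needs fixing.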

Method

Prodigy's relation annotation method involves defining match patterns to disable irrelevant tokens, using models or patterns to pre-label spans for consistent units, and then manually connecting the relevant tokens or spans.
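
The three steps of this method can be sketched in plain Python. This is an illustration under stated assumptions, not Prodigy's own code: the helper names (tokenize, disable_tokens, prelabel_spans, connect), the phrase lookup that stands in for a model or pattern matcher, and the layout of the resulting task dict are all hypothetical.

```python
import re


def tokenize(text):
    """Split text into word and punctuation tokens with character offsets."""
    return [
        {"id": i, "text": m.group(), "start": m.start(), "end": m.end()}
        for i, m in enumerate(re.finditer(r"\w+|[^\w\s]", text))
    ]


def disable_tokens(tokens, is_irrelevant):
    """Step 1: flag tokens the annotator should not be able to select."""
    for tok in tokens:
        tok["disabled"] = is_irrelevant(tok)
    return tokens


def prelabel_spans(tokens, phrases):
    """Step 2: pre-label multi-token units so they are treated consistently.

    `phrases` maps a tuple of lowercase token texts to a label; it is a
    stand-in for whatever model or patterns would suggest the spans.
    """
    texts = [tok["text"].lower() for tok in tokens]
    spans = []
    for words, label in phrases.items():
        n = len(words)
        for i in range(len(texts) - n + 1):
            if tuple(texts[i:i + n]) == words:
                spans.append({
                    "token_start": i,
                    "token_end": i + n - 1,
                    "start": tokens[i]["start"],
                    "end": tokens[i + n - 1]["end"],
                    "label": label,
                })
    return spans


def connect(head_span, child_span, label):
    """Step 3: the link an annotator would otherwise draw by hand."""
    return {
        "head": head_span["token_end"],
        "child": child_span["token_end"],
        "label": label,
    }


text = "Ada Lovelace worked with Charles Babbage, reportedly."
tokens = disable_tokens(
    tokenize(text),
    # Assumption: punctuation and hedging adverbs never take part in a relation.
    lambda tok: not tok["text"].isalnum() or tok["text"].lower() in {"reportedly"},
)
spans = prelabel_spans(
    tokens,
    {("ada", "lovelace"): "PERSON", ("charles", "babbage"): "PERSON"},
)
task = {
    "text": text,
    "tokens": tokens,
    "spans": spans,
    "relations": [connect(spans[0], spans[1], "WORKED_WITH")],
}
```

Disabling tokens up front and merging names into single spans leaves the annotator with only two selectable units in this sentence, which is what makes the manual linking step fast.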

In practice

Use the dep.correct or coref.manual recipes for dependency and coreference annotation.
Integrate pyannote.audio for model-assisted audio annotation.
Enable the image_manual_from_center setting to draw image bounding boxes from the center outward.
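
As a small illustration of the last item, the sketch below switches on center-based drawing in a settings dictionary and serializes it as JSON. It assumes that image_manual_from_center is a boolean setting read from a JSON config; the helper name with_center_drawing and the other key in the base dict are placeholders, not a recommended configuration.

```python
import json


def with_center_drawing(config):
    """Return a copy of a config dict with center-based drawing switched on."""
    updated = dict(config)  # copy so the caller's dict is left untouched
    updated["image_manual_from_center"] = True
    return updated


base = {"theme": "basic"}  # placeholder for whatever settings already exist
config = with_center_drawing(base)
config_json = json.dumps(config, indent=2)
print(config_json)
```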

Topics

Machine Learning Annotation
Natural Language Processing
Audio/Video Annotation
Image Annotation
Data Labeling Tools
Transformer Fine-tuning

Best for: Machine Learning Engineer, Data Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion (Explosion.ai).