Prodigy v1.10: Dependencies, relations, audio, video & more

· Source: Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, long

Summary

Prodigy v1.10 is a major update to Explosion's annotation tool for machine learning, aimed at helping developers and data scientists create training data faster. The release adds a new interface for manually annotating relations and dependencies, along with recipes for tasks such as coreference resolution. It also adds interfaces for annotating audio and video files, with support for speaker diarization and transcription, and uses the pyannote.audio library for model-in-the-loop workflows. The manual image annotation UI has been reworked and now supports resizing, freehand shapes, and center-based drawing. NER annotation gains character-based highlighting and a recipe for annotating data to fine-tune Transformer models such as BERT, keeping annotations compatible with subword tokenization. New recipe callbacks and UI customization options round out the release.

Key takeaway

For data scientists and ML engineers who need to create training data faster, Prodigy v1.10 adds workflows for annotating dependencies, coreference, and custom relations with model-in-the-loop assistance. The updated audio, video, and image UIs and the Transformer-compatible NER tools extend this to more data types. Consider upgrading if you need these specialized workflows or want validation callbacks that enforce consistency in your annotations.
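To make the idea of a validation callback concrete, here is a minimal sketch in plain Python. It assumes the callback receives one annotated example as a dictionary and signals a problem by raising an error. The function name `validate_answer`, the `accept` key, and the "1 to 3 options" rule are illustrative assumptions rather than details taken from the original article, so check the Prodigy documentation before relying on them.

```python
def validate_answer(eg):
    """Reject an answer that breaks a project-specific rule.

    Assumption: `eg` is the annotated example as a dict and the selected
    options are stored under the "accept" key. The rule is hypothetical.
    """
    selected = eg.get("accept", [])
    if not 1 <= len(selected) <= 3:
        raise ValueError("Select at least 1 but not more than 3 options.")


# A valid answer passes silently.
validate_answer({"text": "example", "accept": ["A", "B"]})
```

In a custom recipe, a function like this would be registered alongside the other recipe components, so that inconsistent answers are caught while the annotator is still looking at the example.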

Key insights

Prodigy v1.10 enhances ML data annotation through specialized UIs, model-in-the-loop automation, and customizable workflows for diverse data types.

Method

Prodigy's relation annotation method has three steps: define match patterns that disable irrelevant tokens so they cannot be selected, use a model or patterns to pre-label spans so that the units being connected are consistent, and then manually connect the relevant tokens or spans.
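The three steps can be sketched in plain Python without Prodigy itself. The task layout below (`tokens` with a `disabled` flag, `spans`, and `relations` with `head` and `child` indices) is an assumed schema for illustration, loosely modeled on how such annotation tasks are commonly represented. The helper names `make_tokens`, `is_punct`, and `connect` are hypothetical, and a simple punctuation check stands in for real match patterns.

```python
import string


def is_punct(word):
    """Stand-in for a match pattern: true for punctuation-only tokens."""
    return all(ch in string.punctuation for ch in word)


def make_tokens(words, is_disabled):
    """Step 1: mark tokens that should not be selectable."""
    return [
        {"id": i, "text": w, "disabled": is_disabled(w)}
        for i, w in enumerate(words)
    ]


def connect(task, head, child, label):
    """Step 3: record one relation between two selectable tokens."""
    for idx in (head, child):
        if task["tokens"][idx]["disabled"]:
            raise ValueError(f"token {idx} is disabled")
    task["relations"].append({"head": head, "child": child, "label": label})
    return task


words = ["Anna", "works", "at", "Explosion", "."]
task = {
    "tokens": make_tokens(words, is_punct),
    # Step 2: spans pre-labeled by a model or patterns (hard-coded here).
    "spans": [
        {"token_start": 0, "token_end": 0, "label": "PERSON"},
        {"token_start": 3, "token_end": 3, "label": "ORG"},
    ],
    "relations": [],
}

# The annotator draws one arrow from "Anna" to "Explosion".
connect(task, head=0, child=3, label="WORKS_AT")
```

Disabling tokens up front narrows what the annotator can click on, and pre-labeled spans mean relations attach to the same units every time, which is where most of the consistency gain comes from.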

Best for: Machine Learning Engineer, Data Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai.