Models as annotators in Prodigy

2023-08-29 · Source: Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, long

Summary

Prodigy, a data annotation tool, has introduced a new feature allowing machine learning models to function as annotators. This enables a powerful pattern for prioritizing data annotation by identifying examples where two distinct models disagree. The demonstration uses a pre-trained spaCy "en_core_web_md" model and a spaCy LLM configured with GPT-3.5 to perform Named Entity Recognition (NER) on New York Times headlines. The "ner.model-annotate" recipe is used to store predictions from both models as annotations. Subsequently, Prodigy's "review" recipe, when combined with the "auto_accept" flag, filters the annotation stream to present only those examples where the two models produced conflicting labels, thereby focusing human annotator effort on the most ambiguous and valuable instances for correction and model improvement.

Key takeaway

For Machine Learning Engineers building or refining models that need large annotated datasets, Prodigy's model-as-annotator pattern can reduce the amount of manual labeling. Have two models annotate the same data, then run the "review" recipe with "auto_accept": examples on which the models agree are accepted automatically, and human annotators see only the examples on which the models disagree. Be mindful that "auto_accept" can propagate errors, because two models that agree on an incorrect label will pass that label through without human review.

Key insights

Reviewing only the examples where two distinct models disagree concentrates human annotation time on the cases where at least one model is likely to be wrong.

Principles

Model disagreement pinpoints high-value annotation examples.
Any reasonable model can act as an annotator.
Human review is essential for quality and learning.
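
To make the second principle concrete, here is a minimal Python sketch that treats an annotator as any function from text to labeled spans. It is an illustration only, not Prodigy's API: the names `capitalized_word_model` and `annotate`, the record layout, and the "annotator" field are all hypothetical, and the toy rule stands in for a real model such as a spaCy pipeline or an LLM.

```python
from typing import Callable, Dict, List

# A span is a character range plus a label (hypothetical layout for this sketch).
Span = Dict[str, object]
Annotator = Callable[[str], List[Span]]


def capitalized_word_model(text: str) -> List[Span]:
    """Toy stand-in for a model: tag every capitalized word as ORG."""
    spans: List[Span] = []
    pos = 0
    for word in text.split(" "):
        if word[:1].isupper():
            spans.append({"start": pos, "end": pos + len(word), "label": "ORG"})
        pos += len(word) + 1  # +1 for the space that split() consumed
    return spans


def annotate(text: str, annotator: Annotator, name: str) -> Dict[str, object]:
    """Store a model's prediction in the same shape a human annotation would take."""
    return {"text": text, "spans": annotator(text), "annotator": name}


record = annotate("shares of Apple rise", capitalized_word_model, "toy-rules")
```

Because the record carries the annotator's name, predictions from several models can sit side by side in one dataset and be compared later, which is what the review step relies on.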

Method

Run Prodigy's "ner.model-annotate" recipe once per model so that each model's predictions are stored as annotations. Then run the "review" recipe with "auto_accept" so that only the examples on which the two models disagree are presented for human correction.
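
The filtering idea behind this method can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not Prodigy's implementation: the function `split_by_agreement` is hypothetical, predictions are assumed to be keyed by the example text, and the two headlines and their labels are invented for the example.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def split_by_agreement(
    preds_a: Dict[str, List[Span]],
    preds_b: Dict[str, List[Span]],
) -> Tuple[List[str], List[str]]:
    """Split texts into those where both models predict the same spans and the rest."""
    agreed: List[str] = []
    disputed: List[str] = []
    for text, spans_a in preds_a.items():
        if text not in preds_b:
            continue  # only compare examples that both models annotated
        if set(spans_a) == set(preds_b[text]):
            agreed.append(text)    # would be accepted without human review
        else:
            disputed.append(text)  # would be queued for a human annotator
    return agreed, disputed


# Invented predictions standing in for the output of two different models.
model_one = {
    "Apple opens store in Paris": [(0, 5, "ORG"), (21, 26, "GPE")],
    "Jordan signs new deal": [(0, 6, "GPE")],
}
model_two = {
    "Apple opens store in Paris": [(0, 5, "ORG"), (21, 26, "GPE")],
    "Jordan signs new deal": [(0, 6, "PERSON")],
}

agreed, disputed = split_by_agreement(model_one, model_two)
```

In this sketch the second headline lands in `disputed` because the models assign different labels to the same span. The first headline lands in `agreed`, which also shows the risk noted in the key takeaway: agreement is accepted whether or not the shared label is correct.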

In practice

Run "ner.model-annotate" with models that differ in kind, such as a pre-trained spaCy pipeline and an LLM.
Use the "review" recipe with the "auto_accept" flag.
Explore spaCy LLM prompts for task definition.

Topics

Data Annotation
Active Learning
Named Entity Recognition
Prodigy
spaCy LLM
Model Disagreement

Best for: Machine Learning Engineer, AI Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai.