Models as annotators in Prodigy
Summary
Prodigy, a data annotation tool, now lets machine learning models act as annotators. This makes it possible to prioritize annotation work by finding the examples on which two distinct models disagree. The demonstration uses the pre-trained spaCy "en_core_web_md" model and a spaCy LLM pipeline configured with GPT-3.5 to perform Named Entity Recognition (NER) on New York Times headlines. The "ner.model-annotate" recipe stores each model's predictions as annotations. Prodigy's "review" recipe, combined with the "auto_accept" flag, then filters the stream so that only examples on which the two models produced conflicting labels are shown, focusing human effort on the most ambiguous instances, which are also the most valuable for correction and model improvement.
Key takeaway
For Machine Learning Engineers building or refining models that need large amounts of annotated data, Prodigy's model-as-annotator pattern can reduce the number of examples a human has to look at. Have two models annotate the data, then run the "review" recipe with "auto_accept" so that human effort goes only to the examples where the models disagree, which speeds up dataset creation. Be aware that "auto_accept" can propagate errors when both models agree on the same incorrect label.
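One way to keep an eye on the risk described above is to hand-check a small sample of the examples that were accepted without review. The sketch below is an illustration only and is not part of Prodigy: the function name, the data layout (text mapped to a set of `(start, end, label)` spans), and the example headlines are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, Tuple

# A span is (start_char, end_char, label); an annotation is a set of spans.
Span = Tuple[int, int, str]
Annotation = FrozenSet[Span]


def audit_auto_accepted(
    auto_accepted: Dict[str, Annotation],
    spot_checked: Dict[str, Annotation],
) -> float:
    """Return the share of spot-checked texts whose auto-accepted spans are wrong.

    `auto_accepted` maps text -> spans both models agreed on (hypothetical layout).
    `spot_checked` maps text -> spans a human verified for a small sample.
    """
    checked = [text for text in spot_checked if text in auto_accepted]
    if not checked:
        raise ValueError("no overlap between spot-checked and auto-accepted texts")
    wrong = sum(auto_accepted[text] != spot_checked[text] for text in checked)
    return wrong / len(checked)


# Hypothetical data: both models labeled "Amazon" as ORG, but the human
# reviewer decided this headline refers to the river (LOC).
auto_accepted = {
    "Apple unveils new phone": frozenset({(0, 5, "ORG")}),
    "Amazon floods displace thousands": frozenset({(0, 6, "ORG")}),
}
spot_checked = {
    "Apple unveils new phone": frozenset({(0, 5, "ORG")}),
    "Amazon floods displace thousands": frozenset({(0, 6, "LOC")}),
}

error_rate = audit_auto_accepted(auto_accepted, spot_checked)
print(f"error rate among spot-checked auto-accepts: {error_rate:.0%}")
```

A non-trivial error rate in such a sample would suggest that the two models share a blind spot and that agreement alone should not be trusted for that label.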
Key insights
Reviewing only the examples on which two distinct models disagree concentrates human annotation effort on the cases most likely to need correction.
Principles
- Model disagreement pinpoints high-value annotation examples.
- Any reasonable model can act as an annotator.
- Human review is essential for quality and learning.
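To make the idea that any reasonable model can act as an annotator concrete, the sketch below treats an annotator as nothing more than a function from text to labeled spans. It does not use Prodigy or spaCy: the two toy "models", the `annotate` helper, and the `"annotator"` key in each record are hypothetical names chosen for this illustration.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

Span = Tuple[int, int, str]            # (start_char, end_char, label)
Annotator = Callable[[str], Set[Span]]


def gazetteer_annotator(text: str) -> Set[Span]:
    """Toy model 1: label known names from a fixed lookup table."""
    known = {"Paris": "GPE", "Google": "ORG"}
    spans: Set[Span] = set()
    for name, label in known.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            spans.add((match.start(), match.end(), label))
    return spans


def capitalized_annotator(text: str) -> Set[Span]:
    """Toy model 2: label every capitalized word except a sentence-initial one as ORG."""
    spans: Set[Span] = set()
    for match in re.finditer(r"\b[A-Z][a-z]+\b", text):
        if match.start() > 0:
            spans.add((match.start(), match.end(), "ORG"))
    return spans


def annotate(texts: List[str], annotators: Dict[str, Annotator]) -> List[dict]:
    """Store each model's predictions as a record tagged with the model's name."""
    records = []
    for name, model in annotators.items():
        for text in texts:
            records.append(
                {"text": text, "annotator": name, "spans": sorted(model(text))}
            )
    return records


texts = ["Shares of Google rise in Paris"]
records = annotate(
    texts,
    {"gazetteer": gazetteer_annotator, "capitalized": capitalized_annotator},
)
for record in records:
    print(record["annotator"], record["spans"])
```

Here the two annotators agree on "Google" but assign different labels to "Paris", which is exactly the kind of example a human reviewer would want to see.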
Method
Run Prodigy's "ner.model-annotate" recipe once for each of the two models so that each model's predictions are stored as annotations. Then run the "review" recipe with "auto_accept" so that only the examples on which the models disagree are presented for human correction.
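The filtering step in this method can be sketched in plain Python. This is a simplified illustration of the idea, not Prodigy's implementation: the record layout, the `"annotator"` key, the model names, and the rule that two annotations agree only when their span sets are identical are all assumptions, and the "review" recipe may define a conflict differently.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def split_by_agreement(records: List[dict]) -> Tuple[List[str], List[str]]:
    """Split texts into (auto_accepted, needs_review).

    A text is auto-accepted only if at least two annotators labeled it and
    all of them produced exactly the same set of spans.
    """
    by_text: Dict[str, Dict[str, FrozenSet[Span]]] = defaultdict(dict)
    for record in records:
        spans = frozenset(tuple(span) for span in record["spans"])
        by_text[record["text"]][record["annotator"]] = spans

    auto_accepted, needs_review = [], []
    for text, per_annotator in by_text.items():
        distinct_answers = set(per_annotator.values())
        if len(per_annotator) > 1 and len(distinct_answers) == 1:
            auto_accepted.append(text)
        else:
            # Either the annotators disagree, or only one of them saw the text.
            needs_review.append(text)
    return auto_accepted, needs_review


# Hypothetical predictions from two models on two headlines.
records = [
    {"text": "Apple opens store in Berlin", "annotator": "model_a",
     "spans": [(0, 5, "ORG"), (21, 27, "GPE")]},
    {"text": "Apple opens store in Berlin", "annotator": "model_b",
     "spans": [(0, 5, "ORG"), (21, 27, "GPE")]},
    {"text": "Jordan wins again", "annotator": "model_a",
     "spans": [(0, 6, "PERSON")]},
    {"text": "Jordan wins again", "annotator": "model_b",
     "spans": [(0, 6, "GPE")]},
]

auto_accepted, needs_review = split_by_agreement(records)
print("auto-accepted:", auto_accepted)
print("needs review:", needs_review)
```

Only the second headline reaches the human reviewer, because the two models assign different labels to the same span.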
In practice
- Run "ner.model-annotate" with two dissimilar models, such as a pre-trained spaCy pipeline and an LLM.
- Use the "review" recipe with the "auto_accept" flag.
- Explore spaCy LLM prompts for task definition.
Topics
- Data Annotation
- Active Learning
- Named Entity Recognition
- Prodigy
- spaCy LLM
- Model Disagreement
Best for: Machine Learning Engineer, AI Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion (Explosion.ai).