Task Routers in Prodigy

· Source: Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Advanced, extended

Summary

Prodigy version 1.12 introduces custom task routers, which let machine learning engineers define in Python how annotation tasks are distributed among annotators. This addresses the problem of mapping tasks to a pool of annotators and covers scenarios ranging from single-annotator assignment to full or partial overlap, as well as conditional routing based on task properties such as language or model confidence scores. A task router is a Python function integrated into a Prodigy recipe: it receives the controller, the session ID, and the current example, and returns a list of target session IDs. Routing can be kept consistent across server restarts by pre-defining the annotators with the "PRODIGY_ALLOWED_SESSIONS" environment variable and using a deterministic hashing trick to spread tasks evenly. For simpler overlap requirements, Prodigy also provides built-in task routers that can be configured via "prodigy.json".
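The conditional routing described in the summary can be sketched roughly as follows. This is a minimal illustration, not Prodigy's own code: it assumes the controller exposes the known sessions as a "session_ids" attribute and that each example carries "lang" and "confidence" values under "meta". The LANGUAGE_POOLS mapping, the annotator names, and the 0.3 threshold are hypothetical, and a SimpleNamespace stands in for the real controller so the sketch runs without Prodigy installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, List

# Hypothetical mapping from language code to annotators who can read it.
LANGUAGE_POOLS: Dict[str, List[str]] = {
    "en": ["alex", "bo"],
    "de": ["chris"],
}


def route_by_language_and_confidence(
    ctrl: Any, session_id: str, item: Dict[str, Any]
) -> List[str]:
    """Return the session IDs that should annotate this example."""
    meta = item.get("meta", {})
    # Assumption: the controller exposes known sessions as `session_ids`.
    everyone = list(ctrl.session_ids)
    # Low-confidence predictions go to every annotator (full overlap).
    if meta.get("confidence", 1.0) < 0.3:
        return everyone
    # Otherwise restrict to annotators for this language; unknown languages
    # fall back to the whole pool.
    candidates = LANGUAGE_POOLS.get(meta.get("lang"), everyone)
    pool = [s for s in candidates if s in everyone]
    return pool or everyone


# Stand-in for the Prodigy controller, for demonstration only.
ctrl = SimpleNamespace(session_ids=["alex", "bo", "chris"])
example = {"text": "Hallo Welt", "meta": {"lang": "de", "confidence": 0.9}}
print(route_by_language_and_confidence(ctrl, "alex", example))  # ['chris']
```

The router ignores the requesting session_id here because the decision depends only on the example; a router that balances load per annotator would use it.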

Key takeaway

For machine learning engineers or data annotation leads designing complex annotation workflows, Prodigy's custom task routers in version 1.12 provide code-level control over task distribution. You can implement custom Python logic to manage annotator overlap, route tasks based on data attributes (e.g., language), or take model confidence scores into account. To keep task distribution consistent and avoid imbalances, explicitly define your annotator pool with the "PRODIGY_ALLOWED_SESSIONS" environment variable and consider deterministic hashing for task assignment.

Key insights

Prodigy's custom task routers enable highly flexible, code-driven control over data annotation task distribution and annotator overlap.

Method

Define a Python function that takes the controller, the session ID, and the current example, and returns a list of target session IDs. For consistent, roughly even assignment, use a hashing trick: index into the annotator pool with the task hash modulo the pool size.
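The hashing trick can be sketched as below. This is an illustrative version under stated assumptions rather than the article's exact code: the "session_ids" attribute on the controller is assumed, the annotator names are hypothetical, and a SimpleNamespace replaces the real controller. The "_task_hash" key is the integer hash Prodigy attaches to each task. Sorting the pool is an added precaution so the index does not depend on the order in which sessions were registered.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Any, Dict, List


def route_one_annotator(
    ctrl: Any, session_id: str, item: Dict[str, Any]
) -> List[str]:
    """Assign each task to exactly one annotator, chosen by its task hash."""
    # Assumption: `ctrl.session_ids` lists the pre-defined annotator pool.
    pool = sorted(ctrl.session_ids)
    # Python's % yields a non-negative index even when the hash is negative.
    idx = item["_task_hash"] % len(pool)
    return [pool[idx]]


# Stand-in controller and tasks, for demonstration only.
ctrl = SimpleNamespace(session_ids=["chris", "alex", "bo"])
tasks = [{"text": f"example {i}", "_task_hash": i} for i in range(6)]
counts = Counter(route_one_annotator(ctrl, "alex", t)[0] for t in tasks)
print(dict(counts))  # {'alex': 2, 'bo': 2, 'chris': 2}
```

Because the result depends only on the task hash and the pool, the same task maps to the same annotator after a restart, provided the pool itself is unchanged. That is why the pool is fixed up front with "PRODIGY_ALLOWED_SESSIONS".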

Best for: Machine Learning Engineer, AI Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai.