Prodigy v1.12: OpenAI integration, prompt engineering, task routers, deployment docs and more

2023-07-05 · Source: Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Intermediate, extended

Summary

Prodigy v1.12 adds OpenAI integration for large language model (LLM) workflows. New recipes such as "ner.openai.correct" and "ner.openai.fetch" use an LLM to pre-annotate data for named entity recognition, with equivalent recipes for text categorization; they support few-shot examples and bulk fetching. For prompt engineering, "ab.openai.prompts" runs A/B tests between two prompts, a tournament recipe compares more than two, and "terms.openai.fetch" generates domain-specific term lists. The release also introduces task routers, which distribute annotation tasks among multiple annotators according to overlap requirements or custom logic, so that annotator disagreement can be measured and managed. A backend refactor introduces new abstractions and improves the progress bar, which now distinguishes between source-based and target-based progress. Additional updates include a deployment guide, Parquet file input, a "filter_by_patterns" recipe, co-reference model support in the "train" command, and memory-efficient Python functions such as "iter_dataset_examples".

Key takeaway

For machine learning engineers and prompt engineers working on data annotation or LLM outputs, Prodigy v1.12 shortens two slow loops: producing first-pass annotations and choosing between prompts. Use the new OpenAI recipes to pre-annotate with an LLM and then correct the results by hand, and A/B test prompts systematically rather than judging them by impression. For multi-annotator projects, use task routers to control how many annotators see each example, so that workload stays balanced and disagreement can be measured.

Key insights

Prodigy v1.12 integrates LLMs and advanced task routing to accelerate data annotation and prompt engineering workflows.

Principles

LLMs can pre-annotate data, reducing manual effort.
A/B testing prompts improves LLM output quality.
Task routers manage annotator workload and disagreement.

Method

Prodigy recipes like "ner.openai.correct" use an LLM to pre-annotate data, which annotators then review and correct. Prompts are written as Jinja templates and compared through A/B testing. Task routers distribute examples among annotators based on the "annotations_per_task" setting or on custom Python logic.
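The prompt comparison described above can be sketched in a few lines. This is a minimal illustration of blind A/B testing, not Prodigy's implementation: the function ab_compare, the complete callback (standing in for an LLM call) and the choose callback (standing in for the annotator's pick) are all hypothetical names, and Python's string.Template is used as a stand-in for Jinja so the example has no dependencies.

```python
import random
from collections import Counter
from string import Template
from typing import Callable, Dict, List


def ab_compare(
    templates: Dict[str, Template],
    inputs: List[str],
    complete: Callable[[str], str],
    choose: Callable[[str, List[str]], int],
    seed: int = 0,
) -> Counter:
    """Tally which prompt template wins when outputs are shown blind.

    All names here are hypothetical; this is not Prodigy's API.
    """
    rng = random.Random(seed)
    wins: Counter = Counter()
    for text in inputs:
        names = list(templates)
        # Shuffle so the chooser cannot tell which prompt produced which output.
        rng.shuffle(names)
        outputs = [complete(templates[name].substitute(text=text)) for name in names]
        picked = choose(text, outputs)  # index into the shuffled outputs
        wins[names[picked]] += 1
    return wins


if __name__ == "__main__":
    prompts = {
        "plain": Template("Summarize: $text"),
        "brief": Template("Summarize in one sentence: $text"),
    }
    # Fake "LLM" that echoes the prompt, and a chooser that prefers shorter output.
    fake_llm = lambda prompt: prompt.upper()
    prefer_short = lambda text, outs: min(range(len(outs)), key=lambda i: len(outs[i]))
    print(ab_compare(prompts, ["first doc", "second doc"], fake_llm, prefer_short))
```

Shuffling the order per example is the key design choice: it keeps the person choosing from developing a bias toward whichever prompt is always shown first.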

In practice

Use "ner.openai.correct" for LLM-assisted NER pre-annotation.
Employ "ab.openai.prompts" to compare different prompt versions.
Configure "annotations_per_task" for multi-annotator projects.
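To make the overlap idea behind "annotations_per_task" concrete, here is a small sketch of a routing function that sends each example to a fixed number of annotators. It assumes a plain list of annotator session names and an example dict with a "text" key; route_with_overlap and its parameters are hypothetical and do not reproduce Prodigy's router signature or controller object.

```python
import hashlib
from typing import Dict, List


def route_with_overlap(
    session_ids: List[str], item: Dict, annotations_per_task: int
) -> List[str]:
    """Return the annotators who should see `item` (hypothetical helper).

    The choice is derived from a hash of the text, so the same example is
    always routed to the same annotators regardless of arrival order.
    """
    if not session_ids or annotations_per_task <= 0:
        return []
    ordered = sorted(session_ids)  # stable order, independent of input order
    n = min(annotations_per_task, len(ordered))
    digest = hashlib.sha256(item["text"].encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(ordered)
    # Take n consecutive annotators, wrapping around the end of the pool.
    return [ordered[(start + i) % len(ordered)] for i in range(n)]


if __name__ == "__main__":
    pool = ["alice", "bob", "carol"]
    for text in ["Apple buys startup", "Rain expected in Berlin"]:
        print(text, "->", route_with_overlap(pool, {"text": text}, 2))
```

Hash-based routing is one way to get a fixed overlap without shared state between sessions; a custom router could instead route on metadata such as language or annotator expertise.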

Topics

Prodigy
Data Annotation
OpenAI Integration
Prompt Engineering
Task Routers
Named Entity Recognition

Best for: AI Engineer, NLP Engineer, Machine Learning Engineer, Prompt Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Explosion · Developer tools and consulting for AI, Machine Learning and NLP - Explosion.ai.