DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition
Summary
DiZiNER (Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition) is a framework for improving the zero-shot Named Entity Recognition (NER) performance of large language models (LLMs). It targets persistent errors in LLM-generated outputs by simulating the pilot annotation process used by human annotation teams. Multiple heterogeneous LLMs annotate the same texts, and a supervisor model analyzes the disagreements between them to refine the task instructions. Evaluated across 18 benchmarks, DiZiNER achieved zero-shot state-of-the-art results on 14 datasets, improving on prior bests by +8.0 F1 and narrowing the zero-shot-to-supervised gap by more than 11 points. Notably, DiZiNER consistently outperformed its supervisor, GPT-5 mini, which suggests that the gains come from the disagreement-guided instruction refinement rather than from increased model capacity alone.
Key takeaway
For AI engineers building zero-shot NER systems, a disagreement-guided instruction refinement approach like DiZiNER can substantially improve accuracy: the reported gain is +8.0 F1 over prior zero-shot bests. Consider deploying multiple LLMs as "annotators" and a separate LLM as a "supervisor" that iteratively refines your task instructions; in the reported results, this narrowed the gap to supervised systems by more than 11 points.
Key insights
Simulating human pilot annotation with LLMs to refine instructions significantly boosts zero-shot NER performance.
Principles
- Inter-model disagreement analysis refines instructions.
- Heterogeneous LLMs improve annotation robustness.
Method
Multiple LLMs annotate shared texts; a supervisor LLM analyzes disagreements to iteratively refine task instructions, mimicking human pilot annotation for zero-shot NER.
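To make the disagreement step concrete, here is a minimal Python sketch of how conflicting annotations might be collected before being handed to a supervisor. The summary does not describe how DiZiNER actually represents or compares annotations, so the `(mention, label)` tuple format, the function name `find_disagreements`, and the rule "anything not produced by every annotator counts as a disagreement" are all assumptions made for illustration.

```python
from collections import defaultdict


def find_disagreements(annotations):
    """Return entity annotations that were NOT produced by every annotator.

    annotations: dict mapping an annotator name to a set of
    (mention, label) tuples. This format is hypothetical.
    """
    votes = defaultdict(set)
    for annotator, entities in annotations.items():
        for entity in entities:
            votes[entity].add(annotator)

    total = len(annotations)
    # Keep only the annotations that lack unanimous support.
    return {
        entity: sorted(names)
        for entity, names in votes.items()
        if len(names) < total
    }


# Toy annotations from three hypothetical models on one sentence.
annotations = {
    "model_a": {("Paris", "LOC"), ("Apple", "ORG")},
    "model_b": {("Paris", "LOC"), ("Apple", "MISC")},
    "model_c": {("Paris", "LOC")},
}
disagreements = find_disagreements(annotations)
```

In this toy input all three models agree on `("Paris", "LOC")`, so only the two competing labels for "Apple" are returned, each with the list of models that proposed it. A record like this is the kind of evidence a supervisor model could be prompted with.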
In practice
- Use multiple LLMs for initial annotation.
- Implement a supervisor model for disagreement analysis.
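The two practices above can be wired together in a small refinement loop. The following Python sketch is one possible shape for it, not the authors' implementation: annotators and the supervisor are passed in as plain callables so that real LLM clients can be substituted, and the stub functions, the `max_rounds` parameter, and the "stop when all annotators agree" rule are hypothetical choices that the summary does not specify.

```python
def refine_instructions(instructions, texts, annotators, supervisor, max_rounds=3):
    """Iteratively rewrite instructions until annotators agree (assumed stop rule)."""
    for _ in range(max_rounds):
        conflicts = []
        for text in texts:
            # Each annotator returns a set of (mention, label) tuples.
            outputs = [frozenset(annotate(instructions, text)) for annotate in annotators]
            if len(set(outputs)) > 1:  # at least two annotators differ
                conflicts.append((text, outputs))
        if not conflicts:
            break
        # The supervisor sees the conflicts and returns revised instructions.
        instructions = supervisor(instructions, conflicts)
    return instructions


# Stubs standing in for real LLM calls (hypothetical behavior).
def annotator_a(instructions, text):
    return {("Amazon", "ORG")} if "Amazon" in text else set()


def annotator_b(instructions, text):
    # Mislabels "Amazon" unless the instructions contain an explicit rule.
    label = "ORG" if "companies are ORG" in instructions else "LOC"
    return {("Amazon", label)} if "Amazon" in text else set()


def stub_supervisor(instructions, conflicts):
    return instructions + " Names of companies are ORG."


refined = refine_instructions(
    "Extract entities.",
    ["Amazon opened a new office."],
    [annotator_a, annotator_b],
    stub_supervisor,
)
```

With these stubs, the first round surfaces a conflict over "Amazon", the supervisor appends a clarifying rule, and the second round finds no disagreement, so the loop exits early. In a real setup the supervisor call would be a prompt to a separate LLM, and the pilot texts would be a small shared sample rather than a single sentence.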
Topics
- DiZiNER Framework
- Zero-shot Named Entity Recognition
- Large Language Models
- Instruction Refinement
- Pilot Annotation Simulation
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.