DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

DiZiNER is a framework that improves zero-shot Named Entity Recognition (NER) with Large Language Models (LLMs) by simulating the pilot annotation phase of a human annotation project. Multiple heterogeneous LLMs act as annotators, and a supervisor LLM (GPT-5 mini) analyzes the disagreements between them and iteratively refines the task instructions. Across 18 NER benchmarks, DiZiNER achieved zero-shot state-of-the-art (SOTA) results on 14 datasets, improving on the prior best results by an average of +8.0 F1 and narrowing the gap between zero-shot and supervised performance from -32.0 to -20.9 F1 points. The framework consistently outperformed its GPT-5 mini supervisor by +5.0 F1, indicating that the gains stem from the disagreement-guided instruction refinement rather than from the supervisor's inherent capacity. Ablation studies confirmed the importance of annotator diversity and of aligning refinement with the final task objective.

Key takeaway

For AI Engineers developing zero-shot NER solutions, DiZiNER offers a fine-tuning-free way to improve model accuracy. Consider implementing a disagreement-guided instruction refinement loop with diverse LLM annotators, especially for complex or ambiguous entity types. This method can narrow the performance gap with supervised systems without requiring extensive human-labeled data.

Key insights

Simulating human pilot annotation with LLMs to refine instructions significantly boosts zero-shot NER performance.

Method

DiZiNER iteratively refines NER instructions by having multiple LLMs cross-annotate texts, analyzing their disagreements to identify hotspots, and then using a supervisor LLM to update common and model-specific instructions.
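
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the annotators and the supervisor are plain Python callables standing in for LLM calls, all names (`find_hotspots`, `refine`, `strict_annotator`, `loose_annotator`, `toy_supervisor`) are hypothetical, a "hotspot" is assumed to be any entity proposed by some but not all annotators, and only a single shared instruction string is refined (the model-specific instructions mentioned above are omitted for brevity).

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Entity = Tuple[str, str]                        # (surface text, entity type)
Annotator = Callable[[str, str], Set[Entity]]   # (instructions, text) -> entities
Supervisor = Callable[[str, List[dict]], str]   # (instructions, hotspots) -> new instructions


def find_hotspots(texts: List[str],
                  annotations: Dict[str, List[Set[Entity]]]) -> List[dict]:
    """Collect entities that some, but not all, annotators produced (assumed definition)."""
    hotspots = []
    n_annotators = len(annotations)
    for i, text in enumerate(texts):
        votes = Counter()
        for per_text in annotations.values():
            votes.update(per_text[i])
        for entity, count in sorted(votes.items()):
            if count < n_annotators:
                supporters = sorted(name for name, per_text in annotations.items()
                                    if entity in per_text[i])
                hotspots.append({"text": text, "entity": entity,
                                 "supporters": supporters})
    return hotspots


def refine(instructions: str, texts: List[str],
           annotators: Dict[str, Annotator], supervisor: Supervisor,
           max_rounds: int = 3) -> str:
    """Annotate, find disagreements, let the supervisor rewrite; stop on full agreement."""
    for _ in range(max_rounds):
        annotations = {name: [annotate(instructions, text) for text in texts]
                       for name, annotate in annotators.items()}
        hotspots = find_hotspots(texts, annotations)
        if not hotspots:
            break
        instructions = supervisor(instructions, hotspots)
    return instructions


# Toy stand-ins for LLM calls (hypothetical, for illustration only).
def strict_annotator(instructions: str, text: str) -> Set[Entity]:
    return {("Paris", "LOC")} if "Paris" in text else set()


def loose_annotator(instructions: str, text: str) -> Set[Entity]:
    entities = set(strict_annotator(instructions, text))
    # Tags nationalities as locations until the instructions forbid it.
    if "Exclude nationalities" not in instructions and "French" in text:
        entities.add(("French", "LOC"))
    return entities


def toy_supervisor(instructions: str, hotspots: List[dict]) -> str:
    # A real supervisor would inspect the hotspots; this one applies a fixed rule.
    return instructions + " Exclude nationalities."


if __name__ == "__main__":
    refined = refine("Tag LOC entities.",
                     ["The French capital is Paris."],
                     {"strict": strict_annotator, "loose": loose_annotator},
                     toy_supervisor)
    print(refined)
```

In this toy run the two annotators disagree on "French" in the first round, the supervisor appends a clarifying rule, and the second round reaches full agreement, which ends the loop. In a real system, each callable would wrap a request to a different LLM, and the stopping criterion and hotspot definition would follow the paper rather than the assumptions made here.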

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.