Zero- and Few-Shot Named-Entity Recognition: Case Study and Dataset in the Crime Domain (CrimeNER)
Summary
CrimeNER presents a case study and dataset for zero- and few-shot Named-Entity Recognition (NER) on crime-related documents. It addresses the scarcity of annotated data from real-world crime scenarios, which law enforcement agencies need in order to extract information about crimes, criminals, and the agencies involved. The CrimeNERdb dataset comprises over 1,500 annotated documents drawn from public reports on terrorist attacks and U.S. Department of Justice press notes. It defines 5 coarse crime entity types and 22 fine-grained entity types. Both the case study and the annotated data were validated through experiments with state-of-the-art NER models and commonly used Large Language Models in zero- and few-shot settings.
Key takeaway
For NLP engineers building tools for law enforcement or legal tech, CrimeNERdb is a resource for improving named-entity recognition on crime-related text. Consider using it to train or fine-tune models when annotated data for specific crime entities is limited. Doing so may improve the accuracy of information extraction from incident reports and legal documents.
Key insights
CrimeNER provides a specialized dataset and case study for zero- and few-shot NER in the crime domain.
Principles
- Specialized datasets improve NER in niche domains.
- Zero- and few-shot learning mitigate data scarcity.
Method
CrimeNERdb was built by annotating over 1,500 public crime documents with 5 coarse and 22 fine-grained entity types. The annotations were then validated with state-of-the-art NER models and LLMs.
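A two-level label scheme like this is commonly handled by mapping each fine-grained type to its coarse parent and scoring predictions at both levels. The following is a minimal sketch under stated assumptions: the label names are placeholders (the summary gives only the counts, not the type names), and exact-match span F1 is assumed as the metric, which may differ from the one used in the original experiments.

```python
# Hypothetical label names: the summary states only the counts
# (5 coarse, 22 fine-grained), so these are illustrative placeholders.
FINE_TO_COARSE = {
    "PERPETRATOR_NAME": "PERSON",
    "VICTIM_NAME": "PERSON",
    "WEAPON_TYPE": "INSTRUMENT",
    "AGENCY_NAME": "ORGANIZATION",
}


def coarsen(spans):
    """Map (start, end, fine_label) spans to (start, end, coarse_label)."""
    return [(start, end, FINE_TO_COARSE[label]) for start, end, label in spans]


def span_f1(gold, pred):
    """Exact-match span F1: a prediction counts only if the boundaries
    and the label both match a gold span."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


gold = [(0, 2, "PERPETRATOR_NAME"), (5, 6, "WEAPON_TYPE")]
pred = [(0, 2, "VICTIM_NAME"), (5, 6, "WEAPON_TYPE")]

fine_score = span_f1(gold, pred)                      # one fine label is wrong
coarse_score = span_f1(coarsen(gold), coarsen(pred))  # both map to the same parents
```

Scoring at both levels separates boundary errors from confusions between sibling fine-grained types: in the toy data above, the model finds both spans but mislabels one at the fine level only.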
In practice
- Use CrimeNERdb for crime-related information extraction.
- Apply zero-shot NER for new crime entity types.
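One way to apply zero- or few-shot NER with an LLM is to list the allowed entity types in the prompt, optionally add a few worked examples, and then filter the model's answer. The sketch below is illustrative only: `build_prompt`, `parse_entities`, the prompt wording, and the `WEAPON` label are all assumptions, not the prompts or types used in CrimeNER, and the call to an actual model is left out.

```python
import json


def build_prompt(text, entity_types, examples=()):
    """Build a zero-shot prompt (no examples) or a few-shot prompt
    (examples given as (text, list_of_entity_dicts) pairs)."""
    lines = [
        "Extract named entities from the input.",
        "Allowed entity types: " + ", ".join(entity_types) + ".",
        'Answer with a JSON list of objects with keys "text" and "type".',
    ]
    for example_text, example_entities in examples:
        lines.append("Text: " + example_text)
        lines.append("Entities: " + json.dumps(example_entities))
    lines.append("Text: " + text)
    lines.append("Entities:")
    return "\n".join(lines)


def parse_entities(response, text, entity_types):
    """Keep only well-formed entities whose type is allowed and whose
    mention literally occurs in the source text."""
    try:
        items = json.loads(response)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    allowed = set(entity_types)
    kept = []
    for item in items:
        if not isinstance(item, dict):
            continue
        mention, label = item.get("text"), item.get("type")
        if not isinstance(mention, str) or not isinstance(label, str):
            continue
        if mention and label in allowed and mention in text:
            kept.append((mention, label))
    return kept


document = "Agents seized a pistol."
zero_shot = build_prompt(document, ["WEAPON"])
few_shot = build_prompt(
    document,
    ["WEAPON"],
    examples=[("He carried a knife.", [{"text": "knife", "type": "WEAPON"}])],
)

# A hand-written stand-in for a model response, used here instead of a real LLM call.
fake_response = (
    '[{"text": "pistol", "type": "WEAPON"},'
    ' {"text": "rifle", "type": "WEAPON"},'
    ' {"text": "Agents", "type": "OTHER"}]'
)
entities = parse_entities(fake_response, document, ["WEAPON"])
```

Filtering on allowed types and on literal occurrence in the text is a cheap guard against two common failure modes of prompted NER: invented labels and hallucinated mentions.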
Topics
- Named-Entity Recognition
- Zero-Shot Learning
- Few-Shot Learning
- Crime Data Annotation
- Large Language Models
Best for: NLP Engineer, AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, AI Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.