Zero- and Few-Shot Named-Entity Recognition: Case Study and Dataset in the Crime Domain (CrimeNER)

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, quick

Summary

CrimeNER introduces a case study and a dataset for zero- and few-shot Named-Entity Recognition (NER) tailored to crime-related documents. It addresses the scarcity of annotated data for real-world crime scenarios, data that law enforcement agencies need in order to extract information about crimes, criminals, and the agencies involved. The CrimeNERdb dataset comprises over 1,500 annotated documents drawn from public reports on terrorist attacks and U.S. Department of Justice press notes. It defines 5 coarse-grained crime entity types and 22 fine-grained entity types. The quality of both the case study and the annotated data was validated through experiments with state-of-the-art NER models and widely used Large Language Models (LLMs) in zero- and few-shot settings.
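The summary does not say how the LLMs were prompted. As a rough illustration of the difference between the two settings, the sketch below builds a text prompt with no demonstrations (zero-shot) or with k annotated demonstrations (k-shot). The label names in COARSE_TYPES, the Example class, and the prompt wording are all hypothetical and are not taken from CrimeNERdb or the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical coarse labels for illustration only; the real CrimeNERdb
# coarse type names are not given in this summary.
COARSE_TYPES = ["PERSON", "ORGANIZATION", "LOCATION", "CRIME", "DATE"]


@dataclass
class Example:
    """One annotated demonstration (hypothetical structure)."""
    text: str
    entities: list[tuple[str, str]] = field(default_factory=list)  # (mention, type)


def format_entities(entities: list[tuple[str, str]]) -> str:
    # One "TYPE: mention" line per entity, or "none" if there are no entities.
    if not entities:
        return "none"
    return "\n".join(f"{label}: {mention}" for mention, label in entities)


def build_prompt(query: str, demos: list[Example], labels: list[str]) -> str:
    """Zero-shot prompt when demos is empty, k-shot prompt when len(demos) == k."""
    parts = [
        "Extract named entities from the text. "
        f"Allowed types: {', '.join(labels)}. "
        "Answer with one 'TYPE: mention' per line, or 'none'."
    ]
    for demo in demos:
        parts.append(f"Text: {demo.text}\nEntities:\n{format_entities(demo.entities)}")
    # The query goes last, leaving the answer slot open for the model.
    parts.append(f"Text: {query}\nEntities:")
    return "\n\n".join(parts)


demo = Example(
    "The FBI arrested John Doe in Ohio.",
    [("FBI", "ORGANIZATION"), ("John Doe", "PERSON"), ("Ohio", "LOCATION")],
)
print(build_prompt("Two men were charged with fraud in Texas.", [demo], COARSE_TYPES))
```

Passing an empty demos list yields the zero-shot variant of the same prompt, so both settings can be compared with identical instructions.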

Key takeaway

For NLP engineers building solutions for law enforcement or legal tech, CrimeNERdb offers a valuable resource for named-entity recognition in crime-related texts. Consider using the dataset to train, fine-tune, or evaluate models, especially where annotated data for specific crime entities is scarce. Doing so can improve the accuracy of information extraction from incident reports and legal documents.
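Fine-tuning a token-classification model usually requires turning span annotations into per-token tags. The following is a minimal sketch, assuming annotations arrive as character-offset spans (start, end, label) with exclusive ends and that whitespace tokenization is acceptable; the actual CrimeNERdb file format is not described here, and spans_to_bio is a hypothetical helper. Because it splits on whitespace, punctuation glued to a word (as in "Ohio.") falls outside the span and is tagged O, so a real pipeline should use the dataset's own tokenization.

```python
import re


def spans_to_bio(text: str, spans: list[tuple[int, int, str]]) -> list[tuple[str, str]]:
    """Convert character-offset entity spans to (token, BIO tag) pairs.

    Assumes whitespace tokenization and non-overlapping spans whose
    boundaries fall on token boundaries; end offsets are exclusive.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        start, end = match.start(), match.end()
        tag = "O"
        for s_start, s_end, label in spans:
            if start >= s_start and end <= s_end:
                # The first token of a span gets B-, later tokens get I-.
                tag = ("B-" if start == s_start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


# Hypothetical labels PER and LOC, used only for this example.
print(spans_to_bio("John Doe was charged in Ohio", [(0, 8, "PER"), (24, 28, "LOC")]))
# John/B-PER, Doe/I-PER, Ohio/B-LOC; all other tokens are O
```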

Key insights

CrimeNER provides a specialized dataset and case study for zero- and few-shot NER in the crime domain.

Method

CrimeNERdb was created by annotating over 1,500 public crime-related documents with 5 coarse-grained and 22 fine-grained entity types, and it was validated with state-of-the-art NER models and LLMs in zero- and few-shot settings.
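A two-level label scheme lets a model be scored both at the coarse level and at the fine-grained level. The sketch below assumes a hypothetical fine-to-coarse mapping (the real 22 fine-grained and 5 coarse type names are not listed in this summary) and uses exact-match entity-level F1, a common NER metric; it is not necessarily the metric used in the CrimeNER experiments.

```python
from collections import Counter

# Hypothetical fine-to-coarse mapping for illustration; not the real
# CrimeNERdb label inventory.
FINE_TO_COARSE = {
    "SUSPECT": "PERSON",
    "VICTIM": "PERSON",
    "LAW_ENFORCEMENT_AGENCY": "ORGANIZATION",
    "CITY": "LOCATION",
}


def to_coarse(spans: list[tuple[int, int, str]]) -> list[tuple[int, int, str]]:
    # Replace each fine label with its coarse parent; unknown labels pass through.
    return [(s, e, FINE_TO_COARSE.get(label, label)) for s, e, label in spans]


def entity_f1(gold: list[tuple[int, int, str]], pred: list[tuple[int, int, str]]) -> float:
    """Exact-match entity-level F1: boundaries and label must both agree."""
    gold_c, pred_c = Counter(gold), Counter(pred)
    tp = sum((gold_c & pred_c).values())  # multiset intersection = true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [(0, 8, "SUSPECT"), (24, 28, "CITY")]
pred = [(0, 8, "VICTIM"), (24, 28, "CITY")]
print(entity_f1(gold, pred))                        # 0.5
print(entity_f1(to_coarse(gold), to_coarse(pred)))  # 1.0
```

Here a SUSPECT/VICTIM confusion halves the fine-grained F1 but costs nothing once both labels collapse to PERSON, which is why reporting both levels can be informative.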

Best for: NLP Engineer, AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, AI Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.