Prompt Engineering for Named Entity Extraction from Portuguese Legal Documents
Summary
A study investigated prompt engineering for Named Entity Recognition (NER) in Portuguese legal documents, motivated by the scarcity and cost of annotated legal data. It asked whether Large Language Models (LLMs) with In-Context Learning (ICL) can support legal NER in low-supervision, low-resource settings. Using the LeNER-Br corpus, the authors evaluated category-specific prompts, different chunk sizes, and several prompt engineering strategies. Under entity-level evaluation with Exact Match Micro F1, prompt engineering affected performance more than the other tested factors. The highest scores came from the larger models: the 4-bit quantized Qwen-2.5:32B reached 57.9% and GPT-5.2 reached 71.9%, which suggests the approach can serve as an alternative to conventional supervised NER.
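To make the metric concrete, here is a minimal sketch of entity-level Exact Match Micro F1. The summary does not say how the study represents entities or implements scoring, so the `(start, end, label)` tuple format, the function name, and the example labels are assumptions for illustration only.

```python
from typing import List, Tuple

# Assumed representation: (start_offset, end_offset, label).
Entity = Tuple[int, int, str]


def exact_match_micro_f1(
    gold_docs: List[List[Entity]], pred_docs: List[List[Entity]]
) -> float:
    """Micro-averaged F1 where a prediction counts only if span and label both match."""
    if len(gold_docs) != len(pred_docs):
        raise ValueError("gold_docs and pred_docs must have the same length")

    tp = fp = fn = 0
    # Micro averaging: pool the counts over all documents before computing P and R.
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # exact span + label matches
        fp += len(pred_set - gold_set)   # predicted but not in gold
        fn += len(gold_set - pred_set)   # in gold but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data: the second prediction is off by one character, so it is a miss.
gold = [[(0, 5, "PESSOA"), (10, 20, "ORGANIZACAO")], [(3, 8, "LOCAL")]]
pred = [[(0, 5, "PESSOA"), (10, 19, "ORGANIZACAO")], [(3, 8, "LOCAL")]]
score = exact_match_micro_f1(gold, pred)
```

Under exact matching, a span that is off by a single character counts as both a false positive and a false negative, which is why this metric is stricter than partial-overlap scoring.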
Key takeaway
Research scientists building NER for low-resource settings such as Portuguese legal text should consider prompt engineering with larger LLMs, including 4-bit quantized ones, as a promising alternative to traditional supervised pipelines. In this study, refining the prompt strategy produced the largest performance gains, which can reduce reliance on large, costly annotated datasets and shorten development.
Key insights
Prompt engineering with LLMs offers a viable alternative for legal NER in low-resource settings.
Principles
- Prompt engineering impacts NER performance more than chunking.
- Larger LLMs generally yield better NER results.
- ICL can mitigate data scarcity in legal text analysis.
Method
The study evaluated category-specific prompts, chunking sizes, and prompt engineering strategies using LLMs and In-Context Learning on the LeNER-Br corpus for legal NER.
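A category-specific prompt with in-context examples might be assembled roughly as follows. This is only a sketch: the instruction wording, the `build_category_prompt` helper, the output format, and the example sentences are hypothetical, since the summary does not reproduce the study's actual prompts.

```python
from typing import List, Tuple

# One in-context example: (input text, entities of the target category found in it).
FewShotExample = Tuple[str, List[str]]


def build_category_prompt(
    category: str,
    definition: str,
    examples: List[FewShotExample],
    text: str,
) -> str:
    """Build a prompt that asks for a single entity category (hypothetical format)."""
    lines = [
        f"Extract every entity of type {category} from the Portuguese legal passage.",
        f"Definition: {definition}",
        "Return one entity per line, copied exactly from the passage.",
        "Return NONE if the passage contains no such entity.",
        "",
    ]
    # In-context learning: show solved examples before the passage to be labeled.
    for example_text, example_entities in examples:
        lines.append(f"Text: {example_text}")
        lines.append("Entities:")
        lines.extend(example_entities or ["NONE"])
        lines.append("")
    lines.append(f"Text: {text}")
    lines.append("Entities:")
    return "\n".join(lines)


# Made-up example data, not taken from LeNER-Br.
prompt = build_category_prompt(
    category="LEGISLACAO",
    definition="References to laws, decrees, and other normative acts.",
    examples=[("Nos termos da Lei nº 8.112/90, o servidor faz jus ao benefício.",
               ["Lei nº 8.112/90"])],
    text="O recurso foi interposto com base no Decreto nº 9.830/2019.",
)
```

Issuing one prompt per category keeps each instruction short and focused, at the cost of one model call per category per passage.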
In practice
- Consider 4-bit quantized LLMs for legal NER.
- Prioritize prompt engineering over chunking size.
- Explore ICL for low-supervision NER tasks.
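To compare chunk sizes, a document can be split into fixed-size pieces before prompting. The sketch below assumes whitespace tokenization and non-overlapping chunks; the summary does not state how the study chunked its inputs, and `chunk_tokens` is a hypothetical helper.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive, non-overlapping chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Made-up sentence; whitespace splitting is an assumption, not the study's tokenizer.
document = "O Tribunal de Justiça julgou o recurso ontem"
tokens = document.split()

# Try several chunk sizes and inspect how many prompts each one would require.
chunks_by_size = {size: chunk_tokens(tokens, size) for size in (3, 5, 10)}
prompts_needed = {size: len(chunks) for size, chunks in chunks_by_size.items()}
```

Non-overlapping chunks can cut an entity in half at a boundary, so an overlapping window or sentence-aligned splitting may be worth testing as well.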
Topics
- Prompt Engineering
- Named Entity Recognition
- Large Language Models
- Portuguese Legal Documents
- In-Context Learning
Best for: Research Scientist, AI Scientist, NLP Engineer, Prompt Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.