Democratizing Legal Analytics: Resource-Efficient Information Extraction for Brazilian Case Law
Summary
This work presents a resource-efficient pipeline for information extraction from Brazilian criminal case law, designed to handle large volumes of unstructured legal decisions in low-resource language settings such as Portuguese. The pipeline reuses a legacy dataset to fine-tune open-weight Large Language Models (LLMs) with Q-LoRA in a small-data setting. It extracts 47 legal variables, including charges, evidence, and sentencing outcomes, through schema-constrained JSON generation. In evaluations, a fine-tuned Phi-4 (14B) model achieved 92.8% accuracy and a macro-F1 score of 0.826, which is comparable to proprietary baselines while offering the cost and privacy advantages of local deployment. The extracted data was then used in a case study of the short-term effects of a Brazilian Supreme Court ruling on drug decriminalization, which found no statistically significant change in trafficking-conviction rates (p≥0.05).
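The gap between the reported accuracy (92.8%) and macro-F1 (0.826) is typical when some classes are rare, because macro-F1 weights every class equally. The sketch below illustrates that effect on a toy binary variable. The labels are hypothetical, not data from the paper, and this summary does not state how the paper aggregates scores across its 47 variables.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances scores 0 by convention here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: the rare "yes" class is never predicted.
y_true = ["no"] * 9 + ["yes"]
y_pred = ["no"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Here accuracy stays high because the majority class dominates, while macro-F1 drops sharply because the missed minority class contributes an F1 of zero to the average.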
Key takeaway
For AI scientists developing legal analytics solutions in low-resource language contexts, this work demonstrates a viable path to high-performance, cost-effective information extraction. Consider fine-tuning open-weight LLMs with Q-LoRA on existing datasets, including legacy ones, to reach competitive accuracy while keeping the benefits of local deployment. This approach can lower the barriers to large-scale empirical legal analysis.
Key insights
Resource-efficient LLM fine-tuning enables scalable legal analytics in low-resource languages using legacy datasets.
Principles
- Legacy datasets can support scalable analytics.
- Local deployment offers cost and privacy benefits.
Method
The method involves fine-tuning open-weight LLMs with Q-LoRA on legacy datasets in a small-data setting, followed by schema-constrained JSON generation to extract legal variables.
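As a rough illustration of the schema side of this method, the sketch below checks a model's JSON output against a small schema. It is a post hoc validator, which is a simpler stand-in for schema-constrained generation (where the schema restricts decoding itself). The three fields and their allowed values are hypothetical; the paper's actual 47 variables are not listed in this summary.

```python
import json

# Hypothetical three-field schema, for illustration only.
SCHEMA = {
    "charge": {"type": str, "allowed": {"trafficking", "possession", "other"}},
    "evidence_seized": {"type": bool},
    "sentence_months": {"type": int},
}


def validate_extraction(raw_output, schema=SCHEMA):
    """Parse model output and return (record, errors) against the schema."""
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value is not an object"]

    errors = []
    for name, rule in schema.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if rule["type"] is int and isinstance(value, bool):
            errors.append(f"wrong type for {name}")
        elif not isinstance(value, rule["type"]):
            errors.append(f"wrong type for {name}")
        elif "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"unexpected value for {name}: {value!r}")
    # Fields outside the schema are reported rather than silently kept.
    for name in record:
        if name not in schema:
            errors.append(f"unknown field: {name}")
    return record, errors


good = '{"charge": "trafficking", "evidence_seized": true, "sentence_months": 60}'
bad = '{"charge": "theft", "evidence_seized": "yes"}'
good_record, good_errors = validate_extraction(good)
bad_record, bad_errors = validate_extraction(bad)
```

A validator like this can also serve as a cheap filter on legacy annotations before fine-tuning, so that training targets and inference outputs share one schema.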
In practice
- Reuse existing datasets for LLM fine-tuning.
- Employ Q-LoRA for resource-efficient training.
- Utilize schema-constrained JSON for extraction.
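The first step in the list above, reusing an existing dataset, usually means converting legacy annotations into prompt/completion pairs. The following is a minimal sketch of that conversion, assuming the legacy data is available as pairs of decision text and an annotation dictionary. The instruction text, field names, and example rows are hypothetical, and the Q-LoRA training step itself is not shown.

```python
import json

# Hypothetical prompt template, chosen for illustration only.
INSTRUCTION = "Extract the legal variables from the decision below as JSON."


def to_training_example(decision_text, annotations):
    """Pair a decision with its legacy annotations as a prompt/completion record."""
    prompt = f"{INSTRUCTION}\n\n{decision_text.strip()}"
    # sort_keys gives every target the same field order across examples.
    completion = json.dumps(annotations, ensure_ascii=False, sort_keys=True)
    return {"prompt": prompt, "completion": completion}


def to_jsonl(rows):
    """Serialize (text, annotations) pairs as one JSON object per line."""
    return "\n".join(
        json.dumps(to_training_example(text, ann), ensure_ascii=False)
        for text, ann in rows
    )


# Hypothetical legacy rows, not taken from the paper's dataset.
legacy_rows = [
    ("  O réu foi condenado por tráfico.  ", {"charge": "trafficking", "convicted": True}),
    ("O réu foi absolvido.", {"convicted": False, "charge": "possession"}),
]
jsonl = to_jsonl(legacy_rows)
```

Keeping the completion as a JSON string with a fixed key order means the fine-tuned model sees the same output layout in every example, which tends to make schema-constrained generation easier to enforce at inference time.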
Topics
- Legal Analytics
- Information Extraction
- Brazilian Criminal Law
- Low-Resource NLP
- Q-LoRA Fine-tuning
Best for: AI Scientist, NLP Engineer, Research Scientist, Legal Professional
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.