Democratizing Legal Analytics: Resource-Efficient Information Extraction for Brazilian Case Law

2026-04-12 · Source: Paper Index on ACL Anthology · Field: Legal & Regulatory — Legal Technology (LegalTech), Criminal Law & Public Safety · Depth: Advanced, quick

Summary

The paper presents a resource-efficient pipeline for information extraction from Brazilian criminal case law, built to handle large volumes of unstructured legal decisions in low-resource languages like Portuguese. The pipeline reuses a legacy dataset to fine-tune open-weight Large Language Models (LLMs) with Q-LoRA in a small-data setting. It extracts 47 legal variables, including charges, evidence, and sentencing outcomes, through schema-constrained JSON generation. In evaluations, a fine-tuned Phi-4 (14B) model achieved 92.8% accuracy and a macro-F1 score of 0.826, comparable to proprietary baselines, while local deployment adds cost and privacy advantages. The extracted data then fed a case study of the short-term effects of a Brazilian Supreme Court ruling on drug decriminalization, which found no statistically significant change in trafficking-conviction rates (p≥0.05).
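The summary reports both accuracy and macro-F1, and the two can diverge when label classes are imbalanced. The sketch below uses one common definition of macro-F1 (the unweighted mean of per-class F1); this summary does not say how the paper averages its scores, and the labels and data are invented for illustration only.

```python
def accuracy(gold, pred):
    """Fraction of positions where the prediction equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class with no true positives, false positives, or false negatives scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for illustration (not taken from the paper).
gold = ["convicted"] * 9 + ["acquitted"]
pred = ["convicted"] * 10  # a model that always predicts the majority class

print(f"accuracy: {accuracy(gold, pred):.3f}")
print(f"macro-F1: {macro_f1(gold, pred):.3f}")
```

Because the minority class contributes an F1 of zero, macro-F1 falls well below accuracy here. That is why a macro-averaged score is a useful companion to accuracy when some legal outcomes are rare.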

Key takeaway

For AI scientists building legal analytics in low-resource language contexts, this work demonstrates a viable path to high-performance, cost-effective information extraction. Consider fine-tuning open-weight LLMs with Q-LoRA on existing, even legacy, datasets to reach competitive accuracy while keeping the benefits of local deployment. This approach can significantly lower the barriers to large-scale empirical legal analysis.

Key insights

Resource-efficient LLM fine-tuning enables scalable legal analytics in low-resource languages using legacy datasets.

Principles

Legacy datasets can support scalable analytics.
Local deployment offers cost and privacy benefits.

Method

The method involves fine-tuning open-weight LLMs with Q-LoRA on legacy datasets in a small-data setting, followed by schema-constrained JSON generation to extract legal variables.
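To make the idea of a schema for extracted variables concrete, here is a minimal sketch that checks a model's JSON output against a small schema. Two caveats apply. The three fields and their allowed values are hypothetical, since the paper's 47 variables are not listed in this summary. And this code validates output after generation, which is a simpler stand-in for schema-constrained generation, where the schema restricts what the model can emit in the first place.

```python
import json

# Hypothetical three-field schema, for illustration only.
SCHEMA = {
    "charge": {"type": str, "enum": {"trafficking", "possession", "other"}},
    "convicted": {"type": bool},
    "sentence_months": {"type": int},
}


def validate_extraction(raw_text):
    """Parse model output and return (record, errors) checked against SCHEMA."""
    try:
        record = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value must be a JSON object"]

    errors = [f"unexpected field: {key}" for key in sorted(record.keys() - SCHEMA.keys())]
    for key, rule in SCHEMA.items():
        if key not in record:
            errors.append(f"missing field: {key}")
            continue
        value = record[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if rule["type"] is int and isinstance(value, bool):
            errors.append(f"wrong type for {key}")
        elif not isinstance(value, rule["type"]):
            errors.append(f"wrong type for {key}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"value not allowed for {key}: {value}")
    return record, errors


good = '{"charge": "trafficking", "convicted": true, "sentence_months": 60}'
bad = '{"charge": "theft", "convicted": "yes"}'
print(validate_extraction(good)[1])
print(validate_extraction(bad)[1])
```

A check like this is still worth keeping downstream of constrained decoding, because it catches records that are well-formed JSON but fall outside the coding scheme.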

In practice

Reuse existing datasets for LLM fine-tuning.
Employ Q-LoRA for resource-efficient training.
Utilize schema-constrained JSON for extraction.
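A rough back-of-the-envelope calculation shows why 4-bit quantized fine-tuning counts as resource-efficient for a 14B model. The sketch assumes a nominal 14 billion parameters (taken from the "14B" label, so the exact count may differ) and counts only the base weights. It ignores quantization constants, adapter weights, optimizer state, and activations, so real memory use will be higher.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 14e9  # nominal size assumed from the "14B" label

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # weights stored in 16-bit precision
int4_gb = weight_memory_gb(N_PARAMS, 4)   # weights stored in 4-bit precision

print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f"4-bit weights:  {int4_gb:.0f} GB")
```

Under these assumptions the base weights shrink from about 28 GB to about 7 GB, which is the difference between needing multi-GPU or data-center hardware and fitting on a single high-memory GPU.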

Topics

Legal Analytics
Information Extraction
Brazilian Criminal Law
Low-Resource NLP
Q-LoRA Fine-tuning

Best for: AI Scientist, NLP Engineer, Research Scientist, Legal Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.