LegalBench-BR: A Benchmark for Evaluating Large Language Models on Brazilian Legal Decision Classification
Summary
LegalBench-BR is presented as the first public benchmark for evaluating language models on Brazilian legal text classification. The dataset contains 3,105 appellate proceedings from the Santa Catarina State Court (TJSC), collected through the DataJud API (CNJ) and annotated across five legal areas using LLM-assisted labeling with heuristic validation. On a class-balanced test set, BERTimbau-LoRA, fine-tuned with only 0.3% of its parameters updated, achieved 87.6% accuracy and a macro-F1 of 0.87. The fine-tuned model outperforms the commercial models Claude 3.5 Haiku (+22pp) and GPT-4o mini (+28pp). The gap is widest on "administrativo" (administrative law), where GPT-4o mini scored F1 = 0.00 and Claude 3.5 Haiku scored F1 = 0.08, while the fine-tuned model reached F1 = 0.91. The commercial LLMs were biased toward "civel" (civil law) and assigned ambiguous cases to it, a problem that domain-adapted fine-tuning resolved.
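The gap between accuracy and per-class F1 reported above is easier to see on a toy example. The following is a minimal sketch, not the benchmark's evaluation code: the data is invented, and the label names "area_c", "area_d", and "area_e" are placeholders because this summary names only two of the five legal areas. It computes per-class F1 and macro-F1 for a hypothetical classifier that sends every "administrativo" case to "civel".

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """Return a dict mapping each label to its F1 score."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Placeholder label set: only the first two names come from the summary.
LABELS = ["administrativo", "civel", "area_c", "area_d", "area_e"]

# Toy class-balanced test set: four examples per class (invented data).
y_true = [label for label in LABELS for _ in range(4)]

# Hypothetical biased classifier: every "administrativo" case becomes "civel".
y_pred = ["civel" if t == "administrativo" else t for t in y_true]

scores = per_class_f1(y_true, y_pred, LABELS)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
predicted_counts = Counter(y_pred)

print(f"accuracy={accuracy:.2f} macro_f1={macro_f1(y_true, y_pred, LABELS):.2f}")
print(scores)
print(predicted_counts)
```

In this toy setup accuracy stays at 0.80 even though "administrativo" has F1 = 0.0 and "civel" is predicted twice as often as it occurs, which is why a per-class breakdown and the predicted-label distribution are worth inspecting alongside the headline numbers.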
Key takeaway
For AI engineers building legal tech for Brazil, general-purpose LLMs such as GPT-4o mini or Claude 3.5 Haiku are not sufficient on their own for legal classification. LoRA fine-tuning of a model like BERTimbau on a domain-specific dataset such as LegalBench-BR yields higher accuracy and removes the systematic bias toward "civel", even on a simple 5-class problem, at zero marginal inference cost.
Key insights
Domain-adapted fine-tuning outperformed general-purpose LLMs on Portuguese legal text classification: BERTimbau-LoRA reached 87.6% accuracy, 22 to 28 percentage points ahead of the commercial models tested.
Principles
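- General-purpose LLMs struggle with domain-specific distinctions: both commercial models scored F1 of 0.08 or lower on "administrativo".
- LoRA fine-tuning is effective for domain adaptation: updating 0.3% of BERTimbau's parameters was enough to reach a macro-F1 of 0.87.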
Method
The method has three steps: collect appellate proceedings from TJSC through the DataJud API, label them across five legal areas using LLM-assisted annotation with heuristic validation, and apply LoRA fine-tuning to a base model such as BERTimbau for domain adaptation.
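The labeling step can be pictured as a cross-check between the LLM's label and a cheap rule-based signal. This summary does not describe the actual heuristics, so the sketch below is purely illustrative: the keyword lists, the function names `heuristic_label` and `validate`, and the three outcome strings are all hypothetical.

```python
# Hypothetical keyword lists; the real validation heuristics are not
# described in this summary.
KEYWORDS = {
    "administrativo": ("licitação", "servidor público", "ato administrativo"),
    "civel": ("indenização", "contrato", "danos morais"),
}


def heuristic_label(text):
    """Return the label whose keywords match most often, or None if none match."""
    lowered = text.lower()
    hits = {
        label: sum(keyword in lowered for keyword in keywords)
        for label, keywords in KEYWORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def validate(llm_label, text):
    """Compare an LLM-assigned label with the keyword heuristic.

    Returns "agree", "review" (labels disagree, so a human should look),
    or "no_signal" (the heuristic found no keywords at all).
    """
    guess = heuristic_label(text)
    if guess is None:
        return "no_signal"
    return "agree" if guess == llm_label else "review"


print(validate("civel", "Ação de indenização por danos morais"))
print(validate("civel", "Mandado de segurança contra ato administrativo em licitação"))
print(validate("civel", "Recurso desprovido."))
```

One reason to route disagreements to review rather than overwrite the LLM label is that keyword rules are themselves noisy; the heuristic here acts only as a flag.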
In practice
- Use LoRA for legal domain adaptation.
- Prioritize domain-specific fine-tuning over general LLMs.
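A back-of-the-envelope count shows why LoRA leaves so few parameters trainable. The sketch below assumes a BERT-base-sized encoder (12 layers, hidden size 768, roughly 110 million parameters), LoRA rank 8 applied to the query and value projections, and a 5-class linear head. The rank, the adapted modules, and the base parameter count are assumptions chosen for illustration, not settings confirmed by this summary.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA adapter.

    LoRA learns a low-rank update B @ A for a d_out x d_in weight,
    with B of shape (d_out, rank) and A of shape (rank, d_in).
    """
    return rank * (d_in + d_out)


HIDDEN = 768                # BERT-base hidden size
NUM_LAYERS = 12             # BERT-base encoder layers
ADAPTED_PER_LAYER = 2       # assumption: query and value projections only
RANK = 8                    # assumption: LoRA rank
NUM_CLASSES = 5             # five legal areas
BASE_PARAMS = 110_000_000   # assumption: approximate size of a BERT-base model

adapter_params = (
    NUM_LAYERS * ADAPTED_PER_LAYER * lora_param_count(HIDDEN, HIDDEN, RANK)
)
# Linear classification head: one weight per (hidden unit, class) plus biases.
head_params = HIDDEN * NUM_CLASSES + NUM_CLASSES
trainable = adapter_params + head_params
fraction = trainable / BASE_PARAMS

print(f"adapter={adapter_params} head={head_params} fraction={fraction:.2%}")
```

Under these assumed settings the trainable share comes out at a fraction of one percent, the same order of magnitude as the 0.3% reported for BERTimbau-LoRA; a different rank or set of adapted modules would shift the number.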
Topics
- LegalBench-BR
- Brazilian Legal Classification
- Large Language Models
- BERTimbau-LoRA
- LoRA Fine-tuning
Best for: AI Engineer, Machine Learning Engineer, Research Scientist, AI Scientist, NLP Engineer, Legal Professional
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.