LegalBench-BR: A Benchmark for Evaluating Large Language Models on Brazilian Legal Decision Classification

2026-04-20 · Source: Computation and Language · Field: Legal & Regulatory — Legal Technology (LegalTech) · Depth: Expert, quick

Summary

LegalBench-BR is introduced as the first public benchmark for evaluating language models on Brazilian legal text classification. The dataset contains 3,105 appellate proceedings from the Santa Catarina State Court (TJSC), collected through the DataJud API (CNJ) and annotated across five legal areas using LLM-assisted labeling with heuristic validation. On a class-balanced test set, BERTimbau-LoRA, fine-tuned by updating only 0.3% of its parameters, reached 87.6% accuracy and a macro-F1 of 0.87, ahead of the commercial models Claude 3.5 Haiku (by 22pp) and GPT-4o mini (by 28pp). Most notably, on "administrativo" (administrative law), GPT-4o mini scored F1 = 0.00 and Claude 3.5 Haiku scored F1 = 0.08, while the fine-tuned model reached F1 = 0.91. The commercial LLMs showed a systematic bias toward "civel" (civil law), assigning ambiguous cases to that class; domain-adapted fine-tuning resolved this.
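To make the per-class and macro-F1 figures above concrete, here is a minimal sketch of how a model that sends every "administrativo" case to "civel" ends up with F1 = 0 on that class while still looking reasonable on accuracy. The data is a toy example, not taken from the benchmark, and the labels `area_3`, `area_4`, and `area_5` are hypothetical placeholders for the three legal areas this summary does not name.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return a dict mapping each label to its F1 score."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Hypothetical label set: only "administrativo" and "civel" come from the summary.
LABELS = ["administrativo", "civel", "area_3", "area_4", "area_5"]

# Toy class-balanced test set: two examples per class.
y_true = [label for label in LABELS for _ in range(2)]

# Toy biased classifier: correct everywhere except that it
# predicts "civel" for every "administrativo" case.
y_pred = ["civel" if t == "administrativo" else t for t in y_true]

scores = per_class_f1(y_true, y_pred, LABELS)
macro = macro_f1(y_true, y_pred, LABELS)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label in LABELS:
    print(f"{label:>15}: F1 = {scores[label]:.2f}")
print(f"accuracy = {accuracy:.2f}, macro-F1 = {macro:.2f}")
```

Because macro-F1 averages the classes without weighting, one collapsed class lowers the score by a full fifth of its range in a 5-class problem, and the over-predicted class also loses precision. That is why reporting per-class F1 alongside accuracy exposes this kind of bias.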

Key takeaway

For AI engineers building legal tech for Brazil, general-purpose LLMs such as GPT-4o mini or Claude 3.5 Haiku are not sufficient on their own for legal classification. Instead, apply LoRA fine-tuning to a model such as BERTimbau on a domain-specific dataset such as LegalBench-BR. In this benchmark that approach gave higher accuracy and removed the systematic classification bias, even on a simple 5-class problem, at zero marginal inference cost.

Key insights

Domain-adapted fine-tuning outperformed general-purpose LLMs by 22 to 28 percentage points on this specialized legal text classification task in Portuguese.

Principles

General LLMs struggle with domain-specific nuances.
LoRA fine-tuning is effective for domain adaptation.

Method

The authors collect appellate proceedings through the DataJud API, label them with LLM assistance plus heuristic validation, and then adapt a base model, BERTimbau, to the domain with LoRA fine-tuning.
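As a rough illustration of why LoRA touches so few parameters, the sketch below counts the weights in the low-rank adapters for a BERT-base-sized encoder. Every number in it is an assumption for illustration: this summary does not state the rank, the adapted modules, or the model size the authors used. The values (rank 8, query and value projections, 12 layers, hidden size 768, roughly 110M base parameters) are common defaults, not the paper's reported configuration.

```python
def lora_trainable_params(hidden_size, rank, num_layers, targets_per_layer):
    """Count adapter weights when LoRA wraps square hidden x hidden projections.

    Each adapted matrix gets two low-rank factors, A (rank x hidden) and
    B (hidden x rank), so it adds 2 * hidden * rank trainable weights.
    """
    per_matrix = 2 * hidden_size * rank
    return per_matrix * targets_per_layer * num_layers


# All values below are assumptions, not taken from the source.
HIDDEN_SIZE = 768          # BERT-base hidden width
NUM_LAYERS = 12            # BERT-base encoder depth
TARGETS_PER_LAYER = 2      # e.g. the query and value projections
RANK = 8                   # a commonly used LoRA rank
BASE_PARAMS = 110_000_000  # approximate size of a BERT-base model

adapter_params = lora_trainable_params(
    HIDDEN_SIZE, RANK, NUM_LAYERS, TARGETS_PER_LAYER
)
fraction = adapter_params / BASE_PARAMS

print(f"adapter weights: {adapter_params:,}")
print(f"share of base model: {fraction:.2%}")
```

Under these assumptions the adapters come to a few hundred thousand weights, the same order of magnitude as the 0.3% reported in the summary. A real setup would also train a small classification head, which this count leaves out.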

In practice

Use LoRA for legal domain adaptation.
Prioritize domain-specific fine-tuning over general LLMs.

Topics

LegalBench-BR
Brazilian Legal Classification
Large Language Models
BERTimbau-LoRA
LoRA Fine-tuning

Best for: AI Engineer, Machine Learning Engineer, Research Scientist, AI Scientist, NLP Engineer, Legal Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.