Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States
Summary
LOCUS, the Local Ordinance Corpus for the United States, is a new dataset built to address the lack of machine-readable local legal texts. It makes nearly all publicly available municipal and county ordinance codes accessible for legal AI research, where they were previously fragmented across vendor platforms. The raw corpus includes codes from 9,239 cities and counties. A smaller, county-harmonized LOCUS access layer covers 2,309 of the 3,144 U.S. counties, representing a majority of the population. The project uses OCR to process diverse document formats, so that the law can serve as a public resource. Alongside the corpus, the authors train a collection of ModernBERT-based classifiers and scorers that analyze U.S. local law along dimensions such as opacity and paternalism, enabling large-scale studies that were not previously possible. LOCUS-v1 and its derivative models are available on Hugging Face.
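As a quick check on the coverage figures quoted above, the share of counties in the harmonized access layer can be computed directly. This is plain arithmetic on the numbers in the summary; the constant and function names are illustrative only. Note that it gives the share of counties, not the share of population.

```python
# Figures quoted in the summary above.
COUNTIES_COVERED = 2_309
COUNTIES_TOTAL = 3_144


def coverage_share(covered: int, total: int) -> float:
    """Return the fraction of units covered."""
    return covered / total


share = coverage_share(COUNTIES_COVERED, COUNTIES_TOTAL)
print(f"{share:.1%}")  # 73.4%
```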
Key takeaway
For legal AI scientists and NLP engineers building models for U.S. law, LOCUS opens access to local ordinance data at a scale that was previously unavailable: a machine-readable corpus covering 9,239 cities and counties. You can train and evaluate models on local regulations and study legal dimensions such as opacity and paternalism. The provided ModernBERT-based classifiers offer a starting point for your own analysis.
Key insights
Local legal ordinances, critical for daily life, are now machine-readable at scale through LOCUS.
Principles
- Fragmented legal data hinders AI progress.
- OCR can unify diverse legal document formats.
- Corpus metadata supports reproducibility.
Method
The LOCUS project collects nearly all publicly available U.S. municipal and county ordinance codes, uses OCR for diverse formats, and trains ModernBERT-based classifiers to analyze legal dimensions like opacity and paternalism.
In practice
- Access LOCUS-v1 on Hugging Face.
- Use ModernBERT models for legal analysis.
- Expand machine-readable local law access.
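The second step above might look like the following minimal sketch, which assumes the LOCUS scorers can be loaded through the Hugging Face `transformers` text-classification pipeline. The model ID is a placeholder rather than a confirmed repository name, and the word-based chunking and mean aggregation are illustrative choices, not the authors' method.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical placeholder: substitute the actual LOCUS model ID from Hugging Face.
MODEL_ID = "<locus-scorer-model-id>"


def chunk_words(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def score_ordinance(text: str, classify: Callable[[List[str]], List[Dict]]) -> float:
    """Average the per-chunk scores returned by classify (an assumed aggregation)."""
    chunks = chunk_words(text)
    if not chunks:
        raise ValueError("empty ordinance text")
    return mean(result["score"] for result in classify(chunks))


def load_classifier(model_id: str = MODEL_ID) -> Callable[[List[str]], List[Dict]]:
    """Wrap a text-classification pipeline; needs transformers and network access."""
    from transformers import pipeline  # lazy import so the helpers above run without it

    pipe = pipeline("text-classification", model=model_id)
    # Truncate any chunk that still exceeds the model's maximum input length.
    return lambda chunks: pipe(chunks, truncation=True)
```

With a real model ID in place, `score_ordinance(text, load_classifier())` would return one number per ordinance. What that number means depends on the model head (for example, the probability of the top label versus a raw regression output), so check the model card before comparing scores across jurisdictions.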
Topics
- Legal AI
- Local Ordinances
- Legal Corpus
- Natural Language Processing
- ModernBERT
- Optical Character Recognition
Best for: Research Scientist, AI Scientist, NLP Engineer, Legal Professional
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.