Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States
Summary
The Local Ordinance Corpus for the United States (LOCUS) is a new dataset that addresses the lack of machine-readable local law. The corpus includes codes from 9,239 U.S. cities and counties, with a county-harmonized access layer covering 2,309 of 3,144 counties, which together represent most of the population. LOCUS uses LightOnOCR-2-1B, a 1B-parameter vision-language model, to convert PDFs in diverse formats into Markdown, and GPT-5.4-nano for initial classification. ModernBERT-based classifiers and regressors then annotate laws along dimensions such as function, topic, opacity, paternalism, enforcement discretion, and salience. LOCUS-v1 is available on Hugging Face and supports legal AI research on national-scale retrieval, regulatory extraction, and comparative policy analysis of local regulations.
Key takeaway
For legal AI scientists and research teams developing systems for U.S. regulatory analysis, LOCUS-v1 offers a national, machine-readable corpus of local ordinances. Consider integrating this dataset to build models that can navigate layered legal authority and perform national-scale comparative policy analysis. Be aware that while LOCUS provides a crucial geographic substrate, your systems must still reason about which jurisdiction controls in a given case and where state and local law overlap.
Key insights
LOCUS provides a national, machine-readable corpus of U.S. local ordinances, enabling large-scale legal AI research.
Principles
- Local law is a layered system requiring reasoning about jurisdictional overlap.
- Local codes exhibit a recurring documentary form and functional division.
- Harmonization of legal data must be explicit about its abstractions.
Method
Collect PDFs via browser automation, convert them to Markdown with LightOnOCR-2-1B, segment the text, then classify it with GPT-5.4-nano and score it with ModernBERT regressors trained on LLM-as-a-Judge pairwise comparisons.
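One plausible way to turn pairwise judge comparisons into scalar targets for a regressor is a Bradley-Terry fit. The sketch below is an assumption for illustration only: the summary does not say how the authors aggregate comparisons, and the function name, the smoothing constant `prior`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=200, prior=0.1):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic minorization-maximization update. `prior` is a small
    pseudo-count (an assumption of this sketch) that keeps items with zero
    wins at a strictly positive strength.
    """
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = (wins[i] + prior) / denom if denom else strength[i]
        # Rescale so strengths sum to the number of items.
        total = sum(updated.values())
        strength = {i: s * len(items) / total for i, s in updated.items()}
    return strength


# Hypothetical judge outcomes: each tuple is (preferred segment, other segment).
judgments = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 4
)
scores = bradley_terry(judgments)
```

The resulting per-item strengths (or their logarithms) could then serve as regression labels, so that a model scores unseen segments without running further pairwise comparisons.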
In practice
- Perform search and Q&A over local rules with varying terminology.
- Extract regulated activities, permits, fees, and penalties.
- Develop benchmarks for multi-layered legal reasoning systems.
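As a rough illustration of the extraction use case above, the sketch below pulls dollar amounts out of sentences that mention a fee, fine, or penalty. It is a simple regex baseline under stated assumptions, not part of LOCUS: the function name, the patterns, and the sample ordinance text are hypothetical, and real ordinances would need more robust parsing.

```python
import re

# Dollar amounts such as "$75.00" or "$1,000".
AMOUNT = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{2})?)")
# Whole-word match so that "defined" does not count as "fine".
KIND = re.compile(r"\b(fees?|fines?|penalt(?:y|ies))\b", re.IGNORECASE)


def extract_monetary_provisions(markdown_text):
    """Return one record per dollar amount found in a fee/fine/penalty sentence."""
    records = []
    # Naive sentence split on whitespace that follows a period or semicolon.
    for sentence in re.split(r"(?<=[.;])\s+", markdown_text):
        kind_match = KIND.search(sentence)
        if kind_match is None:
            continue
        for amount_match in AMOUNT.finditer(sentence):
            amount = float(amount_match.group(1).replace(",", ""))
            records.append(
                {
                    "kind": kind_match.group(1).lower(),
                    "amount": amount,
                    "text": sentence.strip(),
                }
            )
    return records


# Hypothetical ordinance excerpt, not taken from the corpus.
sample = (
    "A permit fee of $75.00 is due at application. "
    "Any person violating this section shall be subject to a fine "
    "not exceeding $1,000 per day."
)
provisions = extract_monetary_provisions(sample)
```

Each record keeps the source sentence so a reviewer can check the extracted amount against its context.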
Topics
- Local Ordinances
- Legal AI
- Corpus Development
- Optical Character Recognition
- ModernBERT
- Regulatory Analysis
Best for: NLP Engineer, AI Scientist, Research Scientist, Legal Professional
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.