Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

Source: cs.CL updates on arXiv.org · Field: Legal & Regulatory / Legal Technology (LegalTech), Regulatory Affairs & Government Relations · Depth: Expert, extended

Summary

The Local Ordinance Corpus for the United States (LOCUS) is a new dataset that addresses the critical absence of machine-readable U.S. local law. The corpus includes codes from 9,239 U.S. cities and counties, plus a county-harmonized access layer covering 2,309 of the 3,144 U.S. counties (about 73%), which together account for most of the U.S. population. LOCUS uses LightOnOCR-2-1B, a 1B-parameter vision-language model, to convert heterogeneous PDF codes into Markdown, and GPT-5.4-nano for initial classification. ModernBERT-based classifiers and regressors then annotate each law along dimensions such as function, topic, opacity, paternalism, enforcement discretion, and salience. LOCUS-v1 is available on Hugging Face and is designed to support legal AI research, including national-scale retrieval, regulatory extraction, and comparative policy analysis of local regulation.
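
Between OCR and annotation, each code has to be split into individual provisions (the "segment" step in the method below). The source does not describe how LOCUS segments its Markdown, so what follows is only a minimal sketch under an assumed convention: every ordinance section begins with a Markdown heading. The names `Section` and `segment_markdown` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: each section starts with a Markdown heading such as
# "## Sec. 4-1. Noise." This is a hypothetical convention, not LOCUS's rule.
HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")


@dataclass
class Section:
    heading: str
    body: str


def segment_markdown(markdown: str) -> list[Section]:
    """Split OCR'd Markdown into heading-delimited sections."""
    sections: list[Section] = []
    heading, lines = None, []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close out the previous section before starting a new one.
            if heading is not None:
                sections.append(Section(heading, "\n".join(lines).strip()))
            heading, lines = match.group(1), []
        elif heading is not None:
            lines.append(line)
        # Text before the first heading (e.g., a cover page) is dropped.
    if heading is not None:
        sections.append(Section(heading, "\n".join(lines).strip()))
    return sections
```

For a page with headings `# Chapter 4` and `## Sec. 4-1. Noise.`, this yields one `Section` per heading. Real OCR output is messier (running headers, headings split across lines), so a production segmenter would likely need additional rules.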

Key takeaway

For legal AI scientists and research teams building systems for U.S. regulatory analysis, LOCUS-v1 offers an unprecedented machine-readable corpus of local ordinances. Integrating it makes it possible to build models that navigate layered legal authority and perform national-scale comparative policy analysis. LOCUS supplies the geographic substrate, but such systems must still reason about which jurisdiction controls a given provision and how state and local law overlap.
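
As an illustration of comparative analysis on the county-harmonized layer, the sketch below computes, for each county, the share of ordinance sections carrying a given topic label. The record fields `county_fips` and `topic` and the function `topic_share_by_county` are hypothetical placeholders, not the LOCUS-v1 schema. The aggregation deliberately ignores which government enacted each provision: city and county ordinances land in the same county bucket, which is the state-local overlap that downstream systems still have to resolve.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def topic_share_by_county(
    records: Iterable[Mapping[str, str]], topic: str
) -> dict[str, float]:
    """Return, per county, the fraction of sections labeled with `topic`.

    `county_fips` and `topic` are hypothetical field names for illustration;
    check the released LOCUS-v1 schema before adapting this.
    """
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for rec in records:
        county = rec["county_fips"]
        totals[county] += 1
        if rec["topic"] == topic:
            hits[county] += 1
    # Every county in `totals` has at least one section, so no division by zero.
    return {county: hits[county] / n for county, n in totals.items()}
```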

Key insights

LOCUS provides a national, machine-readable corpus of U.S. local ordinances, enabling large-scale legal AI research.

Method

Collect PDFs via browser automation, OCR to Markdown using LightOnOCR-2-1B, segment, then classify (GPT-5.4-nano) and score (ModernBERT regressors trained on LLM-as-a-Judge pairwise comparisons).
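
The method does not specify how LLM-as-a-Judge pairwise comparisons become training targets for the ModernBERT regressors. One standard option, assumed here rather than confirmed by the source, is to fit a Bradley-Terry model: each section i gets a strength p_i, the probability that i is judged higher than j (e.g., more opaque) is p_i / (p_i + p_j), and the fitted log-strengths serve as scalar regression labels. The sketch uses the classic minorization-maximization (MM) update p_i ← W_i / Σ_j n_ij / (p_i + p_j), where W_i is i's win count and n_ij is the number of times i and j were compared. The function name `bradley_terry` is hypothetical.

```python
import math
from collections import defaultdict


def bradley_terry(comparisons, n_iter=200):
    """Fit Bradley-Terry log-strengths from (winner, loser) pairs via MM updates.

    The MLE exists only if every item wins at least once and the directed
    win graph is strongly connected; this sketch checks only the first condition.
    """
    wins = defaultdict(int)
    # n_between[i][j] = number of comparisons between items i and j.
    n_between = defaultdict(lambda: defaultdict(int))
    for winner, loser in comparisons:
        if winner == loser:
            continue  # a self-comparison carries no information
        wins[winner] += 1
        n_between[winner][loser] += 1
        n_between[loser][winner] += 1

    items = list(n_between)
    if not items:
        return {}
    if any(wins[i] == 0 for i in items):
        raise ValueError("every item needs at least one win")

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        new = {
            i: wins[i] / sum(n / (strength[i] + strength[j])
                             for j, n in n_between[i].items())
            for i in items
        }
        # Strengths are scale-invariant; fix the geometric mean at 1.
        log_mean = sum(math.log(v) for v in new.values()) / len(new)
        strength = {i: v / math.exp(log_mean) for i, v in new.items()}

    # Log-strengths sum to zero and can serve as regression targets.
    return {i: math.log(v) for i, v in strength.items()}
```

A regressor trained on these log-strengths could then, in principle, score unseen sections directly, without new pairwise judgments for every provision.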

Best for: NLP Engineer, AI Scientist, Research Scientist, Legal Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.