Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

LOCUS, the Local Ordinance Corpus for the United States, is a dataset built to address the lack of machine-readable local legal texts. It gathers nearly all publicly available municipal and county ordinance codes, which are otherwise fragmented across vendor platforms, and makes them accessible for legal AI research. The raw corpus includes codes from 9,239 cities and counties. A smaller, county-harmonized access layer covers 2,309 of the 3,144 U.S. counties, which together account for a majority of the population. The project uses OCR to process the diverse formats in which codes are published. Alongside the corpus, the authors trained a collection of ModernBERT-based classifiers and scorers that analyze U.S. local law along dimensions such as opacity and paternalism, enabling large-scale studies that were not previously possible. LOCUS-v1 and its derivative models are available on Hugging Face.
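As a quick check on the coverage figures above, the county share of the harmonized layer can be worked out directly. This is a minimal sketch using only the counts stated in the summary; the function name `coverage_share` is illustrative and not part of LOCUS. The result is a share of counties, not of population, which the summary describes only as "a majority".

```python
def coverage_share(covered: int, total: int) -> float:
    """Return the fraction of units covered, guarding against a zero total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return covered / total


# Counts taken from the summary text.
HARMONIZED_COUNTIES = 2309   # counties in the county-harmonized access layer
TOTAL_US_COUNTIES = 3144     # total U.S. counties as stated in the summary

county_share = coverage_share(HARMONIZED_COUNTIES, TOTAL_US_COUNTIES)
print(f"County coverage of the harmonized layer: {county_share:.1%}")
```

So roughly three in four counties appear in the harmonized layer, while the raw corpus of 9,239 jurisdictions counts cities and counties together and is not directly comparable to the county total.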

Key takeaway

For legal AI scientists and NLP engineers building models for U.S. law, LOCUS provides local ordinance data that was previously unavailable at scale: a machine-readable corpus covering 9,239 cities and counties. You can train and evaluate models on local regulations and study legal dimensions such as opacity and paternalism. The provided ModernBERT-based classifiers offer a starting point for that analysis.

Key insights

Local legal ordinances, critical for daily life, are now machine-readable at scale through LOCUS.

Method

The LOCUS project collects nearly all publicly available U.S. municipal and county ordinance codes, uses OCR for diverse formats, and trains ModernBERT-based classifiers to analyze legal dimensions like opacity and paternalism.
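The scoring step can be illustrated with a small sketch of how a long ordinance might be scored by a fixed-context classifier: split the text into overlapping word windows, score each window, and average. Everything here is an assumption for illustration. The summary does not describe how LOCUS chunks or aggregates text, `chunk_words`, `score_document`, and `toy_opacity_scorer` are hypothetical names, and the toy scorer is a crude stand-in for a real ModernBERT-based model, which would be loaded from Hugging Face instead.

```python
from statistics import mean
from typing import Callable, List


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def score_document(
    text: str,
    scorer: Callable[[str], float],
    size: int = 200,
    overlap: int = 50,
) -> float:
    """Average a per-chunk score over the document (assumed aggregation)."""
    chunks = chunk_words(text, size, overlap)
    if not chunks:
        return 0.0
    return mean(scorer(chunk) for chunk in chunks)


def toy_opacity_scorer(chunk: str) -> float:
    """Placeholder scorer: share of words longer than 9 characters.

    This is NOT a LOCUS model; swap in a real classifier here.
    """
    words = chunk.split()
    return sum(len(w) > 9 for w in words) / len(words)


sample = "Notwithstanding any provision herein, no person shall park overnight."
print(score_document(sample, toy_opacity_scorer))
```

Mean aggregation is only one option; a maximum or a length-weighted average may suit some dimensions better, and the choice should follow whatever the released models document.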

Best for: Research Scientist, AI Scientist, NLP Engineer, Legal Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.