Extraction and Search in Rocq: Theorems, Definitions and Their Dependencies
Summary
TheoremExtr is a new theorem extraction and analysis tool for Rocq, designed to address two problems: the limits of Rocq's native search and the difficulty of extracting comprehensive data. Rocq's search commands only cover imported modules, which rules out cross-project search, and researchers without extensive development knowledge struggle to obtain detailed theorem information such as names, statements, and dependencies. TheoremExtr addresses this by analyzing how theorems are composed and extracting data at both the parsing phase and runtime. The tool extracted 71,795 theorems with their dependencies and 27,481 definitions with their types from 32 open-source Rocq community projects. A companion website, lemmasearch.com, provides cross-project similarity search over these extracted artifacts and links results back to the original project sources. TheoremExtr is implemented on Rocq 8.20.0 and is available on GitHub.
Key takeaway
If you work with Rocq as a research scientist or software engineer and need to find or extract theorem and definition data efficiently, consider integrating TheoremExtr into your workflow. It works around the limits of Rocq's native search by providing dependencies and types across projects. Use the lemmasearch.com website for cross-project similarity search, or deploy TheoremExtr to generate Rocq corpora for LLM training or agent development.
Key insights
TheoremExtr combines parser-stage and runtime data to enable comprehensive, cross-project search and extraction of Rocq theorems and definitions.
Principles
- Rocq's native search is limited to the current environment.
- Comprehensive theorem extraction needs both parse-time and runtime data.
- Cross-project search requires a unified repository.
Method
TheoremExtr integrates into the Rocq compiler for parser-stage data extraction (statements, scopes, file info, line numbers) and uses a Rocq plugin for runtime extraction (theorem types, inductive definitions). A merging tool combines these datasets into JSON.
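The merge step can be pictured as a join on the theorem name. The following is a minimal sketch, not TheoremExtr's actual merging tool: the function name `merge_records`, the record fields (`name`, `statement`, `file`, `line`, `type`, `dependencies`), and the sample data are all assumptions made for illustration, since the summary does not describe the real JSON schema.

```python
import json


def merge_records(parser_records, runtime_records):
    """Join parser-stage and runtime records on the theorem name.

    All field names here are hypothetical; the real schema may differ.
    """
    # Index runtime records by name for constant-time lookup.
    runtime_by_name = {r["name"]: r for r in runtime_records}
    merged = []
    for p in parser_records:
        # A theorem seen by the parser may have no runtime record.
        r = runtime_by_name.get(p["name"], {})
        merged.append({
            "name": p["name"],
            "statement": p.get("statement"),
            "file": p.get("file"),
            "line": p.get("line"),
            "type": r.get("type"),
            "dependencies": r.get("dependencies", []),
        })
    return merged


# Illustrative data only, not taken from the extracted corpus.
parser_records = [
    {"name": "add_0_r", "statement": "forall n : nat, n + 0 = n",
     "file": "Arith.v", "line": 12},
]
runtime_records = [
    {"name": "add_0_r", "type": "forall n : nat, n + 0 = n",
     "dependencies": ["nat_ind", "eq_refl"]},
]

merged = merge_records(parser_records, runtime_records)
merged_json = json.dumps(merged, indent=2)
```

Keying on the name keeps the sketch short; a real corpus spanning many projects would likely need a fully qualified name or a (project, name) pair to avoid collisions.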
In practice
- Use TheoremExtr for detailed Rocq theorem info.
- Search lemmasearch.com for cross-project theorems.
- Extract Rocq data with a simple Python script.
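To illustrate what a cross-project similarity search over extracted statements might look like, here is a small sketch that ranks theorems by token overlap (Jaccard similarity). This is a stand-in technique chosen for brevity; the summary does not say how lemmasearch.com computes similarity. The corpus entries, the `project`/`name`/`statement` fields, and the helper names are hypothetical.

```python
import re


def tokens(statement):
    """Split a statement into a set of word and punctuation tokens."""
    return set(re.findall(r"\w+|[^\w\s]", statement))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two statements."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search(corpus, query, top_k=3):
    """Return the top_k corpus entries most similar to the query."""
    scored = [(jaccard(query, e["statement"]), e) for e in corpus]
    # Sort on the score only, so entries (dicts) are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, e["project"], e["name"]) for score, e in scored[:top_k]]


# Illustrative corpus spanning two hypothetical projects.
corpus = [
    {"project": "proj_a", "name": "add_0_r",
     "statement": "forall n : nat, n + 0 = n"},
    {"project": "proj_b", "name": "plus_n_O",
     "statement": "forall n : nat, n = n + 0"},
    {"project": "proj_b", "name": "app_nil_r",
     "statement": "forall l : list A, l ++ nil = l"},
]

results = search(corpus, "forall m : nat, m + 0 = m")
```

Token overlap ignores variable renaming and term structure, so it only approximates semantic similarity; it is shown here because it needs nothing beyond the standard library.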
Topics
- Rocq
- Theorem Provers
- Data Extraction
- Formal Methods
- Similarity Search
- LLM Training Data
Best for: Research Scientist, Software Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.