Extraction and Search in Rocq: Theorems, Definitions and Their Dependencies

Source: cs.SE updates on arXiv.org · Field: Technology & Digital (Software Development & Engineering, Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Intermediate, long

Summary

TheoremExtr is a Rocq theorem extraction and analysis tool built to work around two problems. First, Rocq's native search commands only cover imported modules, which rules out cross-project search. Second, researchers struggle to obtain detailed theorem information such as names, statements, and dependencies without extensive development knowledge. TheoremExtr addresses both by analyzing how theorems are composed and extracting data at two points: the parsing phase and runtime. Applied to 32 open-source Rocq community projects, it extracted 71,795 theorems with their dependencies and 27,481 definitions with their types. A companion website, lemmasearch.com, provides cross-project similarity search over these extracted artifacts and links each result back to its original project source. TheoremExtr is implemented on Rocq 8.20.0 and is available on GitHub.

Key takeaway

If you are a research scientist or software engineer working with Rocq and need to find or extract comprehensive theorem and definition data, consider adding TheoremExtr to your workflow. It works around Rocq's native search limitations and provides dependencies and types across projects. You can use the lemmasearch.com website for cross-project similarity search, or run TheoremExtr yourself to generate Rocq corpora for LLM training or agent development.
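As an illustration of how extracted dependency data could feed corpus building, here is a minimal Python sketch that computes the transitive dependencies of a theorem. The record layout (a list of objects with `name` and `dependencies` fields) and the sample theorem names are assumptions made for this example, not TheoremExtr's documented output schema.

```python
def transitive_dependencies(records, root):
    """Return every name reachable from `root` through dependency edges.

    `records` is a list of dicts with hypothetical "name" and
    "dependencies" keys; names without a record are treated as leaves.
    """
    deps_by_name = {r["name"]: r.get("dependencies", []) for r in records}
    seen = set()
    stack = list(deps_by_name.get(root, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited; also guards against cycles
        seen.add(name)
        stack.extend(deps_by_name.get(name, []))
    return seen


# Hypothetical sample data in the assumed layout.
records = [
    {"name": "add_comm", "dependencies": ["add_0_r", "add_succ_r"]},
    {"name": "add_0_r", "dependencies": ["nat_ind"]},
    {"name": "add_succ_r", "dependencies": ["nat_ind"]},
]

print(sorted(transitive_dependencies(records, "add_comm")))
```

A closure like this is one way to gather the lemmas a proof relies on when assembling a training example for a theorem.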

Key insights

TheoremExtr combines parser-stage and runtime data to enable comprehensive, cross-project search and extraction of Rocq theorems and definitions.

Method

TheoremExtr integrates into the Rocq compiler for parser-stage data extraction (statements, scopes, file info, line numbers) and uses a Rocq plugin for runtime extraction (theorem types, inductive definitions). A merging tool combines these datasets into JSON.
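The merging step can be pictured with a short Python sketch that joins parser-stage records and runtime records on the theorem name and writes the result as JSON. This is not TheoremExtr's actual merging tool: the function name `merge_records`, the field names (`name`, `statement`, `file`, `line`, `type`), and the sample data are all assumptions made for illustration.

```python
import json


def merge_records(parser_records, runtime_records):
    """Join parser-stage and runtime records on a shared "name" key.

    Field names are hypothetical. A theorem with no runtime record
    keeps its parser-stage fields and gets a null type.
    """
    runtime_by_name = {r["name"]: r for r in runtime_records}
    merged = []
    for p in parser_records:
        r = runtime_by_name.get(p["name"], {})
        merged.append({
            "name": p["name"],
            "statement": p.get("statement"),  # parser stage
            "file": p.get("file"),            # parser stage
            "line": p.get("line"),            # parser stage
            "type": r.get("type"),            # runtime (plugin)
        })
    return merged


# Hypothetical sample data.
parser_records = [
    {"name": "add_comm", "statement": "forall n m : nat, n + m = m + n",
     "file": "Arith.v", "line": 12},
    {"name": "orphan", "statement": "True", "file": "Misc.v", "line": 3},
]
runtime_records = [
    {"name": "add_comm", "type": "forall n m : nat, n + m = m + n"},
]

merged = merge_records(parser_records, runtime_records)
print(json.dumps(merged, indent=2))
```

Keying on the name alone is a simplification. A real merge across 32 projects would likely need a fully qualified name or a file and line pair to avoid collisions between projects.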

Best for: Research Scientist, Software Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.