TLA-Prover: Verifiable TLA+ Specification Synthesis via Preference-Optimized Low-Rank Adaptation

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Expert, quick

Summary

TLA-Prover is a 20-billion-parameter model for synthesizing verifiable specifications in TLA+, a formal specification language used for distributed systems and safety-critical protocols. Existing large language models (LLMs) struggle with TLA+ generation: the best public baseline achieves only 26.6% syntactic parse success and 8.6% semantic model-check success. TLA-Prover addresses this by combining supervised fine-tuning (SFT) on verified examples with repair-based group-relative policy optimization (GRPO), in which the model learns to correct its own failed specifications and the TLC model checker serves as a direct reward signal. On a held-out 30-problem benchmark, TLA-Prover achieved a 30% pass@1 rate at both the Gold and Diamond tiers, roughly a 3.5x improvement over the 8.6% untuned baseline. A direct preference optimization (DPO) variant reached 20% at Diamond. Both the Gold and Diamond tiers require verification of non-trivial properties.
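
The headline figures can be checked with a few lines of arithmetic. The sketch below assumes one sampled specification per problem, so a 30% pass@1 rate on 30 problems corresponds to 9 passing problems; that count is inferred from the percentages above, not reported in the summary. The names `pass_at_1`, `results`, and `baseline` are illustrative.

```python
def pass_at_1(results):
    """Fraction of problems whose single sampled specification passed."""
    return sum(results) / len(results)


# Assumption: 9 of the 30 held-out problems pass (inferred from 30% of 30).
results = [True] * 9 + [False] * 21

tuned = pass_at_1(results)
baseline = 0.086  # untuned model-check success rate quoted above

improvement = tuned / baseline
print(f"pass@1 = {tuned:.1%}, improvement = {improvement:.1f}x")
```

With a benchmark this small, each problem moves the score by about 3.3 percentage points, which is worth keeping in mind when comparing the 30% and 20% variants.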

Key takeaway

For AI engineers building formal verification tools or specifying safety-critical systems, TLA-Prover shows a viable path to substantially better LLM-generated TLA+ specifications. Consider integrating repair-based policy optimization and direct model-checker feedback into your training pipelines. The approach yielded a roughly 3.5x improvement in verifiable specifications and offers a way to address semantic correctness in formal language synthesis, which could reduce manual verification effort and improve system reliability.

Key insights

Combining SFT with repair-based policy optimization substantially increases the share of synthesized formal specifications that pass verification.

Principles

Method

TLA-Prover is first trained with supervised fine-tuning, then with repair-based group-relative policy optimization (GRPO), in which the model repairs its own TLC-rejected specifications and TLC's verdict serves directly as the reward.
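
The reward side of such a loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reward values in `REWARDS`, the outcome labels, and the `fake_tlc` stand-in (which replaces a real TLC run so the example stays self-contained) are all hypothetical. The mean-and-standard-deviation normalization is the commonly used GRPO form; the summary does not confirm the exact variant used.

```python
from statistics import mean, pstdev
from typing import Callable, List

# Hypothetical reward tiers; the source does not state the actual values.
REWARDS = {"parse_fail": 0.0, "parse_ok": 0.2, "model_check_ok": 1.0}


def reward(spec: str, check: Callable[[str], str]) -> float:
    """Map a checker verdict on one candidate specification to a scalar reward."""
    return REWARDS[check(spec)]


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def fake_tlc(spec: str) -> str:
    """Stand-in for a real TLC run (assumption: keyword checks, not real parsing)."""
    if "Spec ==" not in spec:
        return "parse_fail"
    if "Invariant ==" in spec:
        return "model_check_ok"
    return "parse_ok"


# A group of candidate repairs sampled for one previously rejected specification.
candidates = [
    "garbage output",
    "Spec == Init",
    "Spec == Init  Invariant == x >= 0",
    "Spec == Init",
]

rewards = [reward(c, fake_tlc) for c in candidates]
advantages = group_relative_advantages(rewards)

for candidate, adv in zip(candidates, advantages):
    print(f"{adv:+.2f}  {candidate}")
```

Because advantages are computed within the group, a repair is rewarded for beating its sibling attempts rather than for clearing an absolute threshold, and a group in which every attempt fails identically contributes no gradient signal.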

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.