Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification
Summary
Expected Value Alignment (EVA) is a reward-modeling procedure for improving Large Language Models (LLMs) at formal mathematics verification with interactive theorem provers such as Lean 4. It targets a trade-off in existing process reward models (PRMs). Value-head models output continuous scores but change the generative interface of the model. Generative models keep textual rationales but regress poorly on continuous targets, because a numeric value is split across several tokens. EVA resolves this by having the model emit a discrete integer score in a structured JSON format, then reading off a continuous score as the expected value of that score under the model's probability distribution over the corresponding anchor tokens, computed from their logits. Training combines a causal language modeling objective with an auxiliary mean squared error loss on these expected values. EVA is instantiated for Lean 4 as Leibniz, and its evaluation shows that the continuous, logit-based scoring significantly reduces discretization artifacts while keeping the interpretability of generative critiques.
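A minimal sketch of the scoring step, in Python with only the standard library, may make the idea concrete. It is an illustration under stated assumptions, not the authors' code: the JSON verdict uses hypothetical `critique` and `score` fields, every allowed score is assumed to be spelled by a single anchor token, the logits are made up, and the function names are illustrative. The continuous score is the softmax-weighted average of the integer scores, with the softmax taken over the anchor tokens only.

```python
import json
import math


def parse_discrete_score(output: str) -> int:
    """Read the integer score from a JSON verdict.

    Hypothetical format: {"critique": "...", "score": 2}.
    """
    return int(json.loads(output)["score"])


def expected_score(anchor_logits: dict[int, float]) -> float:
    """Continuous score E[s] under a softmax restricted to the anchor tokens.

    anchor_logits maps each allowed integer score s to the logit the model
    assigns to the token spelling s at the position where the JSON score
    value is generated (assumes one token per score).
    """
    # Shift by the max logit so exp() cannot overflow; the softmax is unchanged.
    top = max(anchor_logits.values())
    weights = {s: math.exp(z - top) for s, z in anchor_logits.items()}
    total = sum(weights.values())
    # Probability-weighted average of the integer scores.
    return sum(s * w for s, w in weights.items()) / total


verdict = '{"critique": "Step 3 applies the lemma to the wrong goal.", "score": 2}'
logits = {0: -1.0, 1: 0.5, 2: 1.5}  # made-up logits for anchors "0", "1", "2"
print(parse_discrete_score(verdict))  # 2, the discrete score written in the text
print(expected_score(logits))         # continuous score derived from the logits
```

The text the model emits is unchanged; the continuous value is read from logits the model already computes at the score position, so the critique stays human-readable.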
Key takeaway
For machine learning engineers developing process reward models for formal verification, EVA offers a way around the long-standing trade-off between continuous scoring and textual interpretability. Consider implementing EVA's logit-based expectation method to derive continuous feedback from generative models that emit discrete scores. This can significantly reduce discretization artifacts in your reward signals, which may in turn improve training, while preserving the clarity of generative critiques for provers such as Lean 4.
Key insights
Expected Value Alignment (EVA) extracts continuous scores from generative reward models that emit discrete scores by taking the expected score under the anchor-token probabilities, improving reward signals for formal verification.
Principles
- Balance continuous scores with textual rationales.
- Logit expectation yields continuous values from discrete output.
- Reduce discretization artifacts while retaining interpretability.
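A small worked example shows why the expectation reduces discretization artifacts. It assumes a hypothetical 0 to 2 score scale and made-up probabilities. Write z_k for the logit of the anchor token for score k and p_k for its softmax probability over the anchors:

```latex
\begin{align*}
\hat{s} &= \sum_{k=0}^{2} k\,p_k, \qquad p_k = \frac{\exp(z_k)}{\sum_{j=0}^{2} \exp(z_j)} \\
p = (0.10,\ 0.30,\ 0.60) &\;\Rightarrow\; \hat{s} = 0(0.10) + 1(0.30) + 2(0.60) = 1.50 \\
p = (0.01,\ 0.04,\ 0.95) &\;\Rightarrow\; \hat{s} = 0(0.01) + 1(0.04) + 2(0.95) = 1.94
\end{align*}
```

Greedy decoding emits the integer 2 in both cases, so the discrete score cannot tell them apart; the expected value separates a hesitant 2 from a confident one.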
Method
EVA emits integer scores in structured JSON, then computes a continuous score as the expectation of the integer score under the softmax of the anchor-token logits. Training combines causal language modeling with an auxiliary mean squared error loss on these expected values.
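A minimal sketch of the combined objective, in plain Python for a single training example, may help. It is an illustration under stated assumptions, not the authors' implementation: logits are plain lists, and the continuous target `target_score` and the weight `mse_weight` on the auxiliary term are hypothetical inputs, since the summary does not specify how targets are obtained or how the two terms are weighted.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one row of logits."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_z for x in logits]


def eva_style_loss(
    token_logits: list[list[float]],
    target_ids: list[int],
    anchor_logits: list[float],
    anchor_values: list[int],
    target_score: float,
    mse_weight: float = 1.0,
) -> float:
    """Causal LM loss plus a weighted squared error on the expected score.

    token_logits[t] are next-token logits at position t and target_ids[t] is
    the reference token there (critique and JSON text). anchor_logits are the
    logits of the anchor tokens at the score position, aligned with
    anchor_values. target_score and mse_weight are hypothetical inputs.
    """
    # Causal language modeling term: mean negative log-likelihood.
    nll = [-log_softmax(row)[t] for row, t in zip(token_logits, target_ids)]
    lm_loss = sum(nll) / len(nll)
    # Expected score under the softmax restricted to the anchor tokens.
    probs = [math.exp(lp) for lp in log_softmax(anchor_logits)]
    expected = sum(p * v for p, v in zip(probs, anchor_values))
    # Squared error for this example; a batch MSE would average these.
    mse = (expected - target_score) ** 2
    return lm_loss + mse_weight * mse


loss = eva_style_loss(
    token_logits=[[2.0, 0.1, -1.0], [0.3, 1.2, 0.0]],
    target_ids=[0, 1],
    anchor_logits=[-1.0, 0.5, 1.5],
    anchor_values=[0, 1, 2],
    target_score=1.8,
    mse_weight=0.5,
)
print(loss)
```

Because the expected value is a differentiable function of the anchor logits, the auxiliary term can shift probability mass toward the target score without changing the text format the model is trained to emit.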
In practice
- Enhance LLM performance in formal verification.
- Improve reward modeling for interactive theorem provers.
- Apply to systems requiring process reward models.
Topics
- Expected Value Alignment
- Generative Reward Models
- Formal Verification
- Large Language Models
- Lean 4
- Logit-based Scoring
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.