Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification

Source: Artificial Intelligence · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Expected Value Alignment (EVA) is a reward-modeling procedure for Large Language Model (LLM) verifiers in formal mathematics, specifically for proofs checked by interactive theorem provers such as Lean 4. It targets a trade-off in existing process reward models (PRMs). Value-head models produce continuous scores but alter the model's generative interface. Generative models keep textual rationales but regress poorly on continuous targets, because a numeric value is split across several tokens. EVA keeps the generative interface by having the model emit a discrete integer score in a structured JSON output. It then recovers a continuous score as the expectation over the normalized logits of the corresponding anchor tokens. Training combines the causal language modeling objective with an auxiliary mean squared error loss on this expected value. EVA is instantiated for Lean 4 as Leibniz. The evaluation shows that the continuous, logit-based score significantly reduces discretization artifacts while preserving the interpretability of generative critiques.
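The score extraction and training objective described above can be written out roughly as follows. This is a sketch under stated assumptions. The integer scale $0, \dots, K$, the softmax over the anchor logits only, the weight $\lambda$, and the target score $s^{\ast}$ are notation chosen here, not taken from the source.

```latex
% z_k: logit of anchor token a_k (integer score k) at the position where the score is emitted
p_k = \frac{\exp(z_k)}{\sum_{j=0}^{K} \exp(z_j)}, \qquad
\hat{s} = \sum_{k=0}^{K} k \, p_k

\mathcal{L} = \mathcal{L}_{\mathrm{CLM}} + \lambda \, \bigl(\hat{s} - s^{\ast}\bigr)^{2}
```

You can also renormalize the full-vocabulary probabilities over the anchors. That gives the same $p_k$, so $\hat{s}$ always lies in $[0, K]$ whichever normalization you start from.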

Key takeaway

For machine learning engineers building process reward models for formal verification, EVA offers a way out of the trade-off between continuous scoring and textual interpretability. Consider adopting its logit-based expectation to derive continuous feedback from generative models that emit discrete scores. This can significantly reduce discretization artifacts in the reward signal, which may improve training efficiency, while keeping the generative critiques readable in settings such as Lean 4.
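The following is a minimal sketch of the discretization artifact the expectation removes. It assumes a hypothetical 0 to 4 integer scale and made-up anchor logits, neither of which comes from the source. In the example, two proof steps receive the same greedy-decoded score, but the expected value separates a confident judgement from a hesitant one.

```python
import math

# Hypothetical integer scale 0..4; the source does not state EVA's score range.
SCORES = [0, 1, 2, 3, 4]


def softmax(logits):
    """Numerically stable softmax over the anchor-token logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discrete_score(anchor_logits):
    """Integer a greedy decoder would write into the JSON: the argmax anchor."""
    best = max(range(len(anchor_logits)), key=lambda i: anchor_logits[i])
    return SCORES[best]


def expected_score(anchor_logits):
    """Continuous score: expectation of the score value under the anchor softmax."""
    return sum(p * s for p, s in zip(softmax(anchor_logits), SCORES))


# Made-up logits for two proof steps: one confidently a 3, one torn between 3 and 4.
confident = [-2.0, -2.0, -2.0, 6.0, 0.0]
uncertain = [-2.0, -2.0, -2.0, 2.0, 1.9]

for name, logits in [("confident", confident), ("uncertain", uncertain)]:
    print(name, discrete_score(logits), round(expected_score(logits), 3))
```

Both steps would be written as 3 in the JSON, so a reward built on the parsed integer cannot tell them apart. The expected value can: about 3.0 for the confident step versus about 3.4 for the uncertain one here.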

Key insights

Expected Value Alignment (EVA) extracts continuous scores from generative reward models that emit discrete scores by taking an expectation over the logits of the score tokens, giving finer-grained feedback for formal verification.
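In a deployed verifier, the score is read from full-vocabulary logits at one step of a longer JSON generation. The sketch below shows that extraction in three steps: find the step that emitted the score, keep only the anchor logits there, and take the expectation. It rests on several assumptions not taken from the source: hypothetical anchor token ids, a 0 to 4 scale, and the rule that the score is the first anchor token generated.

```python
import math

# Hypothetical token ids for the anchor tokens "0".."4"; real ids depend on the tokenizer.
ANCHOR_IDS = {0: 15, 1: 16, 2: 17, 3: 18, 4: 19}


def expected_from_vocab_logits(vocab_logits, anchor_ids=ANCHOR_IDS):
    """Keep only the anchor-token logits of one decoding step, softmax them,
    and return the expected integer score."""
    values = sorted(anchor_ids)
    sub = [vocab_logits[anchor_ids[v]] for v in values]
    m = max(sub)
    weights = [math.exp(z - m) for z in sub]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def score_position(generated_ids, anchor_ids=ANCHOR_IDS):
    """Index of the first generated anchor token.

    Simplifying assumption: the score is the first anchor token in the output.
    A real pipeline should locate the value of the JSON score field instead,
    since digits can also appear elsewhere in the generated text."""
    anchor_set = set(anchor_ids.values())
    for t, tok in enumerate(generated_ids):
        if tok in anchor_set:
            return t
    raise ValueError("no anchor token in the generated output")


def eva_score(generated_ids, step_logits, anchor_ids=ANCHOR_IDS):
    """Continuous score for one generated verdict.

    Assumes step_logits[t] holds the vocabulary logits from which
    generated_ids[t] was produced."""
    t = score_position(generated_ids, anchor_ids)
    return expected_from_vocab_logits(step_logits[t], anchor_ids)


# Toy usage: a 20-token vocabulary; the score token (id 18, "3") is emitted at step 2.
VOCAB = 20
generated = [2, 7, 18, 3]
steps = [[0.0] * VOCAB for _ in generated]
steps[2][18] = 5.0  # strong preference for "3"
steps[2][19] = 5.0  # equally strong preference for "4"
print(round(eva_score(generated, steps), 3))
```

Restricting the softmax to the anchors means probability mass on non-score tokens at that step is ignored, so the result always stays on the integer scale.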

Principles

Method

EVA has the model emit an integer score in structured JSON, then computes a continuous score as the expectation over the normalized logits of the anchor tokens that represent each integer. Training combines the causal language modeling objective with an auxiliary mean squared error loss on these expected values.
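The training objective can be sketched as a per-example loss. The source specifies only that a mean squared error on the expected value is added to the causal language modeling loss. The score position, regression target, and weight on the auxiliary term below are hypothetical illustration choices.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def clm_loss(step_logits, target_ids):
    """Mean token-level cross-entropy of the target output (critique + JSON).

    Assumes step_logits[t] are the vocabulary logits that predict target_ids[t]."""
    nll = [-log_softmax(z)[y] for z, y in zip(step_logits, target_ids)]
    return sum(nll) / len(nll)


def expected_value(vocab_logits, anchor_ids):
    """Expected integer score under a softmax restricted to the anchor tokens."""
    values = sorted(anchor_ids)
    log_probs = log_softmax([vocab_logits[anchor_ids[v]] for v in values])
    return sum(v * math.exp(lp) for v, lp in zip(values, log_probs))


def eva_loss(step_logits, target_ids, score_pos, target_score, anchor_ids, mse_weight=1.0):
    """CLM loss plus an auxiliary squared error on the expected score.

    Hypothetical parameters: score_pos (index of the score token),
    target_score (the regression label) and mse_weight (the auxiliary weight)
    are illustration choices, not values from the source."""
    ev = expected_value(step_logits[score_pos], anchor_ids)
    return clm_loss(step_logits, target_ids) + mse_weight * (ev - target_score) ** 2


# Toy usage with uniform logits over a 20-token vocabulary and anchors "0".."4".
anchors = {0: 15, 1: 16, 2: 17, 3: 18, 4: 19}
logits = [[0.0] * 20 for _ in range(4)]
targets = [2, 7, 18, 3]
print(eva_loss(logits, targets, score_pos=2, target_score=3.0, anchor_ids=anchors, mse_weight=0.5))
```

In an actual training loop, these quantities would be computed in an autodiff framework so that the squared-error term sends gradients into the anchor-token logits. Plain Python is used here only to keep the sketch self-contained. Averaging the squared error over a batch gives the mean squared error.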

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.