Peer-Predictive Self-Training for Language Model Reasoning
Summary
Peer-Predictive Self-Training (PST) is a label-free fine-tuning framework in which language models improve without external supervision. Multiple language models collaborate by generating responses to a given prompt sequentially. The cross-model aggregated response, which is often more reliable than any individual output, serves as an internal training signal. PST measures how informative each intermediate response is about the aggregate using pointwise mutual information (PMI) and scales self-training updates accordingly: responses already aligned with the aggregate are updated less, while misaligned ones are updated more. On mathematical reasoning benchmarks such as SimulEq, Math500, and MultiArith, PST improved exact-match accuracy by 2.2 to 4.3 percentage points across Gemma-2-2B, LLaMA-3.2-1B, and Qwen-2.5-1.5B, and reduced the average generator-verifier gap (GV-Gap) by 26 to 40 percent.
Key takeaway
For AI engineers developing reasoning capabilities in language models, PST offers a way to raise accuracy and narrow the generator-verifier gap without external labels or a teacher-student hierarchy. Consider PST when you have several models of comparable size and want them to improve collaboratively, particularly on tasks that require mathematical or logical reasoning. Because the training signal comes from the models themselves, fine-tuning does not depend on a labeled dataset.
Key insights
Language models can self-improve collaboratively by using cross-model aggregated responses as internal training signals.
Principles
- Cross-model aggregation enhances response reliability.
- PMI quantifies response informativeness for scaled updates.
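For reference, the standard definition of pointwise mutual information can be written out for this setting. The symbols are assumptions introduced here for illustration, not notation taken from the paper: q is the prompt, r_i is the intermediate response from model i, and a is the cross-model aggregate.

```latex
\mathrm{PMI}(r_i ; a \mid q)
  = \log \frac{p(a \mid r_i, q)}{p(a \mid q)}
  = \log p(a \mid r_i, q) - \log p(a \mid q)
```

A large positive value means that seeing r_i makes the aggregate much more predictable, which is one way to read "aligned with the aggregate". A value near zero or below means r_i says little about, or points away from, the aggregate.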
Method
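PST proceeds in three steps: multiple LMs generate responses to a prompt sequentially, the responses are combined into a cross-model aggregate, and the aggregate serves as the training target. The PMI between each intermediate response and the aggregate scales the size of that response's update.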
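The weighting idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the majority-vote `aggregate` function is a stand-in for whatever aggregation PST actually uses, the logistic mapping in `update_weight` and its `temperature` parameter are hypothetical choices, and the log-probabilities are assumed to come from scoring the aggregate with a language model.

```python
import math
from collections import Counter


def aggregate(responses):
    """Hypothetical aggregator: majority vote, ties go to the earliest response."""
    counts = Counter(responses)
    best = max(counts.values())
    return next(r for r in responses if counts[r] == best)


def pmi(logp_agg_given_resp, logp_agg):
    """PMI as a difference of log-probabilities of the aggregate,
    with and without conditioning on one intermediate response."""
    return logp_agg_given_resp - logp_agg


def update_weight(pmi_value, temperature=1.0):
    """Assumed mapping from PMI to an update scale in (0, 1).

    Equal to 1 / (1 + exp(pmi / temperature)), written with tanh so that
    large PMI values do not overflow. High PMI (aligned response) gives a
    small weight; low or negative PMI gives a weight closer to 1.
    """
    return 0.5 * (1.0 - math.tanh(pmi_value / (2.0 * temperature)))


# Toy final answers from three models answering in sequence.
responses = ["12", "15", "12"]
target = aggregate(responses)

# Made-up log-probabilities of the aggregate, for illustration only.
aligned = update_weight(pmi(-1.0, -3.0))     # response makes the aggregate likelier
misaligned = update_weight(pmi(-4.0, -3.0))  # response makes the aggregate less likely
```

In a training loop, a weight like this would multiply each model's loss on the aggregate target, so a model whose response already predicts the aggregate changes little while a model that disagreed is pulled harder toward it.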
In practice
- Apply PST for label-free LM fine-tuning.
- Utilize cross-model interactions for self-supervised training.
Topics
- Peer-Predictive Self-Training
- Language Model Reasoning
- Self-Supervised Learning
- Cross-Model Aggregation
- Pointwise Mutual Information
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.