Peer-Predictive Self-Training for Language Model Reasoning

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

Peer-Predictive Self-Training (PST) is a label-free fine-tuning framework in which multiple language models improve collaboratively without external supervision. The models generate responses sequentially, and the final aggregated answer serves as an internal training signal that is more reliable than any single response. PST uses pointwise mutual information (PMI) to quantify how informative each intermediate response is about the aggregate, and scales the self-training updates accordingly: responses aligned with the aggregate receive smaller updates, while misaligned ones receive larger updates. On the mathematical reasoning benchmarks SimulEq, Math500, and MultiArith, PST improved exact-match accuracy by 2.2–4.3 percentage points across Gemma-2-2B, LLaMA-3.2-1B, and Qwen-2.5-1.5B. It also reduced the average generator–verifier gap (GV-Gap) by 26–40%, indicating that cross-model interaction can supply an effective self-supervised training signal.

Key takeaway

For research scientists developing self-improving language models, PST offers an unsupervised fine-tuning approach. Cross-model aggregation and PMI-weighted updates improved reasoning accuracy and narrowed the generator–verifier gap without labeled data or an explicit reward model. It is worth evaluating in training pipelines where ground truth is scarce; the reported gains held across all three model families tested (Gemma, LLaMA, and Qwen).

Key insights

Aggregating peer predictions and weighting self-training by mutual information enables unsupervised language model improvement.

Method

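Multiple models generate responses sequentially, and the final aggregated response serves as the reference. The pointwise mutual information (PMI) between each intermediate response and the final response scales that response's cross-entropy loss update: smaller for responses aligned with the aggregate, larger for misaligned ones.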
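The PMI-weighted update described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. PMI is computed from its standard definition, log p(final | response) - log p(final). The rest is hypothetical: the function names, the sigmoid mapping from PMI to a weight, the `temperature` parameter, and the choice of the aggregated response as the cross-entropy target are all chosen here for illustration. The summary states only that aligned responses receive smaller updates and misaligned ones larger updates.

```python
import math
from typing import Sequence


def pmi(logp_final_given_response: float, logp_final: float) -> float:
    """Pointwise mutual information between an intermediate response and
    the final aggregate: log p(final | response) - log p(final)."""
    return logp_final_given_response - logp_final


def update_weight(pmi_value: float, temperature: float = 1.0) -> float:
    """Map PMI to a weight in [0, 1] that decreases as PMI increases.

    ASSUMPTION: sigmoid(-PMI / temperature) is an illustrative choice;
    the source does not specify the mapping.
    """
    z = pmi_value / temperature
    # Numerically stable evaluation of sigmoid(-z).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def weighted_cross_entropy(target_token_logprobs: Sequence[float],
                           weight: float) -> float:
    """Mean negative log-likelihood of the target tokens, scaled by weight.

    ASSUMPTION: the target tokens are those of the aggregated response.
    """
    nll = -sum(target_token_logprobs) / len(target_token_logprobs)
    return weight * nll


if __name__ == "__main__":
    # Made-up log-probabilities, for illustration only.
    aligned = pmi(logp_final_given_response=-1.0, logp_final=-3.0)
    misaligned = pmi(logp_final_given_response=-4.0, logp_final=-3.0)
    token_logprobs = [-0.5, -1.5, -1.0]
    for name, value in [("aligned", aligned), ("misaligned", misaligned)]:
        w = update_weight(value)
        loss = weighted_cross_entropy(token_logprobs, w)
        print(f"{name}: pmi={value:.2f} weight={w:.3f} loss={loss:.3f}")
```

Because the weight decreases monotonically in PMI, a response that already predicts the aggregate well contributes a smaller loss, and therefore a smaller gradient, than one that does not. Any other decreasing mapping would preserve that ordering.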

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.