Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data
Summary
A new research paper introduces Self-Evaluation Elicitation (SEE), a method built on the finding that large language models (LLMs) already have a latent ability to predict how an external judge will score their own open-ended responses. Base models show this ability before any training aimed at self-evaluation: with few-shot prompting alone, they predict judge scores well above chance on three benchmarks. SEE surfaces the ability in two phases. The first is a calibration-coupled reinforcement learning phase in which the model both refines its answer and predicts the judge's score. The second is a masked distillation phase that sharpens the score prediction without changing answer quality. Using only about 160 unique examples, roughly 31 times fewer than a reinforcement learning baseline, SEE significantly improves held-out calibration on the same three benchmarks while preserving answer quality. The elicited self-evaluation also stays stable under judges the model was not trained against, which suggests the model's notion of quality generalizes beyond any single judge.
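The summary does not say which calibration metric the paper reports, so the following is only a minimal sketch of how one might check "held-out calibration" and cross-judge stability. It assumes two common choices, mean absolute error and Pearson correlation between the model's self-predicted scores and each judge's scores; `calibration_report`, `cross_judge_reports`, and the judge names are hypothetical, and the scores in the usage lines are toy values, not results from the paper.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class CalibrationReport:
    mae: float      # mean absolute error between predicted and judge scores
    pearson: float  # linear correlation between predicted and judge scores


def calibration_report(predicted: list[float], judged: list[float]) -> CalibrationReport:
    """Compare a model's self-predicted scores with one external judge's scores."""
    if len(predicted) != len(judged) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(predicted)
    mae = sum(abs(p - j) for p, j in zip(predicted, judged)) / n
    mean_p = sum(predicted) / n
    mean_j = sum(judged) / n
    cov = sum((p - mean_p) * (j - mean_j) for p, j in zip(predicted, judged))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_j = sum((j - mean_j) ** 2 for j in judged)
    # Correlation is undefined when either side is constant; report 0.0 then.
    pearson = cov / sqrt(var_p * var_j) if var_p and var_j else 0.0
    return CalibrationReport(mae=mae, pearson=pearson)


def cross_judge_reports(
    predicted: list[float], judges: dict[str, list[float]]
) -> dict[str, CalibrationReport]:
    """Score the same self-predictions against several judges (hypothetical helper)."""
    return {name: calibration_report(predicted, scores) for name, scores in judges.items()}


# Toy usage: one set of self-predictions checked against two hypothetical judges.
predicted = [7.0, 4.0, 9.0, 5.0]
reports = cross_judge_reports(
    predicted,
    {"judge_a": [8.0, 3.0, 9.0, 5.0], "judge_b": [7.0, 5.0, 8.0, 4.0]},
)
for name, r in reports.items():
    print(f"{name}: MAE={r.mae:.2f}, r={r.pearson:.2f}")
```

If the self-evaluation is stable across judges in the sense the summary describes, one would expect similar MAE and correlation for judges that were and were not used during training.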
Key takeaway
For machine learning engineers building LLM evaluation systems, this research suggests that robust, judge-aligned self-evaluation can be obtained by eliciting a latent ability rather than through extensive training. Self-Evaluation Elicitation (SEE) is worth considering when labeled data is scarce: it reportedly needs about 31x less data than a reinforcement learning baseline while preserving answer quality. That makes it a more efficient path to building reliable self-assessment into LLM deployments.
Key insights
Base LLMs inherently predict external judge scores, a latent ability efficiently elicited by Self-Evaluation Elicitation (SEE) with minimal data.
Principles
- Self-evaluation is elicitation, not acquisition.
- Latent judge calibration exists in base LLMs.
- The model's notion of quality transfers across judges.
Method
Self-Evaluation Elicitation (SEE) first runs a calibration-coupled reinforcement learning phase that improves answers while training the model to predict the judge's score, then applies masked distillation to sharpen those predictions while preserving answer quality.
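The two phases described above can be sketched as two small functions. This is not the paper's implementation; the exact reward and loss are not given in the summary. The sketch assumes that the phase-1 reward adds a calibration bonus to the judge's rescaled score, and that phase 2 is sequence-level distillation in which only the score tokens count toward the loss. The names `coupled_reward` and `masked_distillation_loss`, the 1 to 10 score range, and the weight `lam` are all hypothetical.

```python
def coupled_reward(
    judge_score: float,
    predicted_score: float,
    lo: float = 1.0,
    hi: float = 10.0,
    lam: float = 0.5,
) -> float:
    """Hypothetical calibration-coupled reward for phase 1.

    Adds answer quality (the judge's score rescaled to 0..1) to a calibration
    bonus that is 1.0 when the self-predicted score matches the judge exactly
    and falls linearly to 0.0 at the largest possible error.
    The score range [lo, hi] and the weight lam are illustrative assumptions.
    """
    span = hi - lo
    quality = (judge_score - lo) / span
    calibration = 1.0 - abs(predicted_score - judge_score) / span
    return quality + lam * calibration


def masked_distillation_loss(
    token_logprobs: list[float],
    is_score_token: list[bool],
) -> float:
    """Hypothetical phase-2 loss: mean negative log-likelihood over score tokens only.

    token_logprobs are the student's log-probabilities for a teacher-produced
    sequence (an answer followed by a score). Positions where is_score_token
    is False (the answer) are excluded, so the loss only measures how well the
    student reproduces the teacher's score prediction.
    """
    if len(token_logprobs) != len(is_score_token):
        raise ValueError("logprobs and mask must have the same length")
    kept = [-lp for lp, keep in zip(token_logprobs, is_score_token) if keep]
    return sum(kept) / len(kept) if kept else 0.0


# Toy usage: a perfect answer with a perfect self-prediction earns the maximum reward.
print(coupled_reward(judge_score=10.0, predicted_score=10.0))
# Only the last two (score) positions contribute to the distillation loss.
print(masked_distillation_loss([-0.5, -1.0, -2.0], [False, True, True]))
```

In a real training loop the reward would drive a policy-gradient update and the mask would multiply a per-token loss tensor; both are omitted to keep the sketch dependency-free. Raising `lam` trades answer-quality reward for calibration reward, which is the coupling the method's name refers to.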
In practice
- Improve LLM self-evaluation with about 31x less data than an RL baseline.
- Achieve judge-aligned self-evaluation efficiently.
- Use few-shot prompting to surface the latent ability.
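Since base models already predict judge scores above chance with few-shot prompting, a practical starting point is a prompt whose demonstrations pair each answer with the score a judge actually gave it. The sketch below assumes a hypothetical plain-text template ("Question:", "Answer:", "Predicted judge score:") and hypothetical demonstration keys (`question`, `answer`, `judge_score`); the paper's actual prompt format is not given here, and the call to the model itself is left out.

```python
import re
from typing import Optional

# Hypothetical output format: the model writes its answer, then one line
# "Predicted judge score: <number>". The label text is an assumption.
SCORE_LINE = re.compile(r"Predicted judge score:\s*(\d+(?:\.\d+)?)")


def build_few_shot_prompt(examples: list[dict], question: str) -> str:
    """Build a few-shot prompt from (question, answer, judge_score) demonstrations.

    Each demonstration shows an answer paired with the score an external judge
    gave it, so the base model sees the pattern it is expected to continue.
    """
    blocks = [
        f"Question: {ex['question']}\n"
        f"Answer: {ex['answer']}\n"
        f"Predicted judge score: {ex['judge_score']}\n"
        for ex in examples
    ]
    # The final block is left open so the model writes the answer and the score.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n".join(blocks)


def parse_predicted_score(completion: str) -> Optional[float]:
    """Extract the self-predicted judge score, or None if the model omitted it."""
    match = SCORE_LINE.search(completion)
    return float(match.group(1)) if match else None


demos = [{"question": "What is 2 + 2?", "answer": "4", "judge_score": 9}]
print(build_few_shot_prompt(demos, "What is the capital of France?"))
print(parse_predicted_score("Paris.\nPredicted judge score: 8"))
```

Returning `None` instead of a default score keeps malformed completions visible, so they can be counted separately rather than silently skewing a calibration measurement.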
Topics
- Large Language Models
- Self-Evaluation
- LLM-as-a-Judge
- Data Efficiency
- Model Calibration
- Reinforcement Learning
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.