Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship
Summary
This study tested whether large language models (LLMs) show self-preference bias when revising their own text against verifiable instructions. Across four mid-tier model families and 85 comparisons, the researchers found no detectable self-preference. Models acting as genuine in-context authors rejected verified-good fixes to their own drafts at essentially the same rate as fresh models judging the same drafts, with a gap of -5.1 percentage points (95% CI [-12.9, +2.7]). This runs counter to a smaller pilot's hint of self-skepticism. The study used the IFEval checker to confirm deterministically both the constraint violations and the validity of the edits. Qualitatively, 97% of author rejections of verified-good fixes were attributed to flaw-catching rather than preference. At this sample size, effects smaller than approximately 13 percentage points could not be excluded.
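The interval arithmetic behind that last caveat can be checked directly from the reported numbers. The sketch below uses only the gap and the 95% CI quoted above. The implied standard error rests on an assumption not stated in the summary: that the interval is a symmetric normal-approximation interval.

```python
# Reported values from the summary, in percentage points.
gap = -5.1
ci_low, ci_high = -12.9, 2.7

# The interval's midpoint should recover the reported gap.
midpoint = (ci_low + ci_high) / 2

# Half-width of the interval.
half_width = (ci_high - ci_low) / 2

# Assumption: symmetric normal-approximation CI, so half_width = 1.96 * SE.
implied_se = half_width / 1.96

# A null result means zero lies inside the interval.
includes_zero = ci_low <= 0.0 <= ci_high

# The largest magnitude still inside the interval; anything beyond it is excluded.
largest_compatible_effect = max(abs(ci_low), abs(ci_high))

print(f"midpoint: {midpoint:.1f} pp, half-width: {half_width:.1f} pp")
print(f"implied SE (assumed normal approximation): {implied_se:.2f} pp")
print(f"zero inside interval: {includes_zero}")
print(f"largest compatible effect: {largest_compatible_effect:.1f} pp")
```

The interval spans zero, which is why the result reads as "no detectable self-preference", and its far edge sits at 12.9 points, which is where the "approximately 13 percentage points" limit comes from.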
Key takeaway
NLP engineers building LLM-powered revision tools can take this as evidence that models do not strongly resist valid corrections to their own generated text, although effects below roughly 13 percentage points remain possible. Pairing revision with objective, verifiable instruction-following checks, such as those in IFEval, looks like a sound strategy for improving output quality. For self-revision tasks, prioritize clear, verifiable instructions over elaborate bias mitigation.
Key insights
LLMs show no detectable self-preference when revising their own text against verifiable instructions.
Principles
- LLM self-preference is weak or absent in verifiable revision.
- Rejections of valid fixes are primarily flaw-catching.
- Deterministic verifiers can assess revision quality.
Method
Models draft text, an IFEval checker verifies constraint violations and valid fixes, then the model accepts or rejects the fix as either the author or a neutral judge.
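The protocol above can be sketched in a few lines. Everything here is illustrative: the word-limit check stands in for a deterministic verifier and is not IFEval's API, the `Trial` record and function names are hypothetical, the data is a toy example, and the sign convention (author rejection rate minus judge rejection rate) is an assumption rather than something the summary states.

```python
from dataclasses import dataclass


def violates_max_words(text: str, max_words: int) -> bool:
    """Toy deterministic check: True if the text exceeds the word limit."""
    return len(text.split()) > max_words


@dataclass
class Trial:
    draft: str            # the model's original draft
    fix: str              # the proposed edit
    author_accepts: bool  # verdict from the model that wrote the draft
    judge_accepts: bool   # verdict from a fresh model on the same draft


def is_verified_good_fix(trial: Trial, max_words: int) -> bool:
    """Keep trials where the draft fails the check and the fix passes it."""
    return (violates_max_words(trial.draft, max_words)
            and not violates_max_words(trial.fix, max_words))


def rejection_gap_pp(trials: list[Trial], max_words: int) -> float:
    """Author minus judge rejection rate on verified-good fixes, in points."""
    eligible = [t for t in trials if is_verified_good_fix(t, max_words)]
    if not eligible:
        raise ValueError("no verified-good fixes to compare")
    author_rate = sum(not t.author_accepts for t in eligible) / len(eligible)
    judge_rate = sum(not t.judge_accepts for t in eligible) / len(eligible)
    return 100.0 * (author_rate - judge_rate)


# Hypothetical toy data with a four-word limit.
trials = [
    Trial("one two three four five six", "one two three", True, True),
    Trial("a b c d e f", "a b c d", False, True),
    Trial("a b c", "a b", False, False),             # draft already passes
    Trial("a b c d e", "a b c d e f", False, False),  # fix still fails
]
gap = rejection_gap_pp(trials, max_words=4)
print(f"rejection gap: {gap:.1f} pp")
```

Filtering to verified-good fixes before computing rates is the point of the deterministic checker: any rejection that remains cannot be excused by the fix failing the constraint it was meant to repair.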
In practice
- Do not assume LLMs will resist valid fixes to their own outputs; authors and fresh judges rejected them at similar rates here.
- Focus on objective verification for LLM revisions.
Topics
- Large Language Models
- LLM Revision
- Instruction Following
- Self-Preference Bias
- IFEval
- Text Generation
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.