Cross-Model Disagreement as a Label-Free Correctness Signal

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

Cross-model disagreement is introduced as a training-free, label-free correctness indicator for language model outputs. It targets "confident errors", where a model is wrong but certain. A "verifier" model performs a single forward pass over a "generator" model's answer and computes either its surprise (Cross-Model Perplexity, CMP) or its uncertainty (Cross-Model Entropy, CME). Unlike approaches that rely on a model's own uncertainty, CMP and CME draw on a second model, and they require neither generation by the verifier nor correctness labels. The method was benchmarked on MMLU, TriviaQA, and GSM8K; on MMLU, CMP reached a mean AUROC of 0.75, against 0.59 for within-model entropy baselines. The approach applies to deployment monitoring, model routing, selective prediction, and data filtering. On knowledge-intensive tasks such as MMLU, its effectiveness is driven by architectural diversity between the two models rather than by a capability gap, while open-ended retrieval tasks benefit from a stronger verifier.
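
For readers less familiar with the metric, AUROC here can be read as the probability that a randomly chosen wrong answer receives a higher disagreement score than a randomly chosen correct one. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the toy inputs in the usage note are illustrative assumptions, not the paper's evaluation code or data.

```python
def auroc(scores, is_error):
    """AUROC of a score used to detect errors.

    scores[i]   : disagreement score for answer i (higher = more suspicious).
    is_error[i] : True if answer i is actually wrong.
    Ties between a wrong and a correct answer count as half a win.
    """
    wrong = [s for s, e in zip(scores, is_error) if e]
    right = [s for s, e in zip(scores, is_error) if not e]
    if not wrong or not right:
        raise ValueError("need at least one wrong and one correct answer")
    wins = 0.0
    for w in wrong:
        for r in right:
            if w > r:
                wins += 1.0
            elif w == r:
                wins += 0.5
    return wins / (len(wrong) * len(right))
```

On a toy input such as `auroc([0.9, 0.8, 0.3, 0.1], [True, False, True, False])`, three of the four wrong/correct pairs are ordered correctly, giving 0.75. A score that carries no information about correctness gives 0.5.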

Key takeaway

For MLOps engineers deploying LLMs in high-stakes environments, cross-model disagreement signals such as CMP or CME offer a label-free way to detect confident errors. You can use them to flag likely incorrect outputs for review, to route queries to a stronger model only when needed, or to enable selective prediction, which improves accuracy by abstaining on uncertain inputs. This improves system reliability and reduces inference cost without requiring labeled data or router training.
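
The three uses named above can be sketched as simple threshold logic over a disagreement score. This is a minimal illustration, assuming that a higher score means the verifier disagrees more. The names `triage` and `selective_accuracy` and the threshold parameters are hypothetical, and any thresholds would need tuning on your own traffic.

```python
def triage(score, review_above, escalate_above):
    """Map one disagreement score to an action.

    review_above and escalate_above are hypothetical thresholds with
    review_above <= escalate_above.
    """
    if score > escalate_above:
        return "escalate"   # route the query to a stronger model
    if score > review_above:
        return "review"     # flag the output for human review
    return "accept"


def selective_accuracy(scores, is_correct, coverage):
    """Accuracy when answering only the `coverage` fraction of inputs
    with the lowest disagreement scores and abstaining on the rest."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    kept = order[:max(1, int(coverage * len(order)))]
    return sum(1 for i in kept if is_correct[i]) / len(kept)
```

If the score tracks correctness, `selective_accuracy` should rise as `coverage` is lowered. Plotting it against coverage on a small labeled sample is one way to choose the thresholds for `triage`.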

Key insights

Cross-model disagreement detects confident LLM errors by measuring a second model's surprise at the first model's answer.

Principles

Method

Given a generator's answer, a verifier performs a single forward pass over the prompt and the answer. Cross-Model Perplexity (CMP) aggregates the verifier's token-level surprise at the answer, while Cross-Model Entropy (CME) aggregates its token-level uncertainty. The verifier generates nothing, and no correctness labels are needed.
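
A minimal sketch of how the two scores could be computed from the verifier's output logits follows. The summary does not specify the aggregation, so this sketch assumes a plain mean over answer tokens: CMP as the exponential of the mean negative log-likelihood, and CME as the mean Shannon entropy. The function names and the input layout are illustrative, and obtaining the logits from an actual verifier model is left out.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_model_scores(verifier_logits, answer_token_ids):
    """Return (cmp, cme) for one generator answer.

    verifier_logits[t] : the verifier's logits for the position that
                         predicts answer token t, taken from one forward
                         pass over prompt + answer. The caller is assumed
                         to have aligned positions already.
    answer_token_ids[t]: the token the generator actually produced.
    Assumption: both scores are unweighted means over answer tokens.
    """
    if not answer_token_ids or len(verifier_logits) != len(answer_token_ids):
        raise ValueError("need one logit vector per answer token")
    nll_sum = 0.0
    entropy_sum = 0.0
    for logits, token in zip(verifier_logits, answer_token_ids):
        logp = log_softmax(logits)
        nll_sum += -logp[token]                              # surprise
        entropy_sum += -sum(math.exp(lp) * lp for lp in logp)  # uncertainty
    n = len(answer_token_ids)
    return math.exp(nll_sum / n), entropy_sum / n
```

In this sketch, CMP depends on the probability the verifier assigns to the generator's chosen tokens, while CME depends only on how spread out the verifier's distribution is at each position. A verifier that is uniform over a four-token vocabulary yields a CMP of 4 and a CME of ln 4.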

Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.