RBCorr: Response Bias Correction in Language Models

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Advanced, extended

Summary

RBCorr is a low-cost response bias correction strategy for language models (LMs) that targets option-preference biases in fixed-response questions. The researchers tested RBCorr on 12 open-weight LMs from the Falcon3, Gemma3, and Llama3.1 families, across yes-no, entailment, and multiple-choice questions drawn from datasets including ARITH, bAbI, SNLI, MultiNLI, and MMLU. The study found that LMs exhibit significant response bias, which RBCorr effectively eliminates while boosting or maintaining model performance, particularly for smaller LMs. The method mean-normalizes LogProbs values using a small, class-balanced calibration set (typically 100 questions). In comparisons with existing methods such as Contextual Calibration (CC) and Batch Calibration (BC), RBCorr often yielded superior bias reduction and competitive accuracy gains, recovering up to 29% accuracy on the bAbI dataset. However, the study also found that LogProbs-based correction terms are highly context-specific and do not transfer reliably across models, datasets, or prompt formats.

Key takeaway

For NLP engineers and AI scientists evaluating or deploying open-weight language models, RBCorr can significantly improve the accuracy and fairness of fixed-response evaluations, especially for smaller models. Integrating this calibration-based method helps uncover latent model performance and ensures that benchmarks reflect true capabilities rather than inherent response biases. Correction terms do not transfer reliably, so each model, dataset, and prompt configuration needs its own calibration.

Key insights

RBCorr effectively mitigates language model response bias and improves performance using a simple, low-cost LogProbs-based calibration method.

Principles

Method

RBCorr mean-normalizes LogProbs values: it estimates the mean LogProbs value of each response option on a small, class-balanced calibration set, then subtracts those means from the corresponding evaluation-set LogProbs.
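The mean-normalization step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`fit_correction_terms`, `correct`, `predict`) are hypothetical, the LogProbs values are invented toy numbers for a yes-no task, and the sketch assumes the per-option LogProbs have already been extracted from the model.

```python
from statistics import fmean


def fit_correction_terms(calib_logprobs):
    """Mean LogProbs value per response option over the calibration set.

    calib_logprobs: one row per calibration question, one value per option.
    The calibration set is assumed to be class-balanced.
    """
    return [fmean(column) for column in zip(*calib_logprobs)]


def correct(logprobs_row, means):
    """Subtract each option's calibration mean from its LogProbs value."""
    return [lp - m for lp, m in zip(logprobs_row, means)]


def predict(logprobs_row):
    """Index of the option with the highest (possibly corrected) value."""
    return max(range(len(logprobs_row)), key=logprobs_row.__getitem__)


# Toy yes-no data (invented): columns are [yes, no].
# The hypothetical model scores "yes" higher on every question.
calibration = [
    [-0.2, -1.8],
    [-0.4, -1.2],
    [-0.3, -1.6],
    [-0.7, -1.0],
]
evaluation = [
    [-0.3, -1.9],
    [-0.6, -1.1],
]

means = fit_correction_terms(calibration)
raw_predictions = [predict(row) for row in evaluation]
corrected_predictions = [predict(correct(row, means)) for row in evaluation]
```

In this toy case the raw predictions all fall on the first option, while the corrected predictions split between the two options. Because the correction terms are reported not to transfer reliably, `means` would need to be re-estimated for each model, dataset, and prompt format.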

Best for: NLP Engineer, AI Scientist, Research Scientist, AI Researcher, AI Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.