DialDefer: A Framework for Detecting and Mitigating LLM Dialogic Deference

Source: cs.CL updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

DialDefer is a framework for detecting and mitigating "dialogic deference" in Large Language Models (LLMs): the tendency to judge an identical claim differently depending on how it is framed. The researchers found that LLMs give different verdicts when content is presented as a statement to verify than when the same content is attributed to a speaker. The framework introduces the Dialogic Deference Score (DDS) to quantify these framing-induced judgment shifts, which aggregate accuracy metrics often hide. Across ten domains, more than 3,000 instances, and five models, conversational framing induced significant shifts, with a mean |DDS| of 15.9 percentage points (pp) (p < .0001), while accuracy remained stable (a change of less than 2 pp). The effect was amplified 2-5x on naturalistic Reddit conversations and varied by domain. Attributing a claim to a human rather than an LLM caused the largest shifts (17.7 pp swing), suggesting that LLMs treat disagreement with humans as more costly. Mitigation can reduce deference but risks over-correcting into skepticism, which makes this a calibration challenge rather than a matter of simple accuracy optimization.

Key takeaway

If you are an NLP engineer or AI scientist evaluating LLMs for critical applications, accuracy metrics alone are not enough. Incorporate the DialDefer framework into your evaluation to detect dialogic deference, especially when LLMs act as judges. Judgments can shift substantially (a 17.7 pp swing) depending on whether a claim is attributed to a human or an AI. When mitigating deference, calibrate carefully so the model does not over-correct into skepticism and continues to give balanced, reliable judgments.

Key insights

LLMs exhibit "dialogic deference," judging claims differently based on speaker attribution, not just content.

Method

DialDefer detects dialogic deference using a Dialogic Deference Score (DDS) to quantify directional shifts in LLM judgments between statement verification and speaker attribution frames.
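The idea of a directional shift between the two frames can be illustrated with a small sketch. This summary does not give the exact DDS formula, so the score below is an assumption: it measures the change, in percentage points, in how often the model accepts a claim when moving from the statement-verification frame to the speaker-attribution frame. The names PairedJudgment, accuracy, and signed_shift_pp are hypothetical, and the data is a toy example, not a result from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedJudgment:
    """One claim judged under both frames (hypothetical record layout)."""
    gold_correct: bool       # ground truth: is the claim actually correct?
    statement_verdict: bool  # model accepts the claim as a bare statement
    speaker_verdict: bool    # model accepts the claim when a speaker says it


def accuracy(items, frame):
    """Percentage of verdicts in the given frame that match the ground truth."""
    hits = sum(getattr(j, frame) == j.gold_correct for j in items)
    return 100.0 * hits / len(items)


def signed_shift_pp(items):
    """Assumed DDS-like score, NOT the paper's definition.

    Change in acceptance rate (pp) from the statement frame to the speaker
    frame. Positive means more agreement (deference); negative means more
    rejection (skepticism).
    """
    accept_statement = sum(j.statement_verdict for j in items)
    accept_speaker = sum(j.speaker_verdict for j in items)
    return 100.0 * (accept_speaker - accept_statement) / len(items)


# Toy data: 5 correct and 5 incorrect claims.
items = (
    [PairedJudgment(True, True, True)] * 3      # accepted in both frames
    + [PairedJudgment(True, False, True)] * 2   # accepted only from a speaker
    + [PairedJudgment(False, False, True)] * 2  # wrongly accepted from a speaker
    + [PairedJudgment(False, False, False)] * 3 # rejected in both frames
)

print(accuracy(items, "statement_verdict"))  # 80.0
print(accuracy(items, "speaker_verdict"))    # 80.0
print(signed_shift_pp(items))                # 40.0
```

In this toy data the accuracy is identical in both frames, yet the acceptance rate moves by 40 pp toward agreement. That is the kind of shift an aggregate accuracy number would not reveal, and the sign of the score distinguishes deference from over-corrected skepticism.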

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.