DeFrame: Debiasing Large Language Models Against Framing Effects

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

DeFrame is a debiasing framework that targets "framing effects" in large language models (LLMs): semantically equivalent prompts phrased differently ("A is better than B" vs. "B is worse than A") can lead to inconsistent fairness outcomes. The research introduces "framing disparity" to quantify this issue and finds that existing debiasing methods often fail to reduce it, even when they reduce overall bias. On the BBQ benchmark, for instance, bias under negative framings was 2x to 4x larger than under positive framings. Inspired by dual-process theory, DeFrame uses a three-stage prompting process (Framing Integration, Guideline Generation, and Self-Revision) to encourage LLMs to consider alternative framings and revise their responses. Experiments across 8 LLMs, including LLaMA-3.2-3b-Instruct and Gemma3-12b-Instruct, on benchmarks such as BBQ, DoNotAnswer-Framed, and 70Decisions-Framed show that DeFrame reduces framing disparity by 92% and overall bias by 93% on average on BBQ.

Key takeaway

If you are an AI scientist or machine learning engineer deploying LLMs in sensitive applications, standard fairness evaluations are not enough on their own. Your models can exhibit significant "framing disparity," where measured bias shifts substantially with prompt wording. To keep fairness robust and consistent, integrate framing-aware debiasing techniques such as DeFrame into your development and evaluation workflows, so that hidden biases do not lead to discriminatory outcomes.

Key insights

LLM fairness evaluations are highly sensitive to prompt framing, necessitating methods that ensure consistency across phrasing.

Principles

LLM fairness metrics vary significantly with prompt framing.
Current debiasing methods often fail to reduce framing-induced disparities.
Dual-process theory can guide robust debiasing strategies.

Method

DeFrame integrates alternative framings, generates fairness guidelines, and then self-revises initial LLM responses to ensure consistency and reduce bias.
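
The three stages described above can be sketched as a small prompting pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the function name `deframe_sketch`, the ordering of the initial response before the stages, and the `generate` callable (a stand-in for whatever LLM client you use) are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical type: any function that maps a prompt string to a model reply.
LLM = Callable[[str], str]


def deframe_sketch(question: str, alternative_framing: str, generate: LLM) -> Dict[str, str]:
    """Illustrative three-stage flow; all prompt text here is an assumption."""
    # Initial response to the question as originally framed.
    initial = generate(question)

    # Stage 1 (Framing Integration): show the model both framings together.
    integrated = (
        f"Original question: {question}\n"
        f"Equivalent question under the opposite framing: {alternative_framing}\n"
    )

    # Stage 2 (Guideline Generation): ask for fairness guidelines covering both framings.
    guidelines = generate(
        integrated
        + "Write short guidelines for answering both framings fairly and consistently."
    )

    # Stage 3 (Self-Revision): revise the initial answer against the guidelines.
    revised = generate(
        f"{integrated}"
        f"Guidelines:\n{guidelines}\n"
        f"Initial answer:\n{initial}\n"
        "Revise the initial answer so that it follows the guidelines."
    )
    return {"initial": initial, "guidelines": guidelines, "revised": revised}


if __name__ == "__main__":
    # Stub model for demonstration only; replace with a real LLM call.
    def echo_model(prompt: str) -> str:
        return f"[reply to {len(prompt)} chars]"

    result = deframe_sketch("Is A better than B?", "Is B worse than A?", echo_model)
    print(result["revised"])
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM library and makes it easy to test with a stub.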

In practice

Quantify "framing disparity" by testing positive and negative prompt framings.
Use multi-stage prompting to improve LLM consistency across framings.
Consider dual-process theory for designing debiasing workflows.
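
The first practice above, quantifying framing disparity, can be sketched in a few lines. This summary does not give the paper's exact formula, so the sketch assumes a simple definition: the bias score is the fraction of responses flagged as biased, and framing disparity is the absolute difference between the bias scores under the two framings. The function names and the toy labels are hypothetical and are not results from the paper.

```python
from typing import Sequence


def bias_score(biased_flags: Sequence[int]) -> float:
    """Fraction of responses flagged as biased (1 = biased, 0 = not biased)."""
    if not biased_flags:
        raise ValueError("need at least one labeled response")
    return sum(biased_flags) / len(biased_flags)


def framing_disparity(positive_flags: Sequence[int], negative_flags: Sequence[int]) -> float:
    """Assumed definition: absolute gap between bias scores under the two framings."""
    return abs(bias_score(negative_flags) - bias_score(positive_flags))


# Toy labels for the same four questions asked under each framing.
positive = [0, 0, 1, 0]  # e.g. "A is better than B" style prompts
negative = [1, 0, 1, 0]  # e.g. "B is worse than A" style prompts

print(bias_score(positive))                   # 0.25
print(bias_score(negative))                   # 0.5
print(framing_disparity(positive, negative))  # 0.25
```

Reporting the disparity alongside the overall bias score makes it visible when a debiasing method lowers average bias but leaves the gap between framings unchanged.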

Topics

Large Language Models
Bias Mitigation
Fairness Evaluation
Framing Effect
Prompt Engineering
DeFrame Framework

Code references

Libr-AI/do-not-answer

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.