Investigating Counterfactual Unfairness in LLMs towards Identities through Humor
Summary
A new study investigates counterfactual unfairness in large language models (LLMs) by measuring how their responses to humor change when the speaker and addressee identities are swapped. The framework covers three tasks (humor generation refusal, speaker intention inference, and relational/societal impact prediction) and spans both identity-agnostic and identity-specific disparagement humor. Using interpretable bias metrics, the study finds consistent relational disparities across state-of-the-art models: jokes from privileged speakers are refused up to 67.5% more often, judged as malicious 64.7% more frequently, and rated up to 1.5 points higher in social harm on a 5-point scale. These findings indicate that sensitivity and stereotyping coexist in LLMs, which complicates fairness and cultural alignment efforts.
Key takeaway
Research scientists developing or deploying LLMs need to understand how models internalize social assumptions. Fairness evaluations should extend beyond basic content moderation to include counterfactual scenarios, such as identity swaps in humor. Such tests can surface subtle biases that content filters miss and inform cultural alignment, so that models do not inadvertently perpetuate harmful stereotypes.
Key insights
- LLMs exhibit counterfactual unfairness in humor: their refusals, intent judgments, and harm ratings shift when speaker and addressee identities are swapped.
Principles
- Humor reflects social perception and internalized assumptions.
- Sensitivity and stereotyping can coexist in generative models.
Method
The study uses a framework with three tasks: humor generation refusal, speaker intention inference, and relational/societal impact prediction, applying interpretable bias metrics under identity swaps.
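As an illustration of what a bias metric under identity swaps could look like, here is a minimal sketch that averages the difference between an outcome measured on the original speaker/addressee ordering and the same outcome after the swap. The names `PairedOutcome` and `counterfactual_gap` are hypothetical, the data is made up, and this is not the paper's metric definition, which the summary does not give.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PairedOutcome:
    """One joke scenario measured under both identity orderings (hypothetical type)."""
    original: float  # e.g. 1.0 if the model refused, or a 1-5 harm rating
    swapped: float   # the same measurement after swapping speaker and addressee


def counterfactual_gap(outcomes):
    """Mean of (original - swapped); 0.0 means no measured disparity."""
    if not outcomes:
        raise ValueError("need at least one paired outcome")
    return mean(o.original - o.swapped for o in outcomes)


# Toy refusal indicators, invented purely for illustration.
refusals = [
    PairedOutcome(original=1.0, swapped=0.0),
    PairedOutcome(original=1.0, swapped=1.0),
    PairedOutcome(original=0.0, swapped=0.0),
    PairedOutcome(original=1.0, swapped=0.0),
]
gap = counterfactual_gap(refusals)
print(f"refusal-rate gap: {gap:+.2f}")
```

The same function works for any of the three tasks as long as the outcome is numeric: a refusal indicator, a "judged malicious" indicator, or a harm rating on the 5-point scale.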
In practice
- Evaluate LLM fairness beyond simple content filtering.
- Test model responses to identity-swapped humor scenarios.
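One way to set up the identity-swap test suggested above is to generate prompt pairs that differ only in which identity speaks and which is addressed. The sketch below assumes a hypothetical prompt template and placeholder identity labels; neither comes from the study, and a real evaluation would need carefully chosen identities and wording.

```python
from itertools import combinations

# Hypothetical template; the study's actual prompts are not given in this summary.
TEMPLATE = "A {speaker} tells this joke to a {addressee}: {joke}"


def swapped_prompt_pairs(identities, joke, template=TEMPLATE):
    """Return (original, swapped) prompts for every unordered identity pair."""
    pairs = []
    for a, b in combinations(identities, 2):
        original = template.format(speaker=a, addressee=b, joke=joke)
        swapped = template.format(speaker=b, addressee=a, joke=joke)
        pairs.append((original, swapped))
    return pairs


# Placeholder labels stand in for real identity terms.
pairs = swapped_prompt_pairs(["group A", "group B", "group C"], joke="<joke text>")
for original, swapped in pairs:
    print(original)
    print(swapped)
```

Each pair can then be sent to the model under test, and the two responses compared on refusal, inferred intent, or predicted harm.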
Topics
- Large Language Models
- Counterfactual Unfairness
- Humor Analysis
- Identity Bias
- Generative AI Fairness
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, NLP Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.