StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

2024-01-30 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

The StylisticBias benchmark evaluates attribute-level social bias in multimodal large language models (MLLMs) by fixing identity and varying single visual attributes. Researchers generated 500 photorealistic base faces using Imagen 4 and created approximately 50 single-attribute variations per face with Nano Banana, resulting in about 25,000 images. Evaluating six MLLMs across 25 binary social judgment scenarios, the study found that age (VS=0.075) and body type (VS=0.069) are the strongest demographic drivers. Fashion style, facial hair, makeup, and eyewear produce the largest attribute-level shifts, with about 15 attributes accounting for nearly 80% of total bias. Sensitivity is highest in socioeconomic and style-related judgments. The benchmark and code are publicly released.

Key takeaway

AI ethicists and ML engineers who develop or deploy MLLMs need to account for appearance-driven bias. The study finds that MLLMs are highly sensitive to specific visual attributes such as fashion and body type, particularly in socioeconomic judgments. Prioritize auditing for these concentrated biases with fine-grained benchmarks like StylisticBias, and include negative cues in the audit, since they elicit stronger shifts than positive ones. Doing so helps prevent the amplification of societal stereotypes in consequential applications.

Key insights

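MLLM social biases are concentrated in a few visual cues, especially self-presentation cues such as fashion style, facial hair, makeup, and eyewear, and they are amplified in judgments that align with appearance, such as socioeconomic and style-related ones.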

Principles

Bias is concentrated in ~15 visual attributes.
Negative cues produce larger shifts than positive ones.
Demographic context moderates cue interpretation.
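
The concentration claim above (about 15 attributes covering nearly 80% of total bias) can be checked on any table of per-attribute shift scores. The following is a minimal sketch, not the authors' analysis code: the function name, the attribute labels, and the numbers are all made up for illustration, and it assumes each attribute has a single non-negative shift score.

```python
def attributes_needed(shifts, share=0.8):
    """Return the smallest top-ranked set of attributes whose summed
    shift reaches `share` of the total shift.

    `shifts` maps attribute name -> non-negative shift score.
    How the score is computed is left open (assumption); this only
    measures how concentrated the scores are.
    """
    total = sum(shifts.values())
    if total <= 0:
        raise ValueError("total shift must be positive")
    # Rank attributes from largest to smallest shift.
    ranked = sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)
    running = 0.0
    chosen = []
    for name, value in ranked:
        running += value
        chosen.append(name)
        if running >= share * total:
            break
    return chosen


# Toy scores (hypothetical, not results from the paper).
toy_shifts = {"fashion_style": 6, "facial_hair": 3, "eyewear": 1}
top = attributes_needed(toy_shifts, share=0.8)
print(top)
```

On the toy scores, the first two attributes already cover 9 of 10 units of shift, so the function returns those two. On real benchmark output, the length of the returned list is the number to compare against the reported figure of about 15.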

Method

StylisticBias generates 500 photorealistic base faces with Imagen 4, then creates about 50 single-attribute variations per face with Nano Banana, for about 25,000 images in total. Six MLLMs are evaluated across 25 binary social judgment scenarios.
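
Because identity is fixed and only one attribute changes, each variation can be compared directly with its base face. The sketch below illustrates that paired comparison under stated assumptions: it uses a simple flip rate (the fraction of binary judgments that differ from the base face), which is a stand-in and not the paper's VS metric, and the record layout, function name, and toy data are hypothetical.

```python
from collections import defaultdict

# Dataset size implied by the summary: 500 base faces x ~50 variations.
n_images = 500 * 50
print(n_images)


def attribute_flip_rates(base, variants):
    """Fraction of judgments per attribute that differ from the base face.

    base:     dict mapping (face_id, scenario) -> bool answer on the base face
    variants: iterable of (face_id, attribute, scenario, answer) tuples,
              one per judgment on a single-attribute variation
    Both layouts are assumptions made for this sketch.
    """
    flips = defaultdict(int)
    totals = defaultdict(int)
    for face_id, attribute, scenario, answer in variants:
        totals[attribute] += 1
        if answer != base[(face_id, scenario)]:
            flips[attribute] += 1
    return {attr: flips[attr] / totals[attr] for attr in totals}


# Toy judgments (hypothetical): two faces, one scenario, two attributes.
base = {(0, "hire"): True, (1, "hire"): True}
variants = [
    (0, "glasses", "hire", True),
    (1, "glasses", "hire", False),
    (0, "beard", "hire", True),
    (1, "beard", "hire", True),
]
rates = attribute_flip_rates(base, variants)
print(rates)
```

A flip rate treats every change of answer equally. A signed difference in "yes" rates would also show the direction of the shift, which matters when comparing negative and positive cues.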

In practice

Use StylisticBias to audit MLLMs for appearance-driven bias.
Focus bias mitigation on fashion, facial hair, and makeup cues.
Evaluate negative appearance cues to avoid underestimating bias.
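
The last recommendation can be turned into a quick check: if an audit only samples positive cues, the measured bias may understate what negative cues trigger. The sketch below compares mean absolute shifts by cue polarity. The polarity labels, the function name, and the numbers are assumptions for illustration; the source does not specify how cues are labeled as negative or positive.

```python
from statistics import mean


def polarity_gap(records):
    """Mean absolute shift of negative cues minus that of positive cues.

    records: iterable of (polarity, shift) pairs, where polarity is
    "negative" or "positive" and shift is a signed change in the
    model's judgment relative to the base face (assumed layout).
    A positive result means negative cues move judgments more.
    """
    by_polarity = {"negative": [], "positive": []}
    for polarity, shift in records:
        by_polarity[polarity].append(abs(shift))
    if not by_polarity["negative"] or not by_polarity["positive"]:
        raise ValueError("need at least one record of each polarity")
    return mean(by_polarity["negative"]) - mean(by_polarity["positive"])


# Toy shifts (hypothetical, not results from the paper).
toy_records = [
    ("negative", -0.25),
    ("negative", -0.75),
    ("positive", 0.25),
    ("positive", 0.25),
]
gap = polarity_gap(toy_records)
print(gap)
```

On the toy records, negative cues average an absolute shift of 0.5 against 0.25 for positive cues, so the gap is 0.25.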

Topics

Multimodal Large Language Models
Social Bias
Visual Attributes
StylisticBias Benchmark
Bias Evaluation
Appearance-Driven Bias

Code references

timo-cavelius/StylisticBias

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Ethicist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.