BAFIS: Dataset + Framework to assess occupational Bias and Human Preference in modern Text-to-image Models

2026-06-18 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, AI Ethics & Fairness · Depth: Expert, quick

Summary

BAFIS, a new dataset and framework, assesses occupational bias and human preference in modern text-to-image models. This work investigates inherent and language-induced biases in occupation-related image generation, complementing established metrics with human feedback. A comprehensive evaluation was conducted on five current models: Midjourney v6.1, Stable Diffusion 3 Medium, DALL-E 3, Playground v2.5, and FLUX.1-dev. The assessment focused on gender and ethnicity bias, image quality, and prompt alignment. To facilitate this, the "Battle-Arena for Fair Image Synthesis" (BAFIS) platform was developed for collecting human feedback. Researchers also created a dataset of 21,140 synthetic images using multilingual prompts. Findings reveal systematic biases in these models, with established evaluation metrics showing only partial correlation with subjective user ratings, underscoring the importance of human preferences for developing fairer models.

Key takeaway

AI ethicists and research scientists who develop or deploy text-to-image models should integrate human preference feedback into their bias evaluation frameworks. Established metrics alone are insufficient because they correlate only partially with subjective user ratings. A platform like BAFIS can gather that direct human input, which supports building fairer and more inclusive models.

Key insights

Text-to-image models exhibit systematic occupational biases, requiring human preference feedback for fair evaluation and development.

Principles

Text-to-image models show systematic gender and ethnicity biases in occupation-related images.
Human preferences are crucial for fairer model development.
Established metrics correlate only partially with subjective user ratings.
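
One way to quantify how closely an automated metric tracks human ratings is a rank correlation. The sketch below computes Spearman's rho in plain Python. The summary does not say which correlation measure the authors used, so Spearman is an assumption here, and the scores are made-up numbers for illustration, not results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values: one automated metric score and one mean human rating per image set.
metric_scores = [0.31, 0.28, 0.35, 0.22, 0.30]
human_ratings = [4.1, 3.0, 3.9, 2.5, 4.4]
rho = spearman(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.2f}")
```

A rho near 1 would mean the metric orders the image sets the same way people do; a moderate value is what "partial correlation" looks like in practice.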

Method

Developed BAFIS platform for human feedback on bias. Created a 21,140-image dataset from multilingual prompts. Evaluated five models for gender/ethnicity bias, quality, and prompt alignment, comparing results to official statistics.
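
The comparison against official statistics can be sketched as a per-group gap between the share of generated images and a reference share. This is a minimal illustration, not the paper's procedure: the function names, the group labels, and the reference figures are all hypothetical, and it assumes each image has already been assigned a single group label.

```python
from collections import Counter


def group_shares(labels):
    """Fraction of images assigned to each group label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {group: n / len(labels) for group, n in counts.items()}


def representation_gap(labels, reference):
    """Generated share minus reference share per group (positive = over-represented)."""
    observed = group_shares(labels)
    groups = set(observed) | set(reference)
    return {g: observed.get(g, 0.0) - reference.get(g, 0.0) for g in groups}


def total_variation(gap):
    """Half the sum of absolute gaps; assumes both distributions sum to 1.

    0 means the distributions match, 1 means they do not overlap at all.
    """
    return 0.5 * sum(abs(v) for v in gap.values())


# Hypothetical data: 10 images for one occupation prompt and an invented reference split.
labels = ["male"] * 9 + ["female"] * 1
reference = {"male": 0.6, "female": 0.4}

gap = representation_gap(labels, reference)
distance = total_variation(gap)
print(gap, distance)
```

Reporting the signed gap per group keeps the direction of the skew visible, while the single distance is convenient for ranking models or prompts.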

In practice

Use BAFIS to collect human feedback on image bias.
Integrate human preference data into bias assessments.
Compare model biases against real-world demographic statistics.
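
Feedback from a battle-arena setup typically arrives as pairwise votes between two models' images. The summary does not describe how BAFIS aggregates votes, so the sketch below assumes the simplest option, a per-model win rate with ties counted as half a win. The vote format and the model names are placeholders.

```python
from collections import defaultdict


def win_rates(votes):
    """Share of wins per model across the pairwise comparisons it appeared in.

    votes: iterable of (model_a, model_b, winner), where winner is model_a,
    model_b, or None for a tie. A tie counts as half a win for each side.
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for a, b, winner in votes:
        if winner not in (a, b, None):
            raise ValueError(f"winner {winner!r} is not one of the compared models")
        games[a] += 1
        games[b] += 1
        if winner is None:
            wins[a] += 0.5
            wins[b] += 0.5
        else:
            wins[winner] += 1.0
    return {model: wins[model] / games[model] for model in games}


# Hypothetical votes between three placeholder models.
votes = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_z", None),
    ("model_y", "model_z", "model_z"),
    ("model_x", "model_y", "model_x"),
]
rates = win_rates(votes)
print(rates)
```

Win rates are easy to read but ignore opponent strength; a rating scheme such as Elo or Bradley-Terry would be a natural next step if the pairings are unbalanced.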

Topics

Text-to-Image Models
AI Bias
Occupational Bias
Human Preference
Fairness Metrics
Dataset Evaluation
BAFIS Framework

Best for: AI Scientist, AI Ethicist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.