BAFIS: Dataset + Framework to assess occupational Bias and Human Preference in modern Text-to-image Models
Summary
BAFIS, a new dataset and framework, assesses occupational bias and human preference in modern text-to-image models. This work investigates inherent and language-induced biases in occupation-related image generation, complementing established metrics with human feedback. A comprehensive evaluation was conducted on five current models: Midjourney v6.1, Stable Diffusion 3 Medium, DALL-E 3, Playground v2.5, and FLUX.1-dev. The assessment focused on gender and ethnicity bias, image quality, and prompt alignment. To facilitate this, the "Battle-Arena for Fair Image Synthesis" (BAFIS) platform was developed for collecting human feedback. Researchers also created a dataset of 21,140 synthetic images using multilingual prompts. Findings reveal systematic biases in these models, with established evaluation metrics showing only partial correlation with subjective user ratings, underscoring the importance of human preferences for developing fairer models.
Key takeaway
AI ethicists and research scientists who develop or deploy text-to-image models should integrate human preference feedback into their bias evaluation frameworks. Established metrics alone are insufficient because they correlate only partially with subjective user ratings of occupational bias. Platforms like BAFIS can gather direct human input, which helps make models fairer and more inclusive across diverse applications.
Key insights
Text-to-image models exhibit systematic occupational biases, requiring human preference feedback for fair evaluation and development.
Principles
- Text-to-image models show systematic gender and ethnicity biases in occupation-related images, both inherent and language-induced.
- Human preferences are crucial for fairer model development.
- Established metrics correlate only partially with subjective user ratings.
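One way to quantify how closely an automated metric tracks human ratings is a rank correlation. The sketch below computes Spearman's rho in plain Python. The summary does not say which correlation measure the study used, so the choice of Spearman is an assumption, and the scores and ratings are made-up values for illustration only.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-image values, NOT results from the paper.
metric_scores = [0.31, 0.28, 0.35, 0.22, 0.30]  # e.g. an automated alignment score
human_ratings = [3.9, 4.1, 4.4, 2.8, 3.2]       # e.g. mean user rating

rho = spearman(metric_scores, human_ratings)
print(round(rho, 2))
```

A value well below 1 on real data would indicate that the metric and the raters order images differently, which is the kind of partial agreement the principle describes.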
Method
The authors developed the BAFIS platform to collect human feedback on bias and created a dataset of 21,140 synthetic images from multilingual prompts. They evaluated five models for gender and ethnicity bias, image quality, and prompt alignment, and compared the results with official statistics.
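The name "Battle-Arena" suggests that raters compare images head to head, although the summary does not describe the voting protocol. Assuming votes arrive as (winner, loser) pairs, a minimal sketch of turning them into per-model win rates might look like this. The function name, the vote format, and the placeholder model names are all hypothetical.

```python
from collections import defaultdict

def win_rates(votes):
    """Compute each model's share of battles won from (winner, loser) pairs."""
    wins = defaultdict(int)
    battles = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        battles[winner] += 1
        battles[loser] += 1
    # Models that never won still appear, with a rate of 0.0.
    return {model: wins[model] / battles[model] for model in battles}

# Hypothetical votes for illustration only; not data from the study.
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_a"),
]

rates = win_rates(votes)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.2f}")
```

Win rates are the simplest aggregate; a rating system that accounts for opponent strength would be a reasonable alternative when models face each other unevenly.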
In practice
- Use BAFIS to collect human feedback on image bias.
- Integrate human preference data into bias assessments.
- Compare model biases against real-world demographic statistics.
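The last step, comparing model output with real-world statistics, can be sketched as a per-group difference between the share observed in generated images and a reference share. This is an illustrative sketch, not the paper's procedure: the function name, the group labels, and every number below are assumptions.

```python
from collections import Counter

def share_gaps(labels, reference_shares):
    """Return generated share minus reference share for each group.

    labels: one annotated group label per generated image.
    reference_shares: group -> share from an external statistic.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {
        group: counts[group] / total - ref
        for group, ref in reference_shares.items()
    }

# Hypothetical annotations for 20 images of one occupation,
# and a hypothetical reference split. Not data from the study.
labels = ["male"] * 18 + ["female"] * 2
reference = {"male": 0.60, "female": 0.40}

gaps = share_gaps(labels, reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A positive gap means the model depicts the group more often than the reference statistic would suggest, and a negative gap means less often.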
Topics
- Text-to-Image Models
- AI Bias
- Occupational Bias
- Human Preference
- Fairness Metrics
- Dataset Evaluation
- BAFIS Framework
Best for: AI Scientist, AI Ethicist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.