DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models
Summary
DisaBench is a new participatory evaluation framework for assessing disability-related harms in large language models. It is designed to address shortcomings of general-purpose safety benchmarks. It was co-created with people with disabilities and red teaming experts and has three parts:

- a taxonomy of twelve harm categories grouped into five top-level areas;
- a methodology that pairs benign and adversarial prompts across seven life domains;
- a dataset of 175 prompts.

Responses to these prompts from Llama 4 Maverick, Grok-3, and Phi-4 yield 525 human-annotated prompt-response pairs. The key findings are:

- Harm rates vary significantly by disability type (e.g., 37.3% for Vision vs. 17.5% for ND/Learning).
- Terminology-driven harm is culturally and temporally bound.
- Standard safety evaluations often miss subtle harms that only domain expertise can recognize.

The framework and dataset will be released openly via Hugging Face and an open-source red teaming framework.
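A per-type harm rate like those above is the share of annotated responses in a group that were judged harmful. The sketch below shows that computation under some assumptions. The record fields (`disability_type`, `variant`, `harmful`), the helper `harm_rate_by`, and the toy records are hypothetical illustrations. They are not the released DisaBench schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class AnnotatedPair:
    # Hypothetical schema for illustration; not the released DisaBench format.
    prompt_id: str
    model: str
    disability_type: str
    variant: str    # "benign" or "adversarial"
    harmful: bool   # annotator judgment for this prompt-response pair

def harm_rate_by(pairs, key):
    """Group pairs by the attribute named `key`; return {group: share labelled harmful}."""
    totals = defaultdict(int)
    harmful = defaultdict(int)
    for p in pairs:
        group = getattr(p, key)
        totals[group] += 1
        harmful[group] += p.harmful  # bool counts as 0 or 1
    return {g: harmful[g] / totals[g] for g in totals}

# Toy records (invented for illustration, not DisaBench data).
pairs = [
    AnnotatedPair("p1", "model_a", "Vision", "benign", True),
    AnnotatedPair("p1", "model_b", "Vision", "benign", False),
    AnnotatedPair("p2", "model_a", "ND/Learning", "adversarial", False),
    AnnotatedPair("p2", "model_b", "ND/Learning", "adversarial", False),
]
print(harm_rate_by(pairs, "disability_type"))  # {'Vision': 0.5, 'ND/Learning': 0.0}
print(harm_rate_by(pairs, "variant"))          # {'benign': 0.5, 'adversarial': 0.0}
```

Grouping by an attribute name keeps one helper usable for several breakdowns. The same function reports rates per disability type, per model, or per prompt variant.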
Key takeaway
Research scientists developing or evaluating large language models should integrate community-defined disability harm evaluation into their safety pipelines. General-purpose benchmarks alone will systematically miss subtle but significant harms, such as stereotyping or harmful advice. Engage people with disabilities early in co-creation and annotation so that evaluations cover the full range of potential impacts. This matters especially for non-text modalities, where harms may compound.
Key insights
Disability harm evaluation requires co-creation with affected communities and annotators with lived experience to detect subtle, context-dependent failures.
Principles
- Disability harm is personal, intersectional, and community-defined.
- Standard safety benchmarks miss subtle, context-dependent harms.
- Terminology-driven harm is culturally and temporally bound.
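One way to make the second principle measurable is a miss rate. Take the pairs that lived-experience annotators judged harmful, then count the share that a general-purpose safety check did not flag. This framing is an illustrative assumption, not a metric reported for DisaBench. The function name and the toy labels are also hypothetical.

```python
def missed_by_generic_filter(expert_harmful, generic_flagged):
    """Both arguments are parallel lists of booleans, one entry per prompt-response pair.
    Returns the share of expert-identified harms that the generic check did not flag."""
    if len(expert_harmful) != len(generic_flagged):
        raise ValueError("label lists must be the same length")
    # Keep the generic verdicts only for pairs the expert annotators marked harmful.
    generic_on_expert_harms = [g for e, g in zip(expert_harmful, generic_flagged) if e]
    if not generic_on_expert_harms:
        return 0.0
    missed = sum(1 for g in generic_on_expert_harms if not g)
    return missed / len(generic_on_expert_harms)

# Toy labels (invented): experts flag three harms; the generic check catches one.
expert = [True, True, True, False]
generic = [True, False, False, False]
print(missed_by_generic_filter(expert, generic))
```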
Method
DisaBench employs a participatory red teaming framework. It co-creates a harm taxonomy with disability experts and practitioners. It then runs a structured evaluation using both benign and adversarial prompts across seven life domains, and individuals with lived disability experience annotate the responses.
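The paired design can be represented as prompts that share a pair identifier and differ in variant. A coverage check then confirms that every pair has both variants and every life domain is represented. This is a minimal sketch under stated assumptions. The fields `pair_id`, `domain`, and `variant`, the helper functions, and the `domain_1` … `domain_7` labels are all hypothetical. The seven life domains are not named in this summary.

```python
from dataclasses import dataclass
from typing import Literal

Variant = Literal["benign", "adversarial"]

@dataclass(frozen=True)
class Prompt:
    pair_id: str      # hypothetical id shared by a benign/adversarial pair
    domain: str       # life domain (placeholder labels; the real seven are not listed here)
    variant: Variant
    text: str

def incomplete_pairs(prompts):
    """Return sorted pair_ids that lack either the benign or the adversarial variant."""
    variants = {}
    for p in prompts:
        variants.setdefault(p.pair_id, set()).add(p.variant)
    return sorted(pid for pid, vs in variants.items() if vs != {"benign", "adversarial"})

def uncovered_domains(prompts, expected_domains):
    """Return sorted expected domains that have no prompts yet."""
    present = {p.domain for p in prompts}
    return sorted(set(expected_domains) - present)

DOMAINS = [f"domain_{i}" for i in range(1, 8)]  # seven placeholder domain labels

prompts = [
    Prompt("d1-q1", "domain_1", "benign", "<benign prompt>"),
    Prompt("d1-q1", "domain_1", "adversarial", "<adversarial counterpart>"),
    Prompt("d2-q1", "domain_2", "benign", "<benign prompt>"),
]
print(incomplete_pairs(prompts))            # ['d2-q1']
print(uncovered_domains(prompts, DOMAINS))  # ['domain_3', 'domain_4', 'domain_5', 'domain_6', 'domain_7']
```

This check runs before any model is queried. A missing adversarial counterpart or an empty domain would otherwise skew comparisons between benign and adversarial harm rates.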
In practice
- Use annotators with lived disability experience for harm detection.
- Include both benign and adversarial prompts in evaluations.
- Ground multilingual evaluations in community-specific norms.
Topics
- DisaBench
- Disability Harm Taxonomy
- Language Model Safety
- Participatory Evaluation
- Red Teaming Framework
Best for: Research Scientist, AI Scientist, MLOps Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.