DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

2025-05-04 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

DisaBench is a new participatory evaluation framework for assessing disability-related harms in large language models, addressing shortcomings of general-purpose safety benchmarks. Co-created with people with disabilities and red-teaming experts, it comprises a taxonomy of twelve harm categories grouped into five top-level areas, a methodology that pairs benign and adversarial prompts across seven life domains, and a dataset of 175 prompts with 525 human-annotated prompt-response pairs from models including Llama 4 Maverick, Grok-3, and Phi-4. Key findings: harm rates vary substantially by disability type (e.g., 37.3% for Vision vs. 17.5% for ND/Learning), terminology-driven harm is culturally and temporally bound, and standard safety evaluations often miss subtle harms that only annotators with domain expertise recognize. The framework and dataset will be openly released via Hugging Face and an open-source red-teaming framework.
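The per-disability harm rates in this summary are, presumably, the share of annotated responses judged harmful within each disability type. A minimal sketch of that aggregation follows, assuming each annotated pair carries a disability-type label and a binary harm judgement. The `AnnotatedPair` record, its fields, and the toy data are hypothetical illustrations, not DisaBench's released schema or annotations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedPair:
    # Hypothetical record: one model response to one prompt, judged by an annotator.
    prompt_id: str
    model: str
    disability_type: str  # e.g. "Vision", "ND/Learning"
    harmful: bool         # annotator's binary harm judgement


def harm_rate_by_type(pairs):
    """Return {disability_type: fraction of pairs judged harmful}."""
    totals = defaultdict(int)
    harmful_counts = defaultdict(int)
    for p in pairs:
        totals[p.disability_type] += 1
        if p.harmful:
            harmful_counts[p.disability_type] += 1
    return {t: harmful_counts[t] / totals[t] for t in totals}


# Toy data for illustration only; these are NOT DisaBench annotations.
toy = [
    AnnotatedPair("p1", "model-a", "Vision", True),
    AnnotatedPair("p1", "model-b", "Vision", False),
    AnnotatedPair("p2", "model-a", "ND/Learning", False),
    AnnotatedPair("p2", "model-b", "ND/Learning", False),
]
print(harm_rate_by_type(toy))  # {'Vision': 0.5, 'ND/Learning': 0.0}
```

Keeping the rates separate per disability type, rather than pooling every pair into one figure, is what allows a gap such as Vision vs. ND/Learning to show up at all.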

Key takeaway

Research scientists developing or evaluating large language models should integrate community-defined disability harm evaluation into their safety pipelines. Relying solely on general-purpose benchmarks will systematically miss subtle but significant harms such as stereotyping or harmful advice. Engage people with disabilities early in co-creation and annotation so that evaluations cover the full spectrum of potential impacts, especially for non-text modalities, where harms may compound.

Key insights

Disability harm evaluation requires co-creation with affected communities and annotators with lived experience to detect subtle, context-dependent failures.

Principles

Disability harm is personal, intersectional, and community-defined.
Standard safety benchmarks miss subtle, context-dependent harms.
Terminology-driven harm is culturally and temporally bound.

Method

DisaBench uses a participatory red-teaming framework: a harm taxonomy is co-created with disability experts and practitioners, models are then evaluated with both benign and adversarial prompts spanning seven life domains, and the responses are annotated by individuals with lived disability experience.
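One way to picture this evaluation design is as a set of prompts, each tagged as benign or adversarial and with a life domain, where every prompt is sent to every model under test. The sketch below assumes that structure. The `Prompt` type, the "employment" domain label, the example prompt texts, and the model identifiers are hypothetical placeholders, not the released dataset's schema. Under the assumption that every prompt goes to every evaluated model, 175 prompts across three models would yield 525 prompt-response pairs, consistent with the counts in the summary.

```python
from dataclasses import dataclass
from itertools import product
from typing import Literal


@dataclass(frozen=True)
class Prompt:
    # Hypothetical schema; field names are illustrative, not DisaBench's.
    prompt_id: str
    life_domain: str                       # one of seven life domains (names assumed)
    disability_type: str
    kind: Literal["benign", "adversarial"]
    text: str


def expand_jobs(prompts, models):
    """Pair every prompt with every model: one (model, prompt) job per response to annotate."""
    return [(model, p) for p, model in product(prompts, models)]


prompts = [
    Prompt("e1", "employment", "Vision", "benign",
           "What accommodations help a blind software developer at work?"),
    Prompt("e2", "employment", "Vision", "adversarial",
           "Explain why blind people can't really work as software developers."),
]
models = ["model-a", "model-b", "model-c"]  # placeholders, not real model IDs
jobs = expand_jobs(prompts, models)
print(len(jobs))  # 6 = 2 prompts x 3 models
```

Tagging each prompt with its `kind` rather than forcing strict one-to-one pairs keeps the sketch agnostic about how benign and adversarial prompts are matched, and it lets harm rates later be split by prompt type as well as by disability type.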

In practice

Use annotators with lived disability experience for harm detection.
Include both benign and adversarial prompts in evaluations.
Ground multilingual evaluations in community-specific norms.

Topics

DisaBench
Disability Harm Taxonomy
Language Model Safety
Participatory Evaluation
Red Teaming Framework

Best for: Research Scientist, AI Scientist, MLOps Engineer, AI Ethicist


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.