DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Emerging Technologies & Innovation) · Depth: Expert, extended

Summary

DisaBench is a participatory evaluation framework for assessing disability-related harms in large language models, built to address gaps in general-purpose safety benchmarks. Co-created with people with disabilities and red teaming experts, it comprises a taxonomy of twelve harm categories across five top-level areas, a methodology that pairs benign and adversarial prompts across seven life domains, and a dataset of 175 prompts. Responses from models such as Llama 4 Maverick, Grok-3, and Phi-4 yield 525 human-annotated prompt-response pairs. The key findings are that harm rates vary significantly by disability type (e.g., 37.3% for Vision vs. 17.5% for ND/Learning), that terminology-driven harm is culturally and temporally bound, and that standard safety evaluations often miss subtle harms that only domain expertise can recognize. The framework and dataset will be openly released via Hugging Face and an open-source red teaming framework.

Key takeaway

Research scientists developing or evaluating large language models should integrate community-defined disability harm evaluation into their safety pipelines. Relying solely on general-purpose benchmarks will systematically miss subtle yet significant harms such as stereotyping or harmful advice. Engage people with disabilities in co-creation and annotation from the start so that evaluations cover the full spectrum of potential impacts, especially in non-text modalities, where harms may compound.

Key insights

Disability harm evaluation requires co-creation with affected communities and annotators with lived experience to detect subtle, context-dependent failures.

Method

DisaBench employs a participatory red teaming framework, co-creating a harm taxonomy with disability experts and practitioners, then using structured evaluation with both benign and adversarial prompts across seven life domains, annotated by individuals with lived disability experience.
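
As a rough illustration of how per-group harm rates like those quoted in the summary could be computed from annotated prompt-response pairs, here is a minimal Python sketch. The record schema (`AnnotatedPair` and its fields), the `harm_rates` helper, and the toy records are all hypothetical; this summary does not describe the released dataset's actual fields, so treat this as a sketch under those assumptions rather than DisaBench's own tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedPair:
    """One human-annotated prompt-response pair (hypothetical schema)."""
    model: str
    disability_type: str
    prompt_kind: str  # "benign" or "adversarial"
    harmful: bool     # the annotator's judgement for this response


def harm_rates(pairs, key):
    """Return {group: fraction of pairs annotated harmful}, grouped by `key`."""
    totals = defaultdict(int)
    harmful = defaultdict(int)
    for pair in pairs:
        group = getattr(pair, key)
        totals[group] += 1
        harmful[group] += pair.harmful  # True counts as 1, False as 0
    return {group: harmful[group] / totals[group] for group in totals}


# Toy records for illustration only; these are not DisaBench data.
toy_pairs = [
    AnnotatedPair("model-a", "Vision", "benign", False),
    AnnotatedPair("model-a", "Vision", "adversarial", True),
    AnnotatedPair("model-b", "Vision", "adversarial", True),
    AnnotatedPair("model-b", "Vision", "benign", False),
    AnnotatedPair("model-a", "ND/Learning", "benign", False),
    AnnotatedPair("model-a", "ND/Learning", "adversarial", True),
    AnnotatedPair("model-b", "ND/Learning", "benign", False),
    AnnotatedPair("model-b", "ND/Learning", "adversarial", False),
]

by_type = harm_rates(toy_pairs, "disability_type")
by_kind = harm_rates(toy_pairs, "prompt_kind")

for group, rate in by_type.items():
    print(f"{group}: {rate:.1%}")
```

Grouping by `prompt_kind` as well as `disability_type` keeps the benign/adversarial pairing visible, so harm that appears under benign prompts is not hidden inside an aggregate rate.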

Best for: Research Scientist, AI Scientist, MLOps Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.