Sign-Language Datasets at Scale: A Comprehensive Survey on Resources, Benchmarks, and Annotation Standards

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

A comprehensive survey indexes 120 sign-language datasets across 35 sign languages, addressing critical challenges in sign-language recognition, translation, and production. The analysis reveals significant fragmentation, inconsistent annotations, and limited linguistic coverage across existing resources, all of which constrain advances in automated sign-language technologies. The key issues identified are modality imbalance, annotation granularity, and signer bias. To foster standardization and reproducibility, the survey introduces a 24-field "Sign-Language Datasheet" and provides a public GitHub repository with consolidated benchmark results. This work aims to establish a unified foundation for developing inclusive, robust, and scalable sign-language AI applications.

Key takeaway

AI scientists and ML engineers developing sign-language technologies should prioritize dataset curation that addresses the fragmentation and bias in current resources. Focus on broad linguistic and demographic coverage, standardized annotation practices, and long-term data accessibility. Doing so should improve model generalizability and reduce performance disparities across sign languages and user communities, which in turn supports more inclusive and robust AI systems.

Key insights

Fragmented sign-language datasets and inconsistent annotations hinder robust, scalable AI development.

Method

The survey proposes a 24-field "Sign-Language Datasheet" for structured documentation, covering properties like modality, demographics, and vocabulary scale to standardize reporting.
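To make the idea of structured documentation concrete, the sketch below models a small, hypothetical subset of such a datasheet in Python. The summary names only modality, demographics, and vocabulary scale as covered properties, so every field name here (and the `missing_fields` helper) is an illustrative assumption rather than the survey's actual 24-field schema, and the dataset entry is a placeholder, not a real resource.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class SignLanguageDatasheet:
    """Hypothetical subset of a dataset datasheet.

    Field names are assumptions for illustration; they are NOT the
    survey's actual 24 fields.
    """
    name: str
    sign_language: str
    modalities: List[str] = field(default_factory=list)  # e.g. ["rgb", "pose"]
    num_signers: Optional[int] = None
    signer_demographics: Optional[str] = None
    vocabulary_size: Optional[int] = None


def missing_fields(sheet: SignLanguageDatasheet) -> List[str]:
    """Return the names of fields left undocumented (None, empty string, or empty list)."""
    missing = []
    for f in fields(sheet):
        value = getattr(sheet, f.name)
        if value is None or value == "" or value == []:
            missing.append(f.name)
    return missing


# Placeholder entry: the dataset name and numbers are made up.
sheet = SignLanguageDatasheet(
    name="ExampleCorpus",
    sign_language="ASL",
    modalities=["rgb"],
    vocabulary_size=2000,
)
print(missing_fields(sheet))  # ['num_signers', 'signer_demographics']
```

A completeness check of this kind is one way a shared schema could surface undocumented properties, such as signer counts, before a dataset is used for benchmarking.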

Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.