Bootstrapping Sign Language Annotations with Sign Language Models
Summary
Apple researchers, in collaboration with Gallaudet University, have developed a pseudo-annotation pipeline to address the scarcity of high-quality annotated sign language data. The pipeline takes signed video and its English text as input and produces ranked annotations for glosses, fingerspelled words, and sign classifiers, each with a time interval. The method combines sparse predictions from a fingerspelling recognizer and an isolated sign recognizer (ISR) with a K-Shot LLM approach. As part of this effort, the team established baseline fingerspelling and ISR models, achieving a 6.7% Character Error Rate (CER) on the FSBoard dataset and 74% top-1 accuracy on the ASL Citizen dataset. To validate the pipeline, a professional interpreter manually annotated nearly 500 videos from the ASL STEM Wiki, creating a gold-standard benchmark. The team is releasing over 300 hours of pseudo-annotations together with these human annotations.
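The two baseline metrics can be made concrete. Below is a minimal Python sketch assuming the standard definitions: CER is the character-level Levenshtein (edit) distance divided by the number of reference characters, and top-1 accuracy is the share of examples whose highest-ranked prediction matches the gold sign label. The sketch computes CER at corpus level (total edits over total reference characters). The summary does not say whether the FSBoard figure is corpus-level or averaged per utterance, and the function names here are illustrative, not taken from the original work.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions, and deletions."""
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from ref
                curr[j - 1] + 1,         # insert h into ref
                prev[j - 1] + (r != h),  # substitute (free when characters match)
            )
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edit distance divided by total reference characters."""
    edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars


def top1_accuracy(labels: Sequence[str], ranked_preds: Sequence[Sequence[str]]) -> float:
    """Fraction of examples whose first-ranked prediction equals the gold label."""
    hits = sum(1 for gold, ranked in zip(labels, ranked_preds) if ranked and ranked[0] == gold)
    return hits / len(labels)
```

For example, a recognizer that reads the fingerspelled word "hello" as "helo" makes one deletion over five reference characters, giving a CER of 0.2.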
Key takeaway
For NLP engineers working on sign language interpretation, this research offers a practical way to overcome data scarcity. Consider integrating pseudo-annotation pipelines, especially for large, partially annotated datasets like ASL STEM Wiki. The proposed baseline fingerspelling and ISR models are also worth adopting as starting points to accelerate model development and annotation.
Key insights
A pseudo-annotation pipeline uses sparse recognizer predictions and LLMs to generate sign language annotations.
Principles
- Sparse predictions can bootstrap dense annotations.
- Combining specialized models with LLMs enhances annotation.
- High-quality human annotation is crucial for validation.
Method
The pipeline integrates sparse predictions from fingerspelling and isolated sign recognizers with a K-Shot LLM approach to estimate and rank sign language annotations, including time intervals, for video input.
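The method description leaves the data flow abstract. The following is a hypothetical Python sketch of how sparse recognizer hits and English text might be packed into a K-shot prompt, and how candidate annotations could then be ranked. The `Span` schema, the prompt format, and the pluggable `score` function (standing in for an LLM likelihood) are all assumptions for illustration; the source does not describe the actual prompt or ranking criterion.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Span:
    """A labeled time interval in the video, in seconds (hypothetical schema)."""
    kind: str     # e.g. "gloss", "fingerspelling", or "classifier"
    label: str    # e.g. "BOOK" or "n-a-s-a"
    start: float
    end: float


def format_spans(spans: Sequence[Span]) -> str:
    # Serialize spans in temporal order so the LLM sees them as a timeline.
    return "; ".join(
        f"{s.kind}:{s.label}@{s.start:.1f}-{s.end:.1f}"
        for s in sorted(spans, key=lambda s: s.start)
    )


def build_k_shot_prompt(
    examples: Sequence[tuple[str, Sequence[Span], Sequence[Span]]],
    english: str,
    sparse_hits: Sequence[Span],
) -> str:
    """Each example is (English text, sparse recognizer hits, gold annotation)."""
    parts = [
        f"English: {text}\nRecognized: {format_spans(hits)}\nAnnotation: {format_spans(gold)}"
        for text, hits, gold in examples
    ]
    # The query ends at "Annotation:" so the LLM completes the dense annotation.
    parts.append(f"English: {english}\nRecognized: {format_spans(sparse_hits)}\nAnnotation:")
    return "\n\n".join(parts)


def rank_candidates(
    candidates: Sequence[Sequence[Span]],
    score: Callable[[Sequence[Span]], float],
) -> list[Sequence[Span]]:
    """Order candidate annotations from most to least plausible under `score`."""
    # sorted() is stable, so ties keep their original candidate order.
    return sorted(candidates, key=score, reverse=True)
```

As a usage example, `rank_candidates(candidates, score=lambda c: sum(s in hits for s in c))` would prefer candidates that keep more of the recognizers' sparse hits. In a real pipeline the score would more plausibly come from the LLM itself.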
In practice
- Apply the pipeline to existing unannotated or partially annotated sign language datasets.
- Employ K-Shot LLMs for annotation ranking.
- Benchmark fingerspelling and ISR models.
Topics
- Sign Language Annotation
- Pseudo-annotation Pipeline
- Fingerspelling Recognition
- Isolated Sign Recognition
- Large Language Models
Best for: NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.