Libras-UFPel Corpus: A Parallel Dataset of Brazilian Sign Language and Portuguese for Multimodal Research and Processing

Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, quick

Summary

The Libras-UFPel Corpus is a new multimodal, multilayer parallel dataset of Brazilian Sign Language (Libras) and written Portuguese, built for computational analysis and language documentation. It comprises 4,800 controlled audiovisual records (2,400 sentences and 2,400 isolated signs), each paired with a Portuguese translation. The corpus also includes about 10 hours of spontaneous interaction from three naturalistic interviews, which are still being edited. To date, 1,200 controlled sentences have been lemmatized, gloss-annotated, and translated, forming a structured parallel subset. The resource supports Libras-to-Portuguese sign language processing tasks such as recognition and machine translation. Its annotation follows a hierarchical model that covers lexical, partially lexical, and non-lexical signs, with independent tiers for non-manual markers.

Key takeaway

For NLP engineers and AI scientists working on accessibility, the Libras-UFPel Corpus is a structured, multimodal resource for building Brazilian Sign Language models. Work on sign language recognition and Libras-to-Portuguese machine translation can draw on it directly, which in turn supports digital inclusion for the deaf community.

Key insights

The Libras-UFPel Corpus provides a parallel multimodal dataset for Brazilian Sign Language and Portuguese.

Method

The corpus was developed through controlled audiovisual recordings and naturalistic interviews, followed by systematic annotation: lemmatization, glossing, and translation. The annotation uses a hierarchical model for sign types and records non-manual markers on their own tiers.
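
To make the annotation layout more concrete, here is a minimal Python sketch of how one annotated sentence could be represented in memory. It is an illustration only, not the corpus's actual file format or schema: the class names, field names, tier name ("eyebrows"), millisecond timing, and the sample glosses are all assumptions invented for this example. Only the three sign categories, the separate non-manual tiers, and the lemma/gloss/translation layers come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class SignType(Enum):
    # The three categories named in the annotation model.
    LEXICAL = "lexical"
    PARTIALLY_LEXICAL = "partially_lexical"
    NON_LEXICAL = "non_lexical"


@dataclass
class Span:
    """A time-aligned annotation on one tier (hypothetical: times in ms)."""
    start_ms: int
    end_ms: int
    value: str


@dataclass
class SignAnnotation(Span):
    """A gloss on the manual tier; `value` holds the gloss string."""
    sign_type: SignType = SignType.LEXICAL
    lemma: str = ""


@dataclass
class SentenceRecord:
    """One controlled sentence with its parallel Portuguese translation."""
    record_id: str
    translation_pt: str
    signs: list[SignAnnotation] = field(default_factory=list)
    # Non-manual markers live on independent tiers, keyed by tier name.
    non_manual: dict[str, list[Span]] = field(default_factory=dict)

    def gloss_line(self) -> str:
        """Glosses joined in temporal order."""
        ordered = sorted(self.signs, key=lambda s: s.start_ms)
        return " ".join(s.value for s in ordered)

    def markers_over(self, sign: SignAnnotation) -> dict[str, list[str]]:
        """Non-manual marker values whose spans overlap the given sign."""
        overlaps: dict[str, list[str]] = {}
        for tier, spans in self.non_manual.items():
            hits = [m.value for m in spans
                    if m.start_ms < sign.end_ms and sign.start_ms < m.end_ms]
            if hits:
                overlaps[tier] = hits
        return overlaps


# Invented sample data, not taken from the corpus.
record = SentenceRecord(
    record_id="demo-0001",
    translation_pt="Você gosta de café?",
    signs=[
        SignAnnotation(0, 400, "VOCÊ", SignType.PARTIALLY_LEXICAL, "VOCÊ"),
        SignAnnotation(400, 900, "GOSTAR", SignType.LEXICAL, "GOSTAR"),
        SignAnnotation(900, 1500, "CAFÉ", SignType.LEXICAL, "CAFÉ"),
    ],
    non_manual={"eyebrows": [Span(300, 1500, "raised")]},
)
```

Keeping non-manual markers in their own tiers, rather than as attributes of each sign, lets a marker span several signs at once, which is why `markers_over` matches by time overlap instead of by index. A translation model would typically pair `record.gloss_line()` or the video itself with `record.translation_pt`.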

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.