Libras-UFPel Corpus: A Parallel Dataset of Brazilian Sign Language and Portuguese for Multimodal Research and Processing

2026-04-12 · Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, quick

Summary

The Libras-UFPel Corpus is a new multimodal, multilayer parallel dataset for Brazilian Sign Language (Libras) and written Portuguese, designed for computational analysis and documentation. It comprises 4,800 controlled audiovisual records (2,400 sentences and 2,400 isolated signs), each paired with a Portuguese translation. The corpus also includes about 10 hours of spontaneous interaction from three naturalistic interviews, which are still being edited. To date, 1,200 of the controlled sentences have been lemmatized, gloss-annotated, and translated, forming a structured parallel subset. The resource supports Libras-to-Portuguese Sign Language Processing tasks such as recognition and machine translation. Its annotation follows a hierarchical model that covers lexical, partially lexical, and non-lexical signs, with independent tiers for non-manual markers.

Key takeaway

For NLP Engineers and AI Scientists working on accessibility, the Libras-UFPel Corpus is a structured, multimodal resource for building Brazilian Sign Language models. Work on sign language recognition and machine translation can draw on it directly, supporting digital inclusion for the deaf community.

Key insights

The Libras-UFPel Corpus provides a parallel multimodal dataset for Brazilian Sign Language and Portuguese.

Principles

Integrate controlled and naturalistic data.
Ensure interoperability via shared standards.
Employ hierarchical annotation for sign language.

Method

The corpus is built from controlled audiovisual recordings and naturalistic interviews. Annotation is systematic and covers lemmatization, glossing, and translation, using a hierarchical model for sign types and separate tiers for non-manual markers.
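
As a rough illustration of how such an annotation model might be represented in code, the sketch below models one controlled sentence with a sign tier and an independent non-manual tier. All class names, field names, time units, and the example glosses are assumptions made for illustration. They are not the corpus's actual schema, file format, or data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class SignType(Enum):
    """The three sign categories named in the summary."""
    LEXICAL = "lexical"
    PARTIALLY_LEXICAL = "partially_lexical"
    NON_LEXICAL = "non_lexical"


@dataclass
class SignToken:
    """One entry on the (hypothetical) sign tier."""
    gloss: str            # gloss label; the format here is an assumption
    lemma: str            # lemmatized form
    sign_type: SignType
    start_ms: int         # assumed millisecond offsets into the video
    end_ms: int


@dataclass
class NonManualMarker:
    """One entry on an independent non-manual tier (e.g. facial expression)."""
    label: str            # hypothetical label such as "brow_raise"
    start_ms: int
    end_ms: int


@dataclass
class SentenceRecord:
    """A controlled sentence paired with its Portuguese translation."""
    video_id: str
    portuguese: str
    signs: List[SignToken] = field(default_factory=list)
    non_manual: List[NonManualMarker] = field(default_factory=list)

    def is_annotated(self) -> bool:
        # A record counts as annotated once its sign tier is filled in.
        return bool(self.signs)

    def gloss_sequence(self) -> str:
        # Order glosses by start time so the sequence follows the video.
        ordered = sorted(self.signs, key=lambda t: t.start_ms)
        return " ".join(t.gloss for t in ordered)


def parallel_pairs(records: List[SentenceRecord]) -> List[Tuple[str, str]]:
    """Return (gloss sequence, Portuguese) pairs for annotated records only."""
    return [(r.gloss_sequence(), r.portuguese) for r in records if r.is_annotated()]


# Invented example data, not taken from the corpus.
demo = [
    SentenceRecord(
        video_id="sent_0001",
        portuguese="Eu vou para casa.",
        signs=[
            SignToken("EU", "EU", SignType.LEXICAL, 0, 400),
            SignToken("IR", "IR", SignType.LEXICAL, 400, 900),
            SignToken("CASA", "CASA", SignType.LEXICAL, 900, 1500),
        ],
        # The non-manual tier is independent, so it may span several signs.
        non_manual=[NonManualMarker("head_nod", 300, 1200)],
    ),
    # A recorded and translated sentence whose sign tier is not yet annotated.
    SentenceRecord(video_id="sent_0002", portuguese="Ela gosta de livros."),
]

pairs = parallel_pairs(demo)
```

Keeping non-manual markers in their own list, with their own time spans, mirrors the idea of independent tiers: a marker can overlap several signs without being attached to any single one. Filtering on `is_annotated()` reflects the fact that only part of the controlled sentences currently form the parallel subset.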

In practice

Develop Sign Language Recognition systems.
Build Sign Language Machine Translation models.
Conduct descriptive linguistic analysis of Libras.

Topics

Libras-UFPel Corpus
Brazilian Sign Language
Multimodal Datasets
Sign Language Processing
Natural Language Processing

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.