Development and Evaluation of a Hybrid Information Retrieval System Applied to the Brazilian Legal Domain

Source: Paper Index on ACL Anthology · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, short

Summary

A hybrid information retrieval system combining the BM25L ranking algorithm and the BumbaLM language model has been developed and evaluated for the Brazilian legal domain. It targets two limitations of existing approaches. Traditional information retrieval systems struggle with vocabulary mismatch between queries and documents and with the extensive length of legal texts. Transformer-based models capture semantic nuances, but their input size constraints cause information loss on long documents. The hybrid approach aims to overcome both problems, with the broader goal of improving process management, automating tasks, and reducing the inefficiencies prevalent in judicial systems. The work was presented by Ana Carolina C. Bessa, Fábio M. F. Lobato, and Antonio F. L. J. Junior at the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) in Salvador, Brazil, and appears on pages 186–190 of Volume 2.

Key takeaway

NLP engineers working with legal or other long-document domains should consider a hybrid information retrieval strategy. Combining a lexical algorithm such as BM25L with a domain-specific language model such as BumbaLM can mitigate both the input size constraints of Transformer models and the vocabulary limitations of traditional methods. The aim is more accurate and efficient legal document processing and judicial task automation.

Key insights

Hybrid IR systems combining traditional and Transformer models can overcome long-text limitations in specialized domains.

Principles

Method

The proposed method combines the BM25L algorithm with the BumbaLM language model to create a hybrid information retrieval system specifically for the Brazilian legal domain.
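To make the idea concrete, here is a minimal Python sketch of one way a lexical BM25L score and a semantic score could be fused. It is an illustration under stated assumptions, not the authors' implementation: this summary does not say how the two components are combined, so the min-max normalisation and the linear weight `alpha` are assumptions. `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real encoder such as BumbaLM, and the window sizes and the `k1`, `b`, and `delta` values are conventional defaults rather than values from the paper. Overlapping chunks are one common way to avoid truncating long documents.

```python
import math
import zlib
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25l_scores(query, docs, k1=1.5, b=0.75, delta=0.5):
    """BM25L score of every document for the query (assumes non-empty corpus)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            if tf[term] == 0:
                continue
            idf = math.log((n_docs + 1) / (df[term] + 0.5))
            # Length-normalised term frequency, shifted by delta so that
            # long documents are not over-penalised.
            c = tf[term] / (1 - b + b * len(tokens) / avgdl)
            score += idf * (k1 + 1) * (c + delta) / (k1 + c + delta)
        scores.append(score)
    return scores

def toy_embed(text, dim=64):
    """Hypothetical placeholder encoder (hashed bag of words), NOT a language model."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunks(tokens, size=128, stride=64):
    """Overlapping windows that together cover the whole document."""
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + stride, stride)]

def semantic_scores(query, docs, embed=toy_embed):
    """Best cosine similarity between the query and any chunk of each document."""
    q = embed(query)
    out = []
    for d in docs:
        sims = [sum(a * b for a, b in zip(q, embed(" ".join(c))))
                for c in chunks(tokenize(d))]
        out.append(max(sims))
    return out

def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]

def hybrid_rank(query, docs, alpha=0.5):
    """Document indices sorted by alpha * lexical + (1 - alpha) * semantic."""
    lex = min_max(bm25l_scores(query, docs))
    sem = min_max(semantic_scores(query, docs))
    fused = [alpha * l + (1 - alpha) * s for l, s in zip(lex, sem)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)

docs = [
    "habeas corpus pedido liberdade preso",
    "contrato de locação imóvel aluguel",
    "recurso especial tributário imposto",
]
print(hybrid_rank("habeas corpus", docs))
```

In a real system the `embed` argument would wrap the actual language model, and `alpha` would be tuned on held-out relevance judgments. Other fusion schemes, such as reranking the top BM25L candidates, are equally plausible readings of "hybrid".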

Topics

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.