Retrieval-Augmented Generation and Knowledge Graphs in Portuguese-Language Legal Documents

· Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The paper presents a Graph Retrieval-Augmented Generation (GraphRAG) pipeline for Question Answering (QA) over Portuguese-language legal documents. It is applied to a corpus of 203 normative resolutions from Companhia Energética de Minas Gerais (CEMIG). The pipeline targets the structural complexity typical of legal texts, such as hierarchical dependencies and temporal modifications. Each document is modeled as a knowledge graph whose nodes are structural units (Articles, Paragraphs, Items) and whose edges encode normative relationships, which preserves context and traceability. At query time, retrieval reconstructs evidence paths from the root of the document hierarchy down to the relevant leaf units, and the system then applies semantic re-ranking before generating an answer. Evaluated with the RAGAS framework, the system achieved a mean answer accuracy of 0.81 and a median of 1.00, so at least half of the questions received a perfect score. Performance is strongest on short, focused queries; intermediate-length questions are harder because of semantic dispersion.
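The graph model described above can be illustrated with a minimal Python sketch. The summary does not specify the paper's schema or storage, so the `Node` and `LegalGraph` classes, the `kind` labels, and the `add_contains`/`descendants` methods are hypothetical. Only containment edges (Resolution → Article → Paragraph → Item) are shown, not the other normative relationships the paper models.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One structural unit of a resolution (hypothetical schema)."""
    node_id: str
    kind: str  # e.g. "Resolution", "Article", "Paragraph", "Item"
    text: str = ""


@dataclass
class LegalGraph:
    """Directed graph whose parent -> child edges encode structural containment."""
    nodes: dict[str, Node] = field(default_factory=dict)
    children: dict[str, list[str]] = field(default_factory=dict)
    parent: dict[str, str] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        self.children.setdefault(node.node_id, [])

    def add_contains(self, parent_id: str, child_id: str) -> None:
        # In this sketch every unit has exactly one structural parent.
        self.children.setdefault(parent_id, []).append(child_id)
        self.parent[child_id] = parent_id

    def descendants(self, node_id: str) -> list[str]:
        """All units nested under node_id, in document (depth-first) order."""
        out: list[str] = []
        stack = list(reversed(self.children.get(node_id, [])))
        while stack:
            current = stack.pop()
            out.append(current)
            stack.extend(reversed(self.children.get(current, [])))
        return out


# Usage: a tiny resolution with two articles (identifiers are illustrative).
g = LegalGraph()
for nid, kind in [("R1", "Resolution"), ("A1", "Article"), ("P1", "Paragraph"),
                  ("I1", "Item"), ("A2", "Article")]:
    g.add_node(Node(nid, kind))
g.add_contains("R1", "A1")
g.add_contains("A1", "P1")
g.add_contains("P1", "I1")
g.add_contains("R1", "A2")
print(g.descendants("R1"))  # ['A1', 'P1', 'I1', 'A2']
```

Keeping an explicit `parent` map makes it cheap to walk from any retrieved unit back up to its resolution, which is the operation that root-to-leaf evidence reconstruction relies on.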

Key takeaway

For research scientists developing legal AI systems, this GraphRAG pipeline offers a practical method for QA over structurally complex Portuguese legal documents. Consider representing documents as knowledge graphs that explicitly model structural dependencies and temporal modifications; in this study, that representation improved interpretability and answer precision. Short queries perform well, but intermediate-length questions may require refined semantic re-ranking to mitigate semantic dispersion.
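The summary does not say how temporal modifications are encoded. One plausible approach, sketched here under that assumption, stores each wording of a unit with its effective date and the resolution that introduced it, then answers "which text was in force on date X?" with a binary search. `Version`, `VersionedUnit`, `amend`, and `in_force` are hypothetical names, not the paper's API, and the dates and identifiers in the usage lines are illustrative only.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    effective: date  # date from which this wording applies
    text: str
    source: str      # hypothetical id of the resolution that introduced it


class VersionedUnit:
    """Successive wordings of one structural unit, e.g. an Article."""

    def __init__(self, unit_id: str) -> None:
        self.unit_id = unit_id
        self._versions: list[Version] = []

    def amend(self, version: Version) -> None:
        # Keep versions ordered by effective date so lookups can bisect.
        self._versions.append(version)
        self._versions.sort(key=lambda v: v.effective)

    def in_force(self, as_of: date) -> Optional[Version]:
        """Latest version whose effective date is on or before as_of."""
        dates = [v.effective for v in self._versions]
        idx = bisect_right(dates, as_of)
        return self._versions[idx - 1] if idx else None


art = VersionedUnit("Res10/Art3")  # illustrative identifiers
art.amend(Version(date(2019, 1, 1), "original wording", "Res10"))
art.amend(Version(date(2022, 6, 1), "amended wording", "Res57"))
print(art.in_force(date(2020, 1, 1)).text)  # original wording
print(art.in_force(date(2023, 1, 1)).text)  # amended wording
```

Keeping superseded wordings, rather than overwriting them, preserves traceability: an answer can cite both the text in force and the resolution that changed it.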

Key insights

GraphRAG enhances legal QA by modeling document structure as knowledge graphs for context-aware retrieval.

Principles

Method

Documents are modeled as knowledge graphs with structural units (Articles, Paragraphs, Items) as nodes and normative relationships as edges. Retrieval reconstructs root-to-leaf evidence paths through this hierarchy, and the resulting paths are semantically re-ranked before answer generation.
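The retrieval step can be sketched as follows: walk parent links from each candidate leaf up to the root to form an evidence path, concatenate the text along that path, and rank the paths against the query. The paper uses semantic re-ranking, but the summary does not name the model, so this sketch substitutes a plain bag-of-words cosine similarity as a stand-in. All function names (`path_to_root`, `bow`, `cosine`, `rerank`) and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def path_to_root(parent: dict[str, str], leaf: str) -> list[str]:
    """Follow parent links from a retrieved leaf to the root, then reverse."""
    path = [leaf]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]  # root -> ... -> leaf


def bow(text: str) -> Counter:
    # Lowercased word counts; \w also matches accented Portuguese letters.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(query: str, leaves: list[str], parent: dict[str, str],
           text: dict[str, str], k: int = 3) -> list[tuple[float, list[str]]]:
    """Score each root-to-leaf evidence path by similarity to the query (stand-in scorer)."""
    q = bow(query)
    scored = []
    for leaf in leaves:
        path = path_to_root(parent, leaf)
        evidence = " ".join(text[n] for n in path)  # context concatenated along the path
        scored.append((cosine(q, bow(evidence)), path))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


parent = {"A1": "R1", "P1": "A1", "A2": "R1", "P2": "A2"}
text = {"R1": "Resolução 1", "A1": "Art. 1 tarifa de energia",
        "P1": "§1 reajuste da tarifa anual", "A2": "Art. 2 segurança",
        "P2": "§1 equipamentos de proteção"}
print(rerank("reajuste da tarifa", ["P1", "P2"], parent, text, k=1))
```

Because each candidate is scored together with its ancestors, a paragraph whose own text is terse can still rank highly when its parent Article carries the relevant terms. In a production pipeline, the `cosine(bow(...))` scorer could be swapped for an embedding model or cross-encoder without changing the path-then-rerank structure.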

Best for: Research Scientist, AI Scientist, NLP Engineer, Legal Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.