Bridging Citizens and Public Services: Improving Service Association with Retrieval-Augmented Generation (RAG) Labels

· Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing · Depth: Advanced, quick

Summary

The paper addresses the challenge of linking citizen complaints to specific public services within the Brazilian federal administration, where only 1.8% of over 1.2 million manifestations (citizen submissions to the ombudsman) filed in 2025 were associated with a service. The task is framed as an extreme multi-class text classification problem with severe class imbalance and significant lexical-semantic gaps. The proposed approach combines sparse retrieval using BM25 over representative complaint corpora with dense retrieval enhanced by RAG-labels, which are semantically expanded service descriptions generated via Retrieval-Augmented Generation and Small Language Models. The expansion reduces vocabulary mismatch and semantic ambiguity, and the method outperforms direct text or embedding matching. Applied to real operational data from the Federal Ombudsman Office, it automatically assigns plausible services to approximately 73% of previously unlabeled cases, substantially widening the coverage available for public service evaluation.

Key takeaway

For NLP engineers working on extreme multi-class text classification with significant lexical-semantic gaps, RAG-labels are worth considering. The approach uses Retrieval-Augmented Generation and Small Language Models to semantically expand label descriptions. In this study it outperformed direct text or embedding matching and assigned plausible services to approximately 73% of previously unlabeled citizen complaints, a gain in coverage over the 1.8% of manifestations that carried a service association.

Key insights

RAG-labels, service descriptions semantically expanded by Small Language Models through Retrieval-Augmented Generation, improve service-complaint association by bridging lexical-semantic gaps.
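
As a rough illustration of how a RAG-label might be assembled, the sketch below retrieves a few passages related to a service and packs them into a prompt for a language model. The summary does not say what the paper retrieves or how it prompts the model, so the passage source, the token-overlap retriever, the prompt wording, and every function name here are assumptions. The `generate` argument is a placeholder for a Small Language Model call.

```python
def retrieve(query, passages, k=2):
    """Toy retriever: rank passages by how many distinct tokens they share with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(q & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(name, description, context):
    """Assemble a prompt asking the model to expand the service description."""
    lines = [
        f"Service: {name}",
        f"Official description: {description}",
        "Related passages:",
    ]
    lines += [f"- {p}" for p in context]
    lines.append("Rewrite the description in plain language, using words citizens use.")
    return "\n".join(lines)


def make_rag_label(name, description, passages, generate, k=2):
    """Retrieve context for a service, then let `generate` produce the expanded label."""
    context = retrieve(f"{name} {description}", passages, k)
    return generate(build_prompt(name, description, context))


def echo_generate(prompt):
    """Placeholder for an SLM call (hypothetical): returns the prompt unchanged."""
    return prompt


# Invented example data, for illustration only.
PASSAGES = [
    "my passport is late",
    "pension not paid",
    "travel document renewal stuck",
]

label = make_rag_label(
    "Passport issuance", "Issue travel document", PASSAGES, echo_generate
)
print(label)
```

Swapping `echo_generate` for a real model call would return generated text instead of the prompt. The expanded text, not the short official name, is then what gets embedded for dense retrieval.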

Method

The method combines two retrieval signals: sparse retrieval with BM25 over representative complaint corpora, and dense retrieval against RAG-labels, which are semantically expanded service descriptions generated by Retrieval-Augmented Generation and Small Language Models.
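
The two signals can be sketched in a few lines of standard-library Python. This is a minimal illustration under several assumptions, not the paper's implementation: the data is invented, the "dense" side uses a bag-of-words cosine as a stand-in for a real sentence encoder, and the scores are combined with reciprocal rank fusion because the summary does not say how the paper fuses them.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small in-memory corpus, using the non-negative IDF variant."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        n = len(self.docs)
        df = Counter(t for c in self.docs for t in c)  # document frequency per term
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def scores(self, query):
        out = []
        for c, dl in zip(self.docs, self.lengths):
            s = 0.0
            for t in tokenize(query):
                tf = c.get(t, 0)
                if tf:
                    norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                    s += self.idf[t] * tf * (self.k1 + 1) / norm
            out.append(s)
        return out


def toy_embed(text):
    """Stand-in for a dense encoder (assumption): a sparse bag-of-words vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def ranks(scores):
    """Return the 1-based rank of each score; the highest score gets rank 1."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for pos, i in enumerate(order, start=1):
        result[i] = pos
    return result


def rank_services(query, services, corpora, labels, embed=toy_embed, k=60):
    """Fuse BM25 over complaint corpora with label similarity via reciprocal rank fusion."""
    sparse = BM25([corpora[s] for s in services]).scores(query)
    q = embed(query)
    dense = [cosine(q, embed(labels[s])) for s in services]
    fused = [1 / (k + rs) + 1 / (k + rd) for rs, rd in zip(ranks(sparse), ranks(dense))]
    order = sorted(range(len(services)), key=lambda i: -fused[i])
    return [services[i] for i in order]


# Invented toy data, for illustration only.
SERVICES = ["passport", "pension"]
COMPLAINT_CORPORA = {
    "passport": "passport appointment delayed federal police travel document",
    "pension": "retirement benefit payment late social security",
}
RAG_LABELS = {
    "passport": "issue or renew passport travel document abroad trip",
    "pension": "retirement pension benefit monthly payment elderly",
}

print(rank_services("my retirement payment is late", SERVICES, COMPLAINT_CORPORA, RAG_LABELS))
```

In a real pipeline `toy_embed` would be replaced by a sentence-embedding model, and the fusion constant `k` (60 is a common default for reciprocal rank fusion) would be tuned on held-out labeled cases.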

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.