Bridging Citizens and Public Services: Improving Service Association with Retrieval-Augmented Generation (RAG) Labels
Summary
A new method addresses the challenge of linking citizen complaints to specific public services within the Brazilian federal administration, where only 1.8% of over 1.2 million manifestations submitted in 2025 were associated with a service. This task is framed as an extreme multi-class text classification problem with severe class imbalance and significant lexical-semantic gaps. The proposed approach combines sparse retrieval using BM25 over representative complaint corpora with dense retrieval enhanced by RAG-labels. These RAG-labels are semantically expanded service descriptions generated via Retrieval-Augmented Generation and Small Language Models. This technique effectively reduces vocabulary mismatch and semantic ambiguity, outperforming direct text or embedding matching. Applied to real operational data from the Federal Ombudsman Office, the method automatically assigns plausible services to approximately 73% of previously unlabeled cases, significantly improving coverage for public service evaluation.
Key takeaway
NLP engineers working on extreme multi-class text classification with significant lexical-semantic gaps should consider RAG-labels. The approach uses Retrieval-Augmented Generation and Small Language Models to semantically expand label descriptions, which reduces vocabulary mismatch between user text and label text. In the reported deployment it assigned plausible services to approximately 73% of previously unlabeled citizen complaints, a coverage gain rather than a measured accuracy figure.
Key insights
RAG-labels, service descriptions semantically expanded by Small Language Models through Retrieval-Augmented Generation, improve service-complaint association by bridging lexical-semantic gaps.
Principles
- Extreme multi-class classification benefits from semantic expansion.
- Combining sparse and dense retrieval enhances matching accuracy.
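One common way to combine a sparse and a dense ranking is reciprocal rank fusion (RRF). The summary does not say how the paper merges its two retrievers, so RRF is a swapped-in technique here, and the service IDs and the constant k=60 are illustrative assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of service IDs into one ranking.

    Each list contributes 1 / (k + rank) to a service's fused score,
    so services ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, service_id in enumerate(ranking, start=1):
            scores[service_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by service ID for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))

# Hypothetical outputs of a sparse (BM25) and a dense (embedding) retriever.
sparse_ranking = ["passport", "visa", "pension"]
dense_ranking = ["pension", "passport", "student_loan"]

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(fused)
```

Because RRF uses only ranks, it avoids having to calibrate BM25 scores against cosine similarities, which live on different scales.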
Method
The method combines two retrievers: BM25 sparse retrieval over representative complaint corpora, and dense retrieval against RAG-labels, which are semantically expanded service descriptions generated by Retrieval-Augmented Generation and Small Language Models.
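The sparse half can be sketched as follows: a minimal BM25 scorer over complaints that already carry a service label, with each service scored by its best-matching complaint. The example complaints, the whitespace tokenizer, the parameters k1=1.5 and b=0.75, and the max-per-service aggregation are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Simplistic tokenizer; real Portuguese text would need proper normalization.
    return text.lower().split()

class BM25:
    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        # Lucene-style IDF variant, which is never negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

# Hypothetical labeled complaints: (text, service).
labeled = [
    ("my passport renewal has been delayed for months", "passport"),
    ("passport appointment was cancelled without notice", "passport"),
    ("retirement pension payment did not arrive", "pension"),
    ("pension benefit was reduced without explanation", "pension"),
]
index = BM25([text for text, _ in labeled])

def rank_services(query):
    # Score each service by its single best-matching labeled complaint.
    best = defaultdict(float)
    for i, (_, service) in enumerate(labeled):
        best[service] = max(best[service], index.score(query, i))
    return sorted(best, key=lambda s: (-best[s], s))

ranking = rank_services("passport renewal delayed")
print(ranking)
```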
In practice
- Apply RAG-labels to bridge vocabulary mismatches.
- Use hybrid retrieval for imbalanced text classification.
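To make the first point concrete, here is a rough sketch of how a RAG-label might be assembled. The summary does not specify what is retrieved or how the prompt is worded, so everything here is an assumption: the token-overlap retriever, the prompt text, the function names, and the `generate` callable, which stands in for whatever Small Language Model is used.

```python
def overlap(a, b):
    """Toy relevance score: number of shared lowercase tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def build_rag_label(service_name, description, corpus, generate, top_k=2):
    """Expand a service description using retrieved context and a language model.

    `generate` is any callable mapping a prompt string to generated text.
    """
    # Retrieve the top_k passages that best match the official description.
    context = sorted(corpus, key=lambda doc: -overlap(description, doc))[:top_k]
    bullet_list = "\n".join(f"- {doc}" for doc in context)
    prompt = (
        f"Service: {service_name}\n"
        f"Official description: {description}\n"
        f"Related citizen texts:\n{bullet_list}\n"
        "Rewrite the description in the vocabulary citizens actually use."
    )
    return generate(prompt)

def echo_model(prompt):
    # Stand-in for a real model call: returns the prompt so it can be inspected.
    return prompt

# Hypothetical citizen texts.
corpus = [
    "I cannot renew my passport online",
    "my pension payment is late",
    "the passport office lost my documents",
]
rag_label = build_rag_label("Passport", "Issue or renew a passport", corpus, echo_model)
print(rag_label)
```

In a real pipeline the returned text would be embedded and used as the target of dense retrieval in place of the terse official description.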
Topics
- Retrieval-Augmented Generation
- Public Service Association
- Citizen Complaint Management
- Extreme Multi-class Classification
- Portuguese NLP
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.