ConsumerBR: A Large-Scale Corpus of Consumer Complaints in Brazilian Portuguese

· Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Cybersecurity & Data Privacy · Depth: Expert, medium

Summary

ConsumerBR is a corpus of over 3.1 million consumer complaints and company responses in Brazilian Portuguese, collected from the Consumidor.gov.br platform between 2021 and 2025. The publicly available dataset pairs anonymized text with structured metadata: temporal information, complaint outcomes, and consumer satisfaction indicators. To build it, the authors developed a data collection strategy tailored to the platform's dynamic interface, a preprocessing pipeline that clusters responses to identify template-based replies, and a hybrid anonymization method to reduce privacy risk. The corpus is characterized in terms of its scale, coverage, and distributional properties, and it is intended to support research on complaint analysis, sentiment modeling, dialogue generation, and preference-based evaluation.

Key takeaway

For research scientists working on natural language processing in Portuguese, ConsumerBR is a large, publicly available resource for training and evaluating models. Consider it for projects on complaint analysis, sentiment modeling, or dialogue generation: at over 3.1 million complaint-response pairs, it combines text with outcome and satisfaction metadata that can serve as supervision or evaluation signals.

Key insights

ConsumerBR offers a large, anonymized corpus of Brazilian Portuguese consumer complaints for NLP research.

Method

The method involves a tailored data collection strategy, a preprocessing pipeline with response clustering for template identification, and a hybrid anonymization approach to mitigate privacy risks.
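
The two preprocessing ideas named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the regex patterns, the placeholder tokens, the word 3-gram Jaccard similarity, the greedy clustering, and the `threshold` and `min_size` values are all assumptions chosen for illustration. The masking function covers only a rule-based step; if "hybrid" means rules combined with a learned entity recognizer (also an assumption), that second half is omitted here, so personal names are left untouched.

```python
import re

# Hypothetical rule-based patterns; the paper's actual rules are not given here.
# Order matters: an unformatted 11-digit number is ambiguous between a CPF and
# a mobile phone number, and it receives whichever label matches first.
PII_PATTERNS = [
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("[CPF]", re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b")),
    ("[PHONE]", re.compile(r"(?<!\d)\(?\d{2}\)?\s?9?\d{4}-?\d{4}(?!\d)")),
]


def mask_pii(text):
    """Replace each pattern match with its placeholder token."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def normalize(text):
    """Lowercase, collapse digit runs to '0', drop punctuation, squeeze spaces."""
    text = re.sub(r"\d+", "0", text.lower())
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def shingles(text, n=3):
    """Return the set of word n-grams of the normalized text."""
    words = normalize(text).split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_responses(responses, threshold=0.6):
    """Greedy clustering: join the first cluster whose representative
    (its first member) is similar enough, else start a new cluster."""
    clusters = []  # list of (representative_shingles, member_indices)
    for idx, text in enumerate(responses):
        current = shingles(text)
        for representative, members in clusters:
            if jaccard(current, representative) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((current, [idx]))
    return [members for _, members in clusters]


def template_indices(responses, threshold=0.6, min_size=2):
    """Indices of responses that fall in a cluster of at least min_size."""
    flagged = set()
    for members in cluster_responses(responses, threshold):
        if len(members) >= min_size:
            flagged.update(members)
    return flagged


if __name__ == "__main__":
    replies = [
        "Prezado João, sua reclamação 12345 foi registrada e será analisada em até 10 dias.",
        "Prezada Maria, sua reclamação 98765 foi registrada e será analisada em até 10 dias.",
        "Realizamos o estorno do valor cobrado indevidamente na fatura de março.",
    ]
    print(template_indices(replies))
    print(mask_pii("Meu CPF é 123.456.789-09 e meu email é ana@example.com"))
```

Greedy comparison against one representative per cluster keeps the sketch short, but it is order-dependent and scales poorly. At the corpus's reported size of over 3.1 million records, an approximate method such as MinHash with locality-sensitive hashing would likely be needed instead.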

Best for: Research Scientist, AI Scientist, NLP Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.