Gender Identification in Brazilian Portuguese Product Reviews: A Comparative Study of Classical Models, BERT, and LLMs

2026-04-12 · Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing · Depth: Advanced, quick

Summary

A study examined gender identification in Brazilian Portuguese using Amazon reviews from ten product categories. The researchers evaluated nine models: three classical classifiers (Logistic Regression, Random Forest, SVM), a multilingual BERT, and five LLMs (ChatGPT 4o, ChatGPT 3.5, DeepSeek, Sabia3, Sabiazinho). Multilingual BERT scored highest with a macro-F1 of 0.634, ahead of ChatGPT 4o and Logistic Regression by less than one percentage point. Reviews written by women were identified more reliably, with an average F1 of 0.654, four points higher than for reviews written by men. Performance also varied substantially across product domains: books and automotive were easier to classify, while baby and pets were harder.

Key takeaway

Research scientists building gender identification models for Brazilian Portuguese should consider multilingual BERT first: it achieved the highest macro-F1 (0.634), although its lead over ChatGPT 4o and Logistic Regression was under one percentage point. Expect performance to differ by product category and by author gender; harder categories such as baby and pets may need domain-specific tuning or data augmentation.

Key insights

On Brazilian Portuguese Amazon reviews, multilingual BERT narrowly outperformed the LLMs and the classical classifiers, and performance varied by author gender and product domain.

Principles

Model performance varies by domain.
Identification performance varies by author gender.

Method

Nine models (three classical classifiers, multilingual BERT, and five LLMs) were evaluated on gender identification in Brazilian Portuguese Amazon reviews, with macro-F1 as the metric.
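For readers less familiar with the metric, here is a minimal sketch of how macro-F1 is commonly computed: F1 is calculated separately for each class and the per-class scores are averaged with equal weight. The function names, the "F"/"M" labels, and the toy predictions are illustrative assumptions, not the study's code or data.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for a single class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy example (invented labels, NOT results from the study).
y_true = ["F", "F", "F", "M", "M", "M"]
y_pred = ["F", "F", "M", "M", "M", "F"]

print(f"F1 (F):   {per_class_f1(y_true, y_pred, 'F'):.3f}")
print(f"F1 (M):   {per_class_f1(y_true, y_pred, 'M'):.3f}")
print(f"macro-F1: {macro_f1(y_true, y_pred):.3f}")
```

Because each class counts equally, a gap between the per-class scores (such as the one reported between reviews by women and by men) pulls the macro average toward the midpoint of the two.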

In practice

Consider multilingual BERT for gender identification in Brazilian Portuguese.
Account for domain-specific performance.
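One way to account for domain-specific performance is to report macro-F1 per product category instead of a single pooled score. The sketch below assumes predictions are available as (category, gold label, predicted label) tuples; that record layout, the helper names, and the toy records are hypothetical and do not come from the study.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # 2*tp / (2*tp + fp + fn) equals the harmonic mean of precision and recall.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def macro_f1_by_category(records):
    """Group (category, gold, predicted) records and score each category."""
    gold_by_cat = defaultdict(list)
    pred_by_cat = defaultdict(list)
    for category, gold, predicted in records:
        gold_by_cat[category].append(gold)
        pred_by_cat[category].append(predicted)
    return {c: macro_f1(gold_by_cat[c], pred_by_cat[c]) for c in gold_by_cat}


# Toy records (invented for illustration, NOT data from the study).
records = [
    ("books", "F", "F"), ("books", "M", "M"),
    ("books", "F", "F"), ("books", "M", "M"),
    ("baby", "F", "F"), ("baby", "M", "F"),
    ("baby", "F", "M"), ("baby", "M", "M"),
]

scores = macro_f1_by_category(records)
for category, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{category:<8} macro-F1 = {score:.3f}")
```

Sorting by score puts the weakest categories first, which is where domain-specific tuning or data augmentation would be most likely to help.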

Topics

Gender Identification
Brazilian Portuguese
Product Reviews
Classical Classifiers
BERT

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.