Gendered Stylistic Variation in Brazilian Portuguese Google Play Reviews: A Large-Scale Study
Summary
A large-scale study analyzed gender-associated stylistic variations in 76.7 million Brazilian Portuguese Google Play reviews from 96 apps between 2011 and 2025. Researchers inferred binary gender from first names using IBGE name frequencies, yielding 22.25 million high-confidence labels. The findings indicate that reviews associated with women exhibit approximately 60% higher paralinguistic expressivity, characterized by increased emoji density, lengthening, and punctuation. Conversely, lexical diversity, measured by MTLD, remained nearly identical across gender groups. While overall ratings were predominantly positive, men contributed a relatively higher proportion of 1-star reviews, whereas women submitted more 5-star reviews. This research offers insights into digital sociolinguistic behavior within the Brazilian context.
Key takeaway
NLP engineers building sentiment analysis or user profiling models for Brazilian Portuguese should account for gender-associated stylistic variation. In particular, paralinguistic features such as emoji density and punctuation are worth including as model inputs: they differ significantly across gender groups and could improve sentiment or demographic predictions, especially on user reviews.
Key insights
Gender-associated stylistic differences exist in Brazilian Portuguese Google Play reviews, particularly in paralinguistic expressivity.
Principles
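- Name-based inference can assign binary gender labels at scale, though only part of the data receives a high-confidence label (22.25M of 76.7M reviews here).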
- Paralinguistic features vary more by gender than lexical diversity.
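For readers unfamiliar with MTLD, the lexical diversity measure named in the summary, the following is a minimal sketch of the commonly described procedure: walk through the tokens, count a "factor" each time the running type-token ratio drops to a threshold, add a partial factor for the leftover tokens, and average a forward and a backward pass. The 0.72 threshold is the conventional default and is an assumption here, as is the whitespace tokenization; the summary does not state the study's settings.

```python
from typing import List, Optional


def _mtld_one_direction(tokens: List[str], threshold: float = 0.72) -> Optional[float]:
    """One pass of MTLD; returns None when no factor can be computed."""
    factors = 0.0
    types = set()
    count = 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            # The running type-token ratio hit the threshold: close a factor.
            factors += 1
            types = set()
            count = 0
    if count > 0:
        # Partial factor for the remaining tokens.
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    if factors == 0:
        return None  # e.g. every token is unique, so the ratio never dropped
    return len(tokens) / factors


def mtld(tokens: List[str], threshold: float = 0.72) -> Optional[float]:
    """Average of the forward and backward passes (assumed convention)."""
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(tokens[::-1], threshold)
    if forward is None or backward is None:
        return None
    return (forward + backward) / 2


if __name__ == "__main__":
    review = "muito bom muito bom muito bom app".split()
    print(mtld(review))
```

Returning `None` when no factor is completed is a design choice of this sketch; other implementations handle that case differently.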
Method
Binary gender was inferred from first names using IBGE name frequencies across 76.7M Google Play reviews, yielding 22.25M high-confidence labels. Stylistic measures were then compared across the two gender groups.
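As an illustration of the name-based labeling step, here is a minimal sketch that assigns a label only when one gender dominates a name's frequency counts. The counts in `NAME_COUNTS` are made-up toy values, not IBGE data, and the 0.9 threshold and the accent-stripping normalization are assumptions; the summary does not say how the study defined "high-confidence".

```python
import unicodedata
from typing import Dict, Optional, Tuple

# Hypothetical toy counts as (female, male); NOT real IBGE figures.
NAME_COUNTS: Dict[str, Tuple[int, int]] = {
    "maria": (1000, 5),
    "joao": (3, 900),
    "darci": (55, 45),
}


def normalize_first_name(display_name: str) -> str:
    """Take the first whitespace-separated token, lowercase it, strip accents."""
    parts = display_name.strip().split()
    if not parts:
        return ""
    decomposed = unicodedata.normalize("NFKD", parts[0].lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def infer_gender(
    display_name: str,
    counts: Dict[str, Tuple[int, int]] = NAME_COUNTS,
    threshold: float = 0.9,  # assumed cutoff for a "high-confidence" label
) -> Optional[str]:
    """Return "F", "M", or None when the name is unknown or ambiguous."""
    key = normalize_first_name(display_name)
    if key not in counts:
        return None
    female, male = counts[key]
    total = female + male
    if total == 0:
        return None
    share_female = female / total
    if share_female >= threshold:
        return "F"
    if 1 - share_female >= threshold:
        return "M"
    return None


if __name__ == "__main__":
    for name in ["Maria Silva", "João Pedro", "Darci", "xX_gamer_Xx"]:
        print(name, "->", infer_gender(name))
```

Unknown and ambiguous names are left unlabeled rather than guessed, which is consistent with only a subset of the reviews ending up with a label.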
In practice
- Analyze paralinguistic features for demographic insights.
- Consider name-based gender inference for large text corpora.
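The paralinguistic features mentioned above (emoji density, lengthening, punctuation) could be extracted along these lines. This is a rough sketch: the emoji character ranges are an approximation, and the per-token normalization and the "three or more repeated letters" rule for lengthening are assumptions rather than the study's definitions.

```python
import re
from typing import Dict

# Approximate emoji coverage: main pictograph blocks plus misc symbols and dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# A letter repeated three or more times in a row, e.g. "amooo".
LENGTHENING_RE = re.compile(r"([^\W\d_])\1{2,}")
# Runs of two or more exclamation or question marks, e.g. "!!!" or "?!".
PUNCT_RUN_RE = re.compile(r"[!?]{2,}")


def paralinguistic_features(text: str) -> Dict[str, float]:
    """Per-token rates of emoji, lengthened words, and emphatic punctuation runs."""
    tokens = text.split()
    n = max(len(tokens), 1)  # avoid division by zero on empty reviews
    return {
        "emoji_per_token": len(EMOJI_RE.findall(text)) / n,
        "lengthened_per_token": sum(
            1 for tok in tokens if LENGTHENING_RE.search(tok)
        ) / n,
        "punct_runs_per_token": len(PUNCT_RUN_RE.findall(text)) / n,
    }


if __name__ == "__main__":
    print(paralinguistic_features("Amooo esse app!!! \U0001F60D\U0001F60D"))
```

Features like these can be appended to a model's existing inputs or averaged per group for descriptive comparison.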
Topics
- Gendered Stylistic Variation
- Brazilian Portuguese
- Google Play Reviews
- Sociolinguistics
- Paralinguistic Expressivity
Best for: NLP Engineer, AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.