Rating–Text Mismatch in Brazilian Portuguese Reviews: How Reliable Are Zero-Shot LLMs?

2026-04-12 · Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A study evaluated how well large language models (LLMs) identify inconsistencies between the text of Brazilian Portuguese product reviews and the 1-star or 5-star rating attached to them. The researchers tested widely used LLMs (GPT-5, Llama-4, and DeepSeek-3.2) alongside two models optimized for Brazilian Portuguese, Sabiá-3.1 and Bode-3.1. Under a zero-shot protocol, some of the models exceeded 90% F1 in detecting these rating-text mismatches. The models also showed strong agreement in their predictions, with low variability across multiple rounds (Fleiss' κ > 0.95). Roughly 10% of comments in every product category exhibited this incoherence. The authors conclude that LLMs are highly promising for complex semantic interpretation tasks and valuable for online monitoring and recommendation systems.
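The agreement figure (Fleiss' κ > 0.95) measures how consistently the same reviews receive the same label across repeated rounds, corrected for the agreement expected by chance. The sketch below implements the standard Fleiss' κ formula in plain Python, assuming each review gets one categorical label per round; the label names ("match", "mismatch") and the toy data are illustrative and do not come from the study.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for N items, each labeled by the same number n >= 2 of raters.

    ratings[i] holds the n labels given to item i, e.g. one label per round.
    """
    if not ratings:
        raise ValueError("no items")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("need the same number (>= 2) of labels per item")

    category_totals = Counter()
    p_items = []
    for labels in ratings:
        counts = Counter(labels)
        category_totals.update(counts)
        # Observed agreement on this item: P_i = (sum_j n_ij^2 - n) / (n (n - 1))
        p_items.append(
            (sum(c * c for c in counts.values()) - n_raters)
            / (n_raters * (n_raters - 1))
        )
    p_bar = sum(p_items) / n_items

    # Chance agreement: P_e = sum_j p_j^2, where p_j is the overall share of category j
    total_labels = n_items * n_raters
    p_e = sum((count / total_labels) ** 2 for count in category_totals.values())
    if p_e == 1.0:
        return float("nan")  # undefined: only one label was ever used
    return (p_bar - p_e) / (1 - p_e)


# Toy data: 4 reviews, each labeled in 3 rounds.
rounds = [
    ["mismatch", "mismatch", "mismatch"],
    ["match", "match", "match"],
    ["match", "match", "match"],
    ["mismatch", "mismatch", "match"],
]
print(round(fleiss_kappa(rounds), 3))  # 0.657
```

A single disagreement on one of four items already pulls κ down to about 0.66 here, which gives a sense of how little round-to-round variation a value above 0.95 allows.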

Key takeaway

For AI engineers building content moderation or recommendation systems, these findings suggest that zero-shot LLMs can reliably identify rating-text inconsistencies in Brazilian Portuguese reviews. Consider integrating models such as GPT-5 or Sabiá-3.1 to automatically flag potentially misleading reviews, improving data quality and user trust. Filtering out these reviews can also make downstream sentiment analysis and product insights more accurate.

Key insights

LLMs effectively detect rating-text incoherence in Brazilian Portuguese reviews with high F1 scores and strong inter-model agreement.
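F1 rather than accuracy matters here because mismatches are a minority class: with roughly 10% incoherent reviews, a model that never flags anything is 90% accurate yet useless. The summary does not say whether the reported F1 is computed on the mismatch class alone or macro-averaged; the minimal sketch below assumes positive-class F1 with "mismatch" as the positive label.

```python
def f1_positive(gold: list[bool], pred: list[bool]) -> float:
    """F1 for the positive class, where True means 'rating and text mismatch'."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# With ~10% mismatches, a model that never flags anything is 90% accurate
# but scores 0 F1 on the mismatch class.
gold = [True] + [False] * 9
print(f1_positive(gold, [False] * 10))  # 0.0
print(f1_positive(gold, gold))          # 1.0
```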

Principles

Zero-shot LLMs can achieve high semantic interpretation.
Rating-text incoherence is prevalent, affecting roughly 10% of comments in every product category.
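A prevalence figure like "roughly 10% per category" is simply the share of flagged reviews within each category. The short sketch below computes it from (category, is_mismatch) pairs; the function name and the category names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def mismatch_rate_by_category(labels):
    """labels: iterable of (category, is_mismatch) pairs -> {category: mismatch share}."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for category, is_mismatch in labels:
        total[category] += 1
        flagged[category] += int(is_mismatch)
    return {category: flagged[category] / total[category] for category in total}


# Hypothetical categories and labels, for illustration only.
labels = [("books", True), ("books", False),
          ("electronics", False), ("electronics", True),
          ("electronics", False), ("electronics", False)]
print(mismatch_rate_by_category(labels))  # {'books': 0.5, 'electronics': 0.25}
```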

Method

The study evaluated LLMs (GPT-5, Llama-4, DeepSeek-3.2, Sabiá-3.1, Bode-3.1) for detecting 1-star or 5-star rating-text incoherence in Brazilian Portuguese reviews using a zero-shot protocol.
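In a zero-shot protocol the model receives only an instruction and the input, with no labeled examples. The sketch below shows one way to frame the task as a YES/NO question and parse the answer. The English prompt wording, `build_prompt`, `detect_mismatch`, and the `llm` callable are all assumptions for illustration; the summary does not reproduce the study's actual prompt, and connecting `llm` to GPT-5, Sabiá-3.1, or another model depends on each provider's client library.

```python
from typing import Callable

# Hypothetical zero-shot prompt; not the study's wording.
PROMPT_TEMPLATE = (
    "The following Brazilian Portuguese product review received {stars} star(s) "
    "on a 1-5 scale.\n"
    'Review: "{text}"\n'
    "Is the review text consistent with the rating? Answer only YES or NO."
)


def build_prompt(text: str, stars: int) -> str:
    if stars not in (1, 5):
        raise ValueError("the study only considers 1-star and 5-star reviews")
    return PROMPT_TEMPLATE.format(stars=stars, text=text)


def detect_mismatch(text: str, stars: int, llm: Callable[[str], str]) -> bool:
    """Return True when the model says the text does NOT match the rating.

    `llm` is any function mapping a prompt string to a completion string.
    """
    words = llm(build_prompt(text, stars)).strip().upper().split()
    first = words[0].strip(".,!") if words else ""
    if first == "NO":
        return True
    if first == "YES":
        return False
    raise ValueError(f"unparseable answer: {words!r}")


def fake_llm(prompt: str) -> str:
    return "NO"  # stand-in for a real model client


# "Terrible product, it broke on the first day." rated 5 stars
print(detect_mismatch("Produto horrível, quebrou no primeiro dia.", 5, fake_llm))  # True
```

Constraining the output to a single token and rejecting anything else keeps parsing deterministic, which matters when the same review is scored over several rounds.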

In practice

Use LLMs for online content monitoring.
Integrate LLMs into recommendation systems.
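For monitoring, one plausible design, given the strong agreement the study reports, is to query several models (or rounds) and flag a review only when enough of them agree. The sketch below is a hypothetical wiring, not the paper's method: `flag_reviews`, the majority-vote rule, and the toy detectors are assumptions, and in practice each detector could wrap a zero-shot call to a different LLM.

```python
from typing import Callable, Iterable, Optional

# (review text, star rating) -> True if the detector sees a mismatch
Detector = Callable[[str, int], bool]


def flag_reviews(
    reviews: Iterable[tuple[str, str, int]],
    detectors: list[Detector],
    min_votes: Optional[int] = None,
) -> list[str]:
    """Return IDs of 1- or 5-star reviews flagged by at least `min_votes` detectors."""
    if min_votes is None:
        min_votes = len(detectors) // 2 + 1  # default: strict majority
    flagged = []
    for review_id, text, stars in reviews:
        if stars not in (1, 5):
            continue  # the study only covers extreme ratings
        votes = sum(detector(text, stars) for detector in detectors)
        if votes >= min_votes:
            flagged.append(review_id)
    return flagged


reviews = [("r1", "Produto ruim", 5), ("r2", "Ótimo", 5), ("r3", "Produto ruim", 3)]
toy_detectors = [
    lambda text, stars: True,
    lambda text, stars: "ruim" in text,  # "ruim" = "bad"
    lambda text, stars: False,
]
print(flag_reviews(reviews, toy_detectors))  # ['r1']
```

Flagged IDs could then be routed to human review or down-weighted in a recommender rather than deleted outright.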

Topics

Rating-Text Mismatch
Zero-Shot LLMs
Brazilian Portuguese
Product Reviews
Semantic Interpretation

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.