Identification of fake news in Portuguese: a look at the generalization of models

· Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

A study investigated how well the BERTimbau and mBERT language models generalize for fake news detection in Portuguese, focusing on cross-generalization scenarios in which the test data differ from the training and validation data. The researchers fine-tuned both models on four Brazilian Portuguese corpora: Fake.br, Fakepedia, FakeRecogna, and FakeTrueBR. Intra-base evaluations yielded high performance, while inter-base evaluations showed significant degradation, even though the objective (identifying fake news) was the same in every corpus. Quantitatively, BERTimbau slightly outperformed mBERT, with an average accuracy of 71% and an F1-score of 67%, compared with mBERT's 69% accuracy and 64% F1-score.

Key takeaway

Research scientists developing fake news detection systems should prioritize rigorous cross-generalization testing rather than relying on intra-base evaluations alone. The performance degradation observed in inter-base scenarios points to the need for diverse training data and for validation against varied datasets, so that models are not deployed with limited real-world applicability.

Key insights

Language models for fake news detection show significant performance degradation in cross-generalization scenarios.

Principles

Method

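Fine-tuning BERTimbau and mBERT on four Brazilian Portuguese corpora (Fake.br, Fakepedia, FakeRecogna, FakeTrueBR), then comparing intra-base evaluation (test data from the training corpus) with inter-base evaluation (test data from a different corpus) to assess cross-generalization.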
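
The intra-base versus inter-base comparison can be sketched as a train-on-one, test-on-all loop. The Python below is a minimal illustration under stated assumptions, not the authors' code: the corpus layout (a dict with "train" and "test" splits), the function names, and the keyword baseline that stands in for fine-tuning BERTimbau or mBERT are all hypothetical. F1 is computed here for the fake class only, which may differ from the averaging used in the paper.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]          # (text, label); label 1 = fake, 0 = true (assumed encoding)
Corpus = List[Example]
Predictor = Callable[[str], int]


def accuracy_and_f1(gold: List[int], pred: List[int]) -> Tuple[float, float]:
    """Accuracy and binary F1 for the positive (fake) class."""
    if not gold:
        raise ValueError("empty evaluation set")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return correct / len(gold), f1


def cross_corpus_matrix(
    corpora: Dict[str, Dict[str, Corpus]],
    fit: Callable[[Corpus], Predictor],
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Train on each corpus, then score the model on every corpus's test split."""
    results = {}
    for train_name, train_splits in corpora.items():
        predict = fit(train_splits["train"])
        for test_name, test_splits in corpora.items():
            gold = [label for _, label in test_splits["test"]]
            pred = [predict(text) for text, _ in test_splits["test"]]
            results[(train_name, test_name)] = accuracy_and_f1(gold, pred)
    return results


def mean_scores(results, intra: bool) -> Tuple[float, float]:
    """Average (accuracy, F1) over intra-base pairs or over inter-base pairs."""
    picked = [s for (tr, te), s in results.items() if (tr == te) == intra]
    n = len(picked)
    return sum(a for a, _ in picked) / n, sum(f for _, f in picked) / n


def fit_keyword_baseline(train: Corpus) -> Predictor:
    """Toy stand-in for fine-tuning: flag words seen only in fake training texts."""
    fake_words, true_words = set(), set()
    for text, label in train:
        (fake_words if label == 1 else true_words).update(text.lower().split())
    cues = fake_words - true_words

    def predict(text: str) -> int:
        return 1 if cues & set(text.lower().split()) else 0

    return predict
```

To run the same loop with a real model, `fit` would be replaced by a function that fine-tunes a classifier on the training split and returns its prediction function; the diagonal entries of the result then give the intra-base scores and the off-diagonal entries the inter-base scores.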

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.