The F1 of Formula One: Applicability of Pre-trained NER Models to Brazilian TV Interview Transcripts
Summary
A study presented at the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) compared two named entity recognition (NER) approaches on transcripts of Brazilian TV interviews. Using transcripts from Roda Viva, a long-running Brazilian interview show, the researchers evaluated a statistical-neural method and large language models (LLMs) against manual annotations of six interviews with Brazilian Formula One drivers. The statistical-neural method depended rigidly on capitalization and lexical familiarity, which produced mechanical false positives and caused it to miss entities written without capitals. The LLM-based approach, by contrast, showed greater linguistic sensitivity: it retrieved entities from context and was robust to transcription errors, although it also produced false positives. It is considered the more promising of the two because it is flexible and can be refined with filtering instructions that resolve ambiguities, which could in turn automate the extraction of social networks from the corpus.
Key takeaway
For NLP engineers working with transcribed speech, especially where capitalization is inconsistent or transcription errors are common, LLM-based NER is worth considering. It is likely to recognize entities from context and tolerate noise better than traditional statistical methods. Precision for specific entity types can be improved by refining the prompt to filter out ambiguous cases.
Key insights
LLMs offer superior linguistic sensitivity for NER in noisy interview transcripts compared to statistical methods.
Principles
- In this study, the LLM was robust to transcription errors.
- The statistical-neural method depended on capitalization and lexical familiarity.
Method
The study compared a statistical-neural NER method with large language models, scoring both against manual annotations of six Brazilian TV interviews to evaluate performance and qualitative differences.
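Scoring a system against manual annotations usually comes down to precision, recall, and F1 over the extracted entities. The sketch below is a minimal illustration, not the study's evaluation code. It assumes entities are compared as a set of (text, label) pairs after case-folding, which is one of several possible matching schemes. The function names, the PER/ORG/LOC labels, and the sample entities are invented for the example.

```python
from typing import Iterable, Set, Tuple

# An entity is a (surface text, label) pair, e.g. ("Ayrton Senna", "PER").
Entity = Tuple[str, str]


def normalize(entities: Iterable[Entity]) -> Set[Entity]:
    """Case-fold the text and upper-case the label so matching ignores capitalization.

    Assumption: duplicates collapse into one item, so this scores entity types,
    not individual mentions.
    """
    return {(text.strip().casefold(), label.upper()) for text, label in entities}


def precision_recall_f1(
    gold: Iterable[Entity], predicted: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted entities against gold annotations."""
    gold_set = normalize(gold)
    pred_set = normalize(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up annotations for illustration only; not taken from the corpus.
    gold = [("Ayrton Senna", "PER"), ("Interlagos", "LOC"), ("McLaren", "ORG")]
    predicted = [("ayrton senna", "PER"), ("McLaren", "ORG"), ("Boa noite", "PER")]
    p, r, f1 = precision_recall_f1(gold, predicted)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Running the same function on each system's output gives directly comparable scores. The false positives and misses can then be read off as the set differences between predictions and gold annotations.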
In practice
- Use LLMs for NER on noisy, uncapitalized text.
- Refine LLM output with filtering instructions.
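The two practices above can be sketched as a prompt builder plus a defensive parser for the model's reply. This is a hypothetical illustration, not the prompt used in the study. The rule wording, the PER/ORG/LOC label set, the JSON reply format, and the function names are all assumptions. No model is called here, so the reply in the usage example is a hand-written stand-in.

```python
import json
from typing import List, Sequence, Tuple

# Hypothetical filtering instructions aimed at the error types described above.
FILTER_RULES = (
    "Tag an entity even if it is written in lowercase or misspelled in the transcript.",
    "Do not tag a word only because it is capitalized, e.g. at the start of a sentence.",
    "Skip generic nouns and titles that do not name a specific person, place, or organization.",
)


def build_ner_prompt(
    transcript: str,
    labels: Sequence[str] = ("PER", "ORG", "LOC"),
    rules: Sequence[str] = FILTER_RULES,
) -> str:
    """Assemble an extraction prompt with explicit filtering rules."""
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        "Extract named entities from the interview transcript below.\n"
        f"Allowed labels: {', '.join(labels)}.\n"
        f"Rules:\n{rule_lines}\n"
        'Answer with a JSON list of objects like {"text": "...", "label": "..."}.\n\n'
        f"Transcript:\n{transcript}"
    )


def parse_entities(
    raw_reply: str, labels: Sequence[str] = ("PER", "ORG", "LOC")
) -> List[Tuple[str, str]]:
    """Parse the model's JSON reply, dropping malformed items and unknown labels."""
    try:
        items = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []  # treat an unparseable reply as "no entities found"
    if not isinstance(items, list):
        return []

    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        text = str(item.get("text", "")).strip()
        label = str(item.get("label", "")).upper()
        if text and label in labels:
            entities.append((text, label))
    return entities


if __name__ == "__main__":
    prompt = build_ner_prompt("eu corri em interlagos com a mclaren")
    print(prompt)
    # Hand-written stand-in for a model reply; a real reply may differ.
    reply = '[{"text": "interlagos", "label": "LOC"}, {"text": "corrida", "label": "MISC"}]'
    print(parse_entities(reply))
```

Keeping the rules in a separate tuple makes it easy to add or reword one instruction at a time and re-score the output, which is how precision for a specific entity type would be tuned.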
Topics
- Named Entity Recognition
- Large Language Models
- Brazilian TV Transcripts
- Formula One Drivers
- Roda Viva Program
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.