Semantic marking of named entities in The Lusiads

Source: Paper Index on ACL Anthology · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Digital Humanities) · Depth: Advanced, quick

Summary

This article details the semantic modeling of named entities in Luís de Camões's epic poem "The Lusiads" using the TEI P5 standard. It introduces a hybrid annotation workflow that combines spaCy's Named Entity Recognition (NER), an authority dictionary (gazetteer), and manual philological post-editing. The study classifies anthroponyms (personal names), mythonyms (names of mythological figures), and toponyms (place names), encoding them with the <persName>, <placeName>, and <rs> (referencing string) elements, with particular attention to the accurate tagging of epithets. It identifies significant limitations in NER models trained on journalistic corpora when they are applied to the distinctive epic syntax and the orthography of the 1572 edition of "The Lusiads," which motivates the hybrid methodology. The work concludes that XML/TEI is an effective tool for modeling literary knowledge.
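A minimal sketch of what the inline TEI encoding could look like, assuming entity spans have already been identified as (start, end, label, ref) character offsets. The label names (PERSON, MYTH, PLACE, EPITHET), the @type values, the @ref identifiers, and the example sentence (constructed for illustration, not quoted from the poem) are all assumptions; the summary does not say how the authors distinguish mythonyms or epithets at the attribute level. The TEI namespace is omitted for brevity, and only Python's standard xml.etree.ElementTree is used.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

EntitySpan = Tuple[int, int, str, str]  # (start, end, label, @ref)

# Hypothetical mapping from entity labels to TEI P5 element names.
# Mythonyms become <persName type="mythological"> and epithets become
# <rs type="person"> pointing at the same @ref as the named figure: both are assumptions.
ELEMENT_FOR = {"PERSON": "persName", "MYTH": "persName", "PLACE": "placeName", "EPITHET": "rs"}
TYPE_FOR = {"MYTH": "mythological", "EPITHET": "person"}


def encode_line(text: str, spans: List[EntitySpan]) -> str:
    """Wrap non-overlapping spans of `text` in TEI elements inside an <l> (verse line)."""
    line = ET.Element("l")
    cursor = 0
    previous: Optional[ET.Element] = None  # plain text after an element goes into its .tail
    for start, end, label, ref in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not handled in this sketch")
        gap = text[cursor:start]
        if previous is None:
            line.text = gap
        else:
            previous.tail = gap
        element = ET.SubElement(line, ELEMENT_FOR[label], {"ref": ref})
        if label in TYPE_FOR:
            element.set("type", TYPE_FOR[label])
        element.text = text[start:end]  # keep the original surface form untouched
        cursor = end
        previous = element
    rest = text[cursor:]
    if previous is None:
        line.text = rest
    else:
        previous.tail = rest
    return ET.tostring(line, encoding="unicode")


if __name__ == "__main__":
    # Constructed sentence for illustration; not a quotation from the poem.
    sentence = "Baco, o filho de Sémele, temia o Gama."

    def at(surface: str, label: str, ref: str) -> EntitySpan:
        start = sentence.index(surface)
        return (start, start + len(surface), label, ref)

    print(encode_line(sentence, [
        at("Baco", "MYTH", "#baco"),
        at("o filho de Sémele", "EPITHET", "#baco"),
        at("Gama", "PERSON", "#vasco_da_gama"),
    ]))
```

Pointing the epithet's @ref at the same identifier as the name lets a later query collect every way the poem refers to one figure, which is one plausible reason to tag epithets with <rs> rather than leaving them as plain text.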

Key takeaway

For NLP engineers working with historical or specialized literary texts: expect significant performance drops from models trained on modern journalistic corpora. Your best approach is a hybrid one that combines automated NER with domain-specific gazetteers and expert manual post-editing to tag entities such as anthroponyms and toponyms accurately. This preserves fidelity to the original text's orthography and syntax.
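One way to make a gazetteer tolerant of early-modern spelling, sketched under stated assumptions: both the authority list and each surface form found in the text are reduced to a normalized lookup key, while the annotation itself keeps the original spelling. The normalization rules (stripped accents, ph to f, th to t, y to i, u to v between vowels, collapsed double consonants) are illustrative simplifications, not a validated model of 1572 orthography, and the AUTHORITY entries and @ref values are hypothetical.

```python
import re
import unicodedata
from typing import Dict, Optional, Tuple


def normalize(form: str) -> str:
    """Build a lookup key that smooths over some early-modern spelling variation (assumed rules)."""
    key = unicodedata.normalize("NFD", form.lower())
    key = "".join(ch for ch in key if not unicodedata.combining(ch))  # drop accents
    key = key.replace("ph", "f").replace("th", "t").replace("y", "i")
    key = re.sub(r"(?<=[aeiou])u(?=[aeiou])", "v", key)  # nauegados -> navegados
    key = re.sub(r"([b-df-hj-np-tv-z])\1", r"\1", key)   # occidental -> ocidental (lossy)
    return key


# Hypothetical authority list: canonical form -> (entity type, TEI @ref).
AUTHORITY: Dict[str, Tuple[str, str]] = {
    "Taprobana": ("PLACE", "#taprobana"),
    "Tétis": ("MYTH", "#tetis"),
    "Calíope": ("MYTH", "#caliope"),
    "Baco": ("MYTH", "#baco"),
}

# Index the authority list by normalized key once, up front.
GAZETTEER = {normalize(name): entry for name, entry in AUTHORITY.items()}


def lookup(surface: str) -> Optional[Tuple[str, str]]:
    """Return (type, @ref) for a surface form as spelled in the text, or None if unknown."""
    return GAZETTEER.get(normalize(surface))


if __name__ == "__main__":
    for surface in ["Thetis", "Calliope", "Taprobana", "Lisboa"]:
        print(surface, "->", lookup(surface))
```

Because the key is lossy (collapsing double consonants can merge distinct words), a match is safer treated as a candidate for the philologist to confirm than as a final tag.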

Key insights

Hybrid NER workflows improve accuracy for historical literary texts, overcoming limitations of models trained on modern corpora.
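The summary does not say how accuracy was measured. A common choice is exact-match precision, recall, and F1 over (start, end, label) spans, sketched below; the gold and predicted spans in the usage block are invented toy data, not results from the paper.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity label)


def span_prf(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match scores: a prediction counts only if both offsets and the label match."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented toy data: four gold spans and two hypothetical systems.
    gold = [(0, 4, "MYTH"), (6, 23, "EPITHET"), (33, 37, "PERSON"), (48, 55, "PLACE")]
    ner_only = [(0, 4, "PER"), (33, 37, "PERSON"), (40, 45, "ORG")]
    hybrid = [(0, 4, "MYTH"), (6, 23, "EPITHET"), (33, 37, "PERSON"), (48, 55, "LOC")]
    print("NER only:", span_prf(gold, ner_only))
    print("hybrid:  ", span_prf(gold, hybrid))
```

Exact matching is strict: a model whose label inventory has no mythonym or epithet category cannot score on those spans at all, which is one concrete way a journalistic label set shows up as lost recall.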

Principles

Method

A hybrid annotation workflow combines spaCy NER, an authority dictionary (gazetteer), and manual philological post-editing to semantically tag named entities in historical texts.
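A minimal, standard-library sketch of how the three stages could fit together, assuming the NER stage has already produced (start, end, label) character spans. With spaCy these could be read from each entity's start_char, end_char and label_ attributes; here they are hard-coded so the sketch runs without a model. The precedence policy is an assumption: gazetteer matches are treated as authoritative and override any overlapping NER span, while NER spans the gazetteer cannot confirm are queued for manual philological post-editing instead of being accepted automatically. The gazetteer entries, labels and @ref values are hypothetical.

```python
import re
from typing import Dict, List, Tuple

NerSpan = Tuple[int, int, str]        # (start, end, label) from the NER stage
Tagged = Tuple[int, int, str, str]    # (start, end, label, @ref) accepted from the gazetteer
Review = Tuple[int, int, str, str]    # (start, end, NER label, surface text) for post-editing
Gazetteer = Dict[str, Tuple[str, str]]  # surface form -> (label, @ref)


def gazetteer_matches(text: str, gazetteer: Gazetteer) -> List[Tagged]:
    """Find whole-word occurrences of every form (overlapping entries are not resolved here)."""
    hits = []
    for form, (label, ref) in gazetteer.items():
        for match in re.finditer(r"\b" + re.escape(form) + r"\b", text):
            hits.append((match.start(), match.end(), label, ref))
    return sorted(hits)


def merge(text: str, ner_spans: List[NerSpan], gazetteer: Gazetteer) -> Tuple[List[Tagged], List[Review]]:
    """Accept gazetteer hits; send NER spans that overlap no hit to a manual review queue."""
    accepted = gazetteer_matches(text, gazetteer)
    review = []
    for start, end, label in ner_spans:
        overlaps = any(start < g_end and g_start < end for g_start, g_end, _, _ in accepted)
        if not overlaps:
            review.append((start, end, label, text[start:end]))
    return accepted, sorted(review)


if __name__ == "__main__":
    # Constructed sentence; not a quotation from the poem.
    sentence = "Baco, o filho de Sémele, temia o Gama diante de Melinde."
    gazetteer = {"Baco": ("MYTH", "#baco"), "Gama": ("PERSON", "#vasco_da_gama"), "Melinde": ("PLACE", "#melinde")}

    def span(surface: str, label: str) -> NerSpan:
        start = sentence.index(surface)
        return (start, start + len(surface), label)

    # Stand-in for a modern NER model's output, including a mislabel (Baco as ORG).
    ner = [span("Baco", "ORG"), span("Sémele", "PER"), span("Gama", "PER")]
    accepted, review = merge(sentence, ner, gazetteer)
    print("accepted:", accepted)
    print("for post-editing:", review)
```

In this toy run the gazetteer corrects the mislabeled "Baco" and adds "Melinde", which the stand-in model missed, while "Sémele" is left for the philologist, who might decide that the whole epithet "o filho de Sémele" should become an <rs> pointing at Baco.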

Best for: NLP Engineer, AI Scientist, Research Scientist, Domain Expert

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.