Semantic marking of named entities in The Lusiads
Summary
This article describes the semantic modeling of named entities in Luís de Camões's epic poem "The Lusiads" using the TEI P5 standard. It introduces a hybrid annotation workflow that combines spaCy's Named Entity Recognition (NER), an authority dictionary (gazetteer), and manual philological post-editing. The study classifies anthroponyms, mythonyms, and toponyms and encodes them with the <persName>, <placeName>, and <rs> (referencing string) elements, paying particular attention to the correct tagging of epithets. It finds that NER models trained on journalistic corpora show significant limitations when applied to the epic syntax and the orthography of the 1572 edition of "The Lusiads", which motivates the hybrid methodology. The work concludes that XML/TEI is an effective tool for modeling literary knowledge.
Key takeaway
For NLP engineers working with historical or specialized literary texts: expect significant performance drops from models trained on modern journalistic corpora. A more reliable approach is a hybrid methodology that combines automated NER with domain-specific gazetteers and expert manual post-editing. This makes it possible to tag entities such as anthroponyms and toponyms accurately while staying faithful to the original text's orthography and syntax.
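To quantify how much an off-the-shelf model degrades on a text like the poem, one common option (not necessarily the metric used in the original article) is entity-level precision, recall and F1 against a manually annotated gold sample. The sketch below uses exact span-and-label matching; the `Span` type, the `entity_prf` helper, the labels and the toy offsets are all hypothetical.

```python
from typing import NamedTuple, Set, Tuple


class Span(NamedTuple):
    """An entity mention: character offsets [start, end) plus a label."""
    start: int
    end: int
    label: str


def entity_prf(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1.

    A prediction counts as correct only if its offsets AND its label
    match a gold span exactly.
    """
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy offsets, invented for illustration only.
    gold = {Span(0, 5, "PER"), Span(10, 16, "LOC"), Span(20, 25, "PER")}
    predicted = {Span(0, 5, "PER"), Span(10, 16, "PER")}  # one label error
    print(entity_prf(predicted, gold))
```

In the toy run, the mislabeled span and the missed span give precision 0.5, recall about 0.33 and F1 about 0.4. Exact matching is deliberately strict; a partial-overlap variant would be more forgiving of boundary disagreements around epithets.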
Key insights
Hybrid NER workflows improve accuracy for historical literary texts, overcoming limitations of models trained on modern corpora.
Principles
- Combine automated NER with philological expertise.
- Adapt annotation standards (TEI P5) for literary specifics.
Method
A hybrid annotation workflow combines spaCy NER, an authority dictionary (gazetteer), and manual philological post-editing to semantically tag named entities in historical texts.
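The middle step of such a workflow, reconciling automatic NER output with the authority dictionary before manual post-editing, might look like the sketch below. It is not the article's code: the merge policy (gazetteer hits win, unmatched NER spans are flagged for review), the `VARIANTS` fold, the label-to-element mapping, and every name and key are assumptions. NER spans are plain `(start, end, label)` tuples; with spaCy they could be built from `doc.ents` using each span's `start_char`, `end_char` and `label_`.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical one-to-one orthographic folds for early-modern spellings.
# Illustrative only; a real project would derive these from the edition.
VARIANTS = {"y": "i"}

# Hypothetical mapping from generic NER labels to TEI element names.
LABEL_TO_TEI = {"PER": "persName", "LOC": "placeName"}


@dataclass
class Candidate:
    start: int
    end: int
    surface: str
    element: str           # TEI element name, e.g. "persName"
    key: Optional[str]     # authority key from the gazetteer, if any
    source: str            # "gazetteer" or "ner"
    needs_review: bool     # True when no authority record backs the span


def fold(text: str) -> str:
    """Lowercase, drop diacritics and apply VARIANTS character by character,
    so every offset in the folded string matches the original.
    Assumes NFC-normalized (precomposed) input."""
    out = []
    for ch in text:
        base = unicodedata.normalize("NFD", ch.lower())[0]
        out.append(VARIANTS.get(base, base))
    return "".join(out)


def gazetteer_matches(text: str, gazetteer: Dict[str, Tuple[str, str]]) -> List[Candidate]:
    """Find whole-word occurrences of every gazetteer form in `text`."""
    folded = fold(text)
    found = []
    for form, (element, key) in gazetteer.items():
        pattern = r"(?<!\w)" + re.escape(fold(form)) + r"(?!\w)"
        for m in re.finditer(pattern, folded):
            found.append(Candidate(m.start(), m.end(), text[m.start():m.end()],
                                   element, key, "gazetteer", False))
    return found


def overlaps(a: Candidate, b: Candidate) -> bool:
    return a.start < b.end and b.start < a.end


def merge(text: str, ner_spans: List[Tuple[int, int, str]],
          gazetteer: Dict[str, Tuple[str, str]]) -> List[Candidate]:
    """Gazetteer hits take priority (longest first at the same offset);
    NER spans that do not overlap them are kept but flagged for review."""
    accepted: List[Candidate] = []
    hits = sorted(gazetteer_matches(text, gazetteer),
                  key=lambda c: (c.start, c.start - c.end))
    for cand in hits:
        if not any(overlaps(cand, a) for a in accepted):
            accepted.append(cand)
    for start, end, label in ner_spans:
        cand = Candidate(start, end, text[start:end],
                         LABEL_TO_TEI.get(label, "rs"), None, "ner", True)
        if not any(overlaps(cand, a) for a in accepted):
            accepted.append(cand)
    return sorted(accepted, key=lambda c: c.start)


if __name__ == "__main__":
    text = "O Gama chegou a Melinde com Coelho."  # invented sentence
    gazetteer = {"Gama": ("persName", "vasco_da_gama"),  # hypothetical keys
                 "Melinde": ("placeName", "melinde")}
    ner_spans = [(2, 6, "PER"), (28, 34, "PER")]  # e.g. derived from doc.ents
    for c in merge(text, ner_spans, gazetteer):
        print(c.surface, c.element, c.source, c.needs_review)
```

Folding character by character keeps offsets aligned with the original spelling, so the final markup can be applied to the unnormalized text; spelling rules that change string length would need an explicit offset map instead. Everything returned still goes to the philologist, with `needs_review` marking the spans no authority record supports.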
In practice
- Use TEI P5 for literary knowledge modeling.
- Prioritize epithet tagging in historical texts.
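On the encoding side, the element choices named in the summary (<persName>, <placeName>, and <rs> for epithets) could be emitted with Python's standard `xml.etree.ElementTree` module roughly as follows. The `encode_line` helper, the attribute values (`type="epithet"`, `ref="#gama"`) and the verse are hypothetical; the article's actual TEI conventions are not reproduced in this summary.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


Entity = Tuple[int, int, str, Dict[str, str]]  # (start, end, element, attributes)


def encode_line(text: str, entities: List[Entity]) -> ET.Element:
    """Wrap non-overlapping entity spans of one verse line in TEI elements,
    keeping the text between them as mixed content."""
    line = ET.Element(tei("l"))
    cursor = 0
    last = None
    for start, end, element, attrs in sorted(entities, key=lambda e: e[0]):
        gap = text[cursor:start]
        if last is None:
            line.text = gap          # text before the first entity
        else:
            last.tail = gap          # text between two entities
        last = ET.SubElement(line, tei(element), attrs)
        last.text = text[start:end]
        cursor = end
    rest = text[cursor:]
    if last is None:
        line.text = rest
    else:
        last.tail = rest
    return line


if __name__ == "__main__":
    verse = "O ilustre Capitão chegou a Melinde"  # invented line, not a quotation
    entities = [
        (2, 17, "rs", {"type": "epithet", "ref": "#gama"}),  # epithet for a person
        (27, 34, "placeName", {"ref": "#melinde"}),           # toponym
    ]
    print(ET.tostring(encode_line(verse, entities), encoding="unicode"))
```

Wrapping the epithet in <rs> with a `ref` pointing to the person's authority record keeps the poet's wording intact while still linking the phrase to the entity it names, which is the distinction that makes epithet tagging worth prioritizing.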
Topics
- Semantic Tagging
- Named Entity Recognition
- TEI P5 Standard
- The Lusiads
- Philological Post-editing
Best for: NLP Engineer, AI Scientist, Research Scientist, Domain Expert
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.