ATLAS: Article Tracking, Linking, and Analysis of Swedish Encyclopedias
Summary
ATLAS (Article Tracking, Linking, and Analysis of Swedish Encyclopedias) is a new pipeline designed to restore and exploit the underlying structure of digitized historical encyclopedias. This system extracts headwords, identifies entries, categorizes entities, matches entries across different editions, and links them to Wikidata items. The pipeline was applied to the four major editions of "Nordisk familjebok," a prominent Swedish encyclopedia published from 1876 to 1951. Evaluation showed a 97.8% F1 score for headword extraction and a 93.4% F1 score for headword classification. Cross-edition matching achieved 93% precision, while Wikidata linking reached 85% precision and 16.5% recall on a small-scale evaluation. The project demonstrates the feasibility of automated processing for digitized historical knowledge, with datasets and programs made publicly available.
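The linking figures above are easier to read side by side once combined into a single score. The summary does not report an F1 for Wikidata linking, so the value below is only the harmonic mean of the quoted 85% precision and 16.5% recall, not a number from the paper. The function name `f1_score` is chosen for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the summary for Wikidata linking.
# The combined score is derived here, not reported by the source.
linking_f1 = f1_score(0.85, 0.165)
print(f"{linking_f1:.3f}")
```

The harmonic mean is pulled toward the lower of the two inputs, so the result sits much closer to the recall than to the precision: most proposed links are correct, but most linkable entries are left unlinked.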
Key takeaway
For digital humanities researchers and archivists working with historical texts, ATLAS shows how raw OCR output can be turned into structured, linkable knowledge. Similar pipelines are worth considering for other digitized encyclopedias: once entries are identified and matched across editions, it becomes possible to analyze how knowledge evolved and was transmitted from one edition to the next. Linking entries to Wikidata also makes the historical material easier to find and reuse.
Key insights
Automated pipelines can effectively restore and link structured knowledge from digitized historical encyclopedias.
Principles
- Digitization alone is insufficient for knowledge exploitation.
- Structured extraction enhances historical knowledge access.
Method
The ATLAS pipeline involves headword extraction, entry identification, entity categorization, cross-edition matching, and Wikidata linking to restore and structure digitized encyclopedia content.
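To make the order of the stages concrete, here is a minimal sketch of three of them chained together: entry identification (with headword extraction), categorization, and linking. It is not the ATLAS implementation. The `Entry` class, the headword regular expression, the keyword rule in `categorize`, and the lookup table `TOY_KB` with its placeholder identifier are all assumptions made for illustration, and cross-edition matching is left out for brevity.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    edition: int
    headword: str
    text: str
    category: Optional[str] = None
    wikidata_id: Optional[str] = None


# Assumed convention (not from the source): an entry starts on a line whose
# first word is capitalized and is directly followed by a comma or full stop.
HEADWORD_RE = re.compile(r"^([A-ZÅÄÖ][\w-]*)[,.]\s")


def identify_entries(ocr_text: str, edition: int) -> list[Entry]:
    """Extract headwords and group the following lines into entries."""
    entries: list[Entry] = []
    for raw_line in ocr_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = HEADWORD_RE.match(line)
        if match:
            entries.append(Entry(edition, match.group(1), line))
        elif entries:
            # Continuation line: append it to the entry currently open.
            entries[-1].text += " " + line
    return entries


def categorize(entry: Entry) -> Entry:
    """Placeholder keyword rule standing in for a real classifier."""
    entry.category = "location" if " stad " in f" {entry.text} " else "other"
    return entry


# Hypothetical lookup table; the identifier is a placeholder, not a real item.
TOY_KB = {"uppsala": "Q-PLACEHOLDER"}


def link(entry: Entry) -> Entry:
    """Attach an identifier when the headword is found in the lookup table."""
    entry.wikidata_id = TOY_KB.get(entry.headword.casefold())
    return entry


# Invented sample text, two entries, the first spanning two lines.
SAMPLE = "Abborre, fisk som lever\ni sötvatten.\nUppsala, stad i Uppland.\n"
entries = [link(categorize(e)) for e in identify_entries(SAMPLE, edition=1)]
for e in entries:
    print(e.headword, e.category, e.wikidata_id)
```

Keeping each stage as a function that takes and returns an `Entry` means any single stage can be swapped for a stronger model without touching the others.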
In practice
- Apply structured knowledge extraction to OCR output.
- Use cross-edition matching to track knowledge evolution.
- Link historical entries to modern knowledge bases.
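As a rough illustration of the cross-edition matching step, the sketch below pairs headwords from an older edition with those of a newer one, first by exact match on a normalized form and then by string similarity using the standard library's `difflib`. This is a simple stand-in technique, not the matching method used in ATLAS. The function names, the 0.8 cutoff, and the headwords are invented for the example.

```python
import difflib
from typing import Optional


def normalize(headword: str) -> str:
    """Case-insensitive, whitespace-trimmed key for comparison."""
    return headword.strip().casefold()


def match_editions(
    old: list[str], new: list[str], cutoff: float = 0.8
) -> dict[str, Optional[str]]:
    """Map each old-edition headword to its best new-edition match, or None."""
    new_by_key = {normalize(h): h for h in new}
    result: dict[str, Optional[str]] = {}
    for headword in old:
        key = normalize(headword)
        if key in new_by_key:
            # Exact match after normalization.
            result[headword] = new_by_key[key]
            continue
        # Fall back to the most similar key above the cutoff, if any.
        close = difflib.get_close_matches(key, list(new_by_key), n=1, cutoff=cutoff)
        result[headword] = new_by_key[close[0]] if close else None
    return result


# Invented headwords; the first pair differs only by an older spelling.
matches = match_editions(["Hvalfisk", "Abborre", "Zebra"], ["Valfisk", "Abborre"])
print(matches)
```

A headword that maps to `None` has no counterpart above the cutoff, which can itself be informative: it may mark an entry that was dropped, renamed, or merged in the later edition.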
Topics
- ATLAS Pipeline
- Swedish Encyclopedias
- Text Structure Restoration
- Cross-edition Matching
- Wikidata Linking
Best for: AI Scientist, Research Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.