ATLAS: Article Tracking, Linking, and Analysis of Swedish Encyclopedias

Source: Computation and Language · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, quick

Summary

ATLAS (Article Tracking, Linking, and Analysis of Swedish Encyclopedias) is a new pipeline designed to restore and exploit the underlying structure of digitized historical encyclopedias. This system extracts headwords, identifies entries, categorizes entities, matches entries across different editions, and links them to Wikidata items. The pipeline was applied to the four major editions of "Nordisk familjebok," a prominent Swedish encyclopedia published from 1876 to 1951. Evaluation showed a 97.8% F1 score for headword extraction and a 93.4% F1 score for headword classification. Cross-edition matching achieved 93% precision, while Wikidata linking reached 85% precision and 16.5% recall on a small-scale evaluation. The project demonstrates the feasibility of automated processing for digitized historical knowledge, with datasets and programs made publicly available.
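The Wikidata linking result is easier to interpret once precision and recall are combined into one figure. The sketch below applies the standard F1 formula (the harmonic mean of precision and recall) to the linking numbers quoted above. The resulting value is derived here for illustration only and is not a figure reported in this summary; the function name `f1_score` is a local helper, not part of the ATLAS code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Wikidata linking figures quoted in the summary: 85% precision, 16.5% recall.
linking_f1 = f1_score(0.85, 0.165)
print(f"{linking_f1:.3f}")  # prints 0.276
```

Because the harmonic mean is pulled toward the smaller of the two values, the low recall dominates: the links that are produced are mostly correct, but the combined score stays low.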

Key takeaway

For digital humanities researchers and archivists working with historical texts, ATLAS shows that raw OCR output can be turned into structured, linkable knowledge with measurable accuracy: 97.8% F1 for headword extraction and 93% precision for cross-edition matching. Consider a similar pipeline when you need to trace how entries change across editions of a digitized encyclopedia or connect them to an external knowledge base such as Wikidata. Keep the 16.5% linking recall in mind: the Wikidata links are best treated as a high-precision subset, not as complete coverage.

Key insights

Automated pipelines can effectively restore and link structured knowledge from digitized historical encyclopedias.

Principles

Method

The ATLAS pipeline involves headword extraction, entry identification, entity categorization, cross-edition matching, and Wikidata linking to restore and structure digitized encyclopedia content.
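To make the cross-edition matching stage concrete, here is a minimal sketch that pairs entries from two editions by normalized headword. It is an illustrative baseline under stated assumptions, not the matching method used by ATLAS, which this summary does not describe in detail. The function names, the entry identifiers, and the toy headwords are all hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_headword(headword: str) -> str:
    """Case-fold, strip punctuation, and collapse whitespace (keeps å, ä, ö, é)."""
    text = unicodedata.normalize("NFC", headword).casefold()
    text = re.sub(r"[^\w\s]", "", text)  # \w is Unicode-aware for str patterns
    return " ".join(text.split())


def match_editions(edition_a: dict, edition_b: dict) -> dict:
    """Map entry ids in edition_a to entry ids in edition_b.

    Assumption: two entries match when their normalized headwords are
    identical and the headword occurs exactly once in edition_b.
    Ambiguous headwords are skipped to favor precision over recall.
    """
    index = defaultdict(list)
    for entry_id, headword in edition_b.items():
        index[normalize_headword(headword)].append(entry_id)

    matches = {}
    for entry_id, headword in edition_a.items():
        candidates = index.get(normalize_headword(headword), [])
        if len(candidates) == 1:
            matches[entry_id] = candidates[0]
    return matches


# Toy data: invented ids and headwords, not taken from Nordisk familjebok.
first = {"e1_001": "Stockholm.", "e1_002": "Linné, Carl von", "e1_003": "Lund"}
second = {
    "e2_101": "STOCKHOLM",
    "e2_102": "Linné, Carl von.",
    "e2_103": "Lund",
    "e2_104": "Lund",  # homonym: makes "Lund" ambiguous
}

matches = match_editions(first, second)
print(matches)  # {'e1_001': 'e2_101', 'e1_002': 'e2_102'}
```

Skipping ambiguous headwords is a deliberate choice in this sketch: homonyms such as a city and a surname need the entry text or the entity category to be told apart, which is where the categorization stage of a pipeline like this would come in.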

Best for: AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.