Using Public Taxonomies

· Source: The Ontologist · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, long

Summary

The article examines how public taxonomies and Large Language Models (LLMs) can support Named Entity Recognition (NER) and data classification, where reliably distinguishing entity types has long been difficult. It notes a growing need for identifiers shared across organizations, which is driving investment in public IRI knowledge bases. The author shows that an LLM, specifically Claude, can retrieve public IRIs for an entity (e.g., Barack Obama in Wikidata, DBpedia, and VIAF) and convert them into structured RDF descriptions using the `schema.org` vocabulary. Because the IRIs point into different knowledge bases, the resulting description can be used to traverse linked data across them, which the author presents as a better approach to entity resolution than earlier techniques. The article also applies the method beyond people: to chemical entities such as benzyl propionate, and to geographic entities using the GeoNames taxonomy, as examples of integrating public taxonomies into knowledge graphs.

Key takeaway

For AI Architects and Data Engineers building knowledge graphs, using an LLM to look up public IRIs during ingestion can simplify master data management and improve entity resolution. For frequently referenced public concepts in particular, it reduces the need to stand up internal knowledge graphs for common reference data. Query the LLM for existing public IRIs first, then link them to your own entities with `owl:sameAs` or `schema:isBasedOn` to improve data consistency and interoperability.
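As a rough illustration of the linking step, the sketch below emits `owl:sameAs` statements in N-Triples form using only the Python standard library. The `https://example.org/id/...` internal IRI and the `same_as_triples` helper are assumptions made for this example; they do not come from the original article.

```python
# Minimal sketch: link an internal entity IRI to public IRIs with owl:sameAs.
# The internal IRI namespace (example.org) is hypothetical.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(internal_iri, public_iris):
    """Return N-Triples lines asserting internal_iri owl:sameAs each public IRI."""
    lines = []
    # De-duplicate and sort so the output is stable across runs.
    for iri in sorted(set(public_iris)):
        lines.append(f"<{internal_iri}> <{OWL_SAME_AS}> <{iri}> .")
    return lines


triples = same_as_triples(
    "https://example.org/id/person/barack-obama",
    [
        "http://www.wikidata.org/entity/Q76",
        "http://dbpedia.org/resource/Barack_Obama",
    ],
)
print("\n".join(triples))
```

The same helper could emit `schema:isBasedOn` instead by swapping the predicate IRI, if a weaker link than identity is preferred.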

Key insights

LLMs can effectively integrate public taxonomies and IRIs into knowledge graphs for enhanced Named Entity Recognition and data classification.

Method

Identify reference standards (e.g., Wikidata IRI), query an LLM to resolve the entity and convert the result into a known ontology (e.g., `schema.org`), then store the IRIs in a knowledge graph using `schema:isBasedOn` or `owl:sameAs`.
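The steps above can be sketched roughly as follows. This is not the article's code: `resolve_entity_stub` is a hypothetical stand-in for the LLM call and returns a canned answer, the `https://example.org/id/...` IRI is invented, and the JSON-LD shape is one plausible way to express a `schema.org` description with `owl:sameAs` links.

```python
import json


def resolve_entity_stub(name):
    """Hypothetical stand-in for an LLM query; returns a canned result or None."""
    canned = {
        "Barack Obama": {
            "type": "Person",
            "iris": [
                "http://www.wikidata.org/entity/Q76",
                "http://dbpedia.org/resource/Barack_Obama",
            ],
        }
    }
    return canned.get(name)


def to_schema_org(name, internal_iri, resolver=resolve_entity_stub):
    """Build a JSON-LD description using schema.org terms plus owl:sameAs links."""
    resolved = resolver(name)
    if resolved is None:
        return None  # nothing resolved, so nothing is stored
    return {
        "@context": {
            "@vocab": "https://schema.org/",
            "owl": "http://www.w3.org/2002/07/owl#",
        },
        "@id": internal_iri,
        "@type": resolved["type"],
        "name": name,
        # Each public IRI becomes a node reference rather than a plain string.
        "owl:sameAs": [{"@id": iri} for iri in resolved["iris"]],
    }


doc = to_schema_org("Barack Obama", "https://example.org/id/person/barack-obama")
print(json.dumps(doc, indent=2))
```

In a real pipeline the resolver would call an LLM, and the returned IRIs would likely need checking before the document is loaded into the knowledge graph.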

Best for: AI Architect, Data Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by The Ontologist.