Language is the Bridge
Summary
The article emphasizes that language is the fundamental "substrate and bridge" for both Large Language Models (LLMs) and ontologies, asserting that stripping away linguistic scaffolding renders these systems unintelligible and unactionable. It references the 2020 "octopus paper" by Emily Bender and Alexander Koller, which illustrates how systems trained only on linguistic form produce "fluent nonsense" without real-world grounding. The author extends this concept to ontologies, arguing that an internally valid OWL ontology is "externally inert" if its terms are not bound to controlled vocabularies or taxonomies reflecting actual organizational usage. The piece highlights that labels are crucial for human and machine interoperability, citing Wikidata's enforcement of unique label-description combinations and the OOPS! pitfall scanner's identification of "missing annotations" as a common issue. Ultimately, the "Ontology Pipeline™" is presented as an engineering framework for building shared linguistic agreement across controlled vocabularies, taxonomies, thesauri, ontologies, and knowledge graphs to bridge the "knowing-doing gap" in organizations.
Key takeaway
For executives overseeing data strategy and knowledge management, recognize that the "knowing-doing gap" in your organization is fundamentally a language gap. Prioritize investment in the "Ontology Pipeline™" by establishing controlled vocabularies, taxonomies, and thesauri before formalizing ontologies or knowledge graphs. This foundational work ensures shared understanding, enabling coordinated action and preventing costly decision paralysis stemming from ambiguous terminology.
Key insights
Shared language and explicit linguistic scaffolding are essential for making LLMs and ontologies actionable and trustworthy.
Principles
- Form is not meaning; systems need grounding in real-world context.
- Labels are the human interoperability layer for semantic systems.
- Vocabulary control is a precondition for semantic operations.
Method
The Ontology Pipeline™ systematically builds shared linguistic agreement through layered structures: controlled vocabularies, taxonomies, thesauri, ontologies, and knowledge graphs, ensuring terms are bound to human practice.
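As a rough illustration of the layering described above, the sketch below models a controlled vocabulary whose entries also carry a taxonomy link (`broader`) and thesaurus links (`related`), then checks whether every class in an ontology is bound to a vocabulary entry. The `Concept` class, the `unbound_terms` helper, and the sample terms are hypothetical; they are not taken from the original article or from any Ontology Pipeline™ tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Concept:
    """One controlled-vocabulary entry (hypothetical structure)."""
    pref_label: str                                       # controlled vocabulary layer
    alt_labels: Set[str] = field(default_factory=set)     # synonyms people actually use
    broader: Optional[str] = None                         # taxonomy layer (parent term)
    related: Set[str] = field(default_factory=set)        # thesaurus layer (associations)


def unbound_terms(ontology_classes: Iterable[str],
                  vocabulary: Dict[str, Concept]) -> List[str]:
    """Return ontology class names that have no controlled-vocabulary entry."""
    return sorted(name for name in ontology_classes if name not in vocabulary)


# Illustrative data only.
vocabulary = {
    "Document": Concept("Document"),
    "Invoice": Concept("Invoice", {"Bill"}, broader="Document", related={"Customer"}),
    "Customer": Concept("Customer", {"Client", "Account holder"}),
}
ontology_classes = {"Customer", "Invoice", "Shipment"}

missing = unbound_terms(ontology_classes, vocabulary)
print(missing)
```

In this sketch, `Shipment` is reported as unbound: the ontology may be internally valid, but that class has no agreed term behind it yet.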
In practice
- Integrate retrieval augmentation and tool use for LLM grounding.
- Enforce unique label-description combinations in knowledge graphs.
- Prioritize "alt labels" (alternative labels such as synonyms and abbreviations) to resolve vocabulary mismatches.
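The last two practices can be sketched in a few lines. The example below flags entries that share the same label and description, and builds a lookup that resolves a user's term through preferred or alternative labels. The entry layout, function names, and sample data are assumptions made for illustration, and the case-insensitive comparison is a simplification; this is not Wikidata's actual validation logic.

```python
from collections import defaultdict

# Illustrative entries only; field names are hypothetical.
entries = [
    {"id": "E1", "label": "Mercury", "description": "planet",
     "alt_labels": ["Planet Mercury"]},
    {"id": "E2", "label": "Mercury", "description": "chemical element",
     "alt_labels": ["Hg", "quicksilver"]},
    {"id": "E3", "label": "Mercury", "description": "planet",
     "alt_labels": []},
]


def duplicate_label_descriptions(entries):
    """Group entry ids by (label, description) and keep groups with more than one id."""
    seen = defaultdict(list)
    for entry in entries:
        key = (entry["label"].casefold(), entry["description"].casefold())
        seen[key].append(entry["id"])
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


def build_label_index(entries):
    """Map every preferred and alternative label to the ids that use it."""
    index = defaultdict(set)
    for entry in entries:
        for name in [entry["label"], *entry["alt_labels"]]:
            index[name.casefold()].add(entry["id"])
    return index


def resolve(term, index):
    """Return the sorted ids whose label or alt label matches the term."""
    return sorted(index.get(term.casefold(), set()))


duplicates = duplicate_label_descriptions(entries)
index = build_label_index(entries)
print(duplicates)
print(resolve("quicksilver", index))
print(resolve("Mercury", index))
```

Here `E1` and `E3` collide on label plus description, so a reader could not tell them apart. The alternative label "quicksilver" resolves to a single entry, while the bare label "Mercury" stays ambiguous until a description disambiguates it.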
Topics
- Language Models
- Ontologies
- Knowledge Graphs
- Semantic Interoperability
- Ontology Pipeline
Best for: Executive, AI Architect, Director of AI/ML, Consultant
Editorial summary, takeaway, and curation by AIssential. Original article published by Intentional Arrangement.