Why I Stopped Letting LLMs Build My Knowledge Graphs (And What I Did Instead)
Summary
The author describes moving from LLM-driven knowledge graph construction to a Fixed Entity Architecture (FEA) for an enterprise code migration platform, after running into noisy entities, hallucinations, high LLM costs, and poor graph quality. Inspired by Dr. Irina Adamchic's work from late 2024 and early 2025, FEA builds the graph in three layers: a stable, human-curated ontology of domain concepts (Layer 1), document content such as code chunks connected to those concepts via cosine similarity (Layer 2), and entities extracted with NLP (Layer 3). For code, the author adapted FEA with Hypothetical Document Embeddings (HyDE) to bridge the semantic gap between natural language concepts and code embeddings, raising mean cosine similarity from ~0.09 to ~0.30+. Compared with LLM-centric approaches like Microsoft's GraphRAG, the author reports significantly lower costs, no entity duplication, and better domain accuracy and graph quality.
Key takeaway
For AI Engineers building Graph RAG systems in well-defined domains, consider adopting a Fixed Entity Architecture (FEA) to improve graph quality and reduce LLM costs. Define your domain ontology manually and use cosine similarity for connections, especially with HyDE for code or structured data. This approach yields cleaner, more accurate graphs and a more maintainable system, outperforming LLM-centric graph construction for specific use cases.
Key insights
Fixed Entity Architecture (FEA) with HyDE improves Graph RAG quality and cost by replacing LLM-based graph construction with expert-defined ontologies and similarity matching.
Principles
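- Define the ontology yourself when the domain is well understood.
- Prefer math-based connections (cosine similarity) to LLM-generated ones; they are deterministic and repeatable.
- Curate the ontology aggressively.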
Method
FEA uses three layers: a fixed entity ontology, a document layer connected via cosine similarity, and NLP-extracted entities. HyDE generates hypothetical code snippets for concepts to bridge semantic gaps.
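The linking step described above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: `embed` is a toy bag-of-tokens stand-in for a real embedding model, the `CONCEPTS` and `CHUNKS` data, the `link` helper, and the 0.3 threshold are all hypothetical, and the `hyde` snippets are hand-written here where the article's approach would generate them.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: counts of lowercase identifier tokens."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Layer 1: fixed, human-curated concepts (hypothetical examples).
# "description" is the natural-language form; "hyde" is a hypothetical code
# snippet for the same concept (hand-written here for illustration).
CONCEPTS = {
    "Database Access": {
        "description": "Reading and writing records in a relational database",
        "hyde": "conn = db.connect()\ncursor = conn.cursor()\n"
                "cursor.execute('SELECT * FROM table')",
    },
    "HTTP Routing": {
        "description": "Mapping incoming web requests to handler functions",
        "hyde": "@app.route('/items')\ndef get_items(request):\n"
                "    return response(items)",
    },
}

# Layer 2: code chunks from the codebase being migrated (hypothetical examples).
CHUNKS = {
    "chunk-1": "cursor = conn.cursor()\n"
               "cursor.execute('SELECT id FROM users')\n"
               "rows = cursor.fetchall()",
    "chunk-2": "@app.route('/users')\ndef list_users(request):\n"
               "    return response(users)",
}


def link(chunks, concepts, field, threshold):
    """Create a chunk-to-concept edge wherever cosine similarity meets the threshold."""
    edges = []
    for chunk_id, code in chunks.items():
        for name, concept in concepts.items():
            score = cosine(embed(code), embed(concept[field]))
            if score >= threshold:
                edges.append((chunk_id, name, round(score, 2)))
    return edges


# Matching code against plain descriptions vs. against HyDE snippets.
plain_edges = link(CHUNKS, CONCEPTS, "description", threshold=0.3)
hyde_edges = link(CHUNKS, CONCEPTS, "hyde", threshold=0.3)
```

With this toy embedding the descriptions share no tokens with the code, so `plain_edges` is empty while `hyde_edges` links each chunk to its concept. A real embedding model would show a smaller gap, closer to the ~0.09 versus ~0.30+ shift reported in the summary.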
In practice
- Use HyDE for non-textual content.
- Combine vector, full-text, and concept-guided search.
- Exclude super-nodes from ontology.
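Two of the points above can be illustrated with a short sketch. The summary does not say how the three search modes are combined or how super-nodes are detected, so both choices here are assumptions: results are merged with reciprocal rank fusion, and a concept counts as a super-node when it links to more than a chosen share of all chunks. The function names, the `max_share` cutoff, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def drop_super_nodes(edges, n_chunks, max_share=0.5):
    """Remove concepts that link to more than max_share of all chunks.

    edges is a list of (chunk_id, concept) pairs. Returns the kept edges
    and the set of concepts judged too broad to be useful.
    """
    degree = Counter(concept for _, concept in set(edges))
    too_broad = {c for c, d in degree.items() if d / n_chunks > max_share}
    kept = [(chunk, concept) for chunk, concept in edges if concept not in too_broad]
    return kept, too_broad


def fuse_rankings(rankings, k=60):
    """Merge ranked lists of chunk ids with reciprocal rank fusion.

    Each chunk scores 1 / (k + rank) per list it appears in; ties are
    broken by chunk id so the output order is stable.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical graph edges: "Utility" touches 3 of 4 chunks, so it is dropped.
edges = [
    ("c1", "Utility"), ("c2", "Utility"), ("c3", "Utility"),
    ("c1", "Database Access"), ("c4", "HTTP Routing"),
]
kept_edges, super_nodes = drop_super_nodes(edges, n_chunks=4)

# Hypothetical ranked hits from the three search modes for one query.
vector_hits = ["c1", "c2", "c3"]
fulltext_hits = ["c2", "c1"]
concept_hits = ["c1", "c4"]  # chunks linked to concepts that match the query
fused = fuse_rankings([vector_hits, fulltext_hits, concept_hits])
```

Chunks found by several modes rise to the top of `fused`, which is the usual reason to fuse by rank rather than compare raw scores from different retrievers.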
Topics
- Fixed Entity Architecture
- Graph RAG
- Knowledge Graphs
- Hypothetical Document Embeddings
- Code Migration
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Naturallanguageprocessing on Medium.