Why I Stopped Letting LLMs Build My Knowledge Graphs (And What I Did Instead)

2026-02-09 · Source: Naturallanguageprocessing on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, medium

Summary

The author details a shift from LLM-driven knowledge graph construction to a Fixed Entity Architecture (FEA) for an enterprise code migration platform, after experiencing issues like noisy entities, hallucinations, high LLM costs, and poor graph quality. Inspired by Dr. Irina Adamchic's work from late 2024 and early 2025, the FEA approach defines a stable, human-curated ontology (Layer 1) of domain concepts, connects document content (Layer 2, e.g., code chunks) via cosine similarity, and extracts NLP-based entities (Layer 3). For code, the author adapted FEA using Hypothetical Document Embeddings (HyDE) to bridge the semantic gap between natural language concepts and code embeddings, improving mean cosine similarity from ~0.09 to ~0.30+. This method significantly reduced costs, eliminated entity duplication, and improved domain accuracy and graph quality compared to LLM-centric approaches like Microsoft's GraphRAG.

Key takeaway

For AI engineers building Graph RAG systems in well-defined domains, consider adopting a Fixed Entity Architecture (FEA) to improve graph quality and reduce LLM costs. Define the domain ontology manually and connect content to it by cosine similarity, using HyDE when the content is code or structured data. In the author's experience, this yielded cleaner, more accurate graphs and a more maintainable system than LLM-centric graph construction for this kind of use case.

Key insights

Fixed Entity Architecture (FEA) with HyDE improves Graph RAG quality and cost by replacing LLM-based graph construction with expert-defined ontologies and similarity matching.

Principles

Define the ontology by hand when the domain is well understood.
Math-based connections (cosine similarity) are more reliable than LLM-inferred ones.
Curate the ontology aggressively.
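
One possible curation check, sketched under assumptions: flag ontology entities that attach to most chunks, since such super-nodes add little discriminating signal. The function name find_super_nodes, the 0.5 cutoff, and the sample edges are hypothetical and do not come from the original article.

```python
from collections import Counter

def find_super_nodes(edges, total_chunks, max_fraction=0.5):
    """Return entities linked to more than max_fraction of all chunks."""
    # Deduplicate (chunk, entity) pairs so repeated edges are counted once.
    degree = Counter(entity for _chunk, entity in set(edges))
    return sorted(e for e, n in degree.items() if n / total_chunks > max_fraction)

# Hypothetical (chunk_id, entity) edges from a Layer 1 / Layer 2 linking pass.
edges = [
    ("c1", "Function"), ("c2", "Function"), ("c3", "Function"), ("c4", "Function"),
    ("c1", "Connection Pooling"), ("c2", "Retry Policy"),
]
super_nodes = find_super_nodes(edges, total_chunks=4)
```

Entities returned here are candidates for removal or for splitting into narrower concepts; the right cutoff depends on the corpus.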

Method

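FEA uses three layers: a fixed, human-curated entity ontology (Layer 1), a document layer of content chunks connected to the ontology via cosine similarity (Layer 2), and a layer of NLP-extracted entities (Layer 3). For code, HyDE generates a hypothetical code snippet for each concept, so that code is compared with code rather than with a natural-language concept name, which bridges the semantic gap between the two.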
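
The linking step described above can be sketched as follows. This is a minimal illustration, not the author's implementation: embed is a toy token-count stand-in for a real embedding model, the hypothetical snippet is hard-coded where a real system would generate it with an LLM, and the names link_chunks, the 0.3 threshold, and the sample entity and chunk are all assumptions.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def link_chunks(entities: dict[str, str], chunks: dict[str, str], threshold: float):
    """Return (chunk_id, entity_name, score) edges at or above the threshold."""
    edges = []
    for chunk_id, chunk_text in chunks.items():
        chunk_vec = embed(chunk_text)
        for name, representation in entities.items():
            score = cosine(embed(representation), chunk_vec)
            if score >= threshold:
                edges.append((chunk_id, name, round(score, 3)))
    return edges

# Layer 2: a hypothetical code chunk.
chunks = {"chunk_1": "pool = ConnectionPool(max_size=10)\nconn = pool.acquire()"}

# Layer 1 represented by the concept name alone (no HyDE).
by_name = {"Connection Pooling": "Connection Pooling"}
# Layer 1 represented by a hypothetical code snippet (HyDE-style).
by_hyde = {"Connection Pooling": "pool = ConnectionPool(max_size=5)\npool.acquire()"}

name_edges = link_chunks(by_name, chunks, threshold=0.3)
hyde_edges = link_chunks(by_hyde, chunks, threshold=0.3)
```

In this toy setup the concept name shares no tokens with the code chunk, so no edge is created, while the hypothetical snippet clears the threshold. With a real embedding model the gap is less stark, but the idea is the same: compare code with code.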

In practice

Use HyDE for non-textual content.
Combine vector, full-text, and concept-guided search.
Exclude super-nodes from ontology.
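
One common way to combine several ranked result lists is reciprocal rank fusion (RRF). The summary does not say how the author merges vector, full-text, and concept-guided results, so RRF is an assumption here, as are the function name, the constant k=60, and the sample chunk IDs.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each appearance adds 1 / (k + rank) to an ID's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from the three retrieval paths.
vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
fulltext_hits = ["chunk_b", "chunk_d"]
concept_hits = ["chunk_b", "chunk_a"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits, concept_hits])
```

RRF only needs ranks, not comparable scores, which makes it a convenient default when the retrieval paths score results on different scales.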

Topics

Fixed Entity Architecture
Graph RAG
Knowledge Graphs
Hypothetical Document Embeddings
Code Migration

Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Naturallanguageprocessing on Medium.