When LLMs Answer the Wrong Question
Summary
This article introduces "Drift," a new metric for evaluating the groundedness of AI-generated answers, specifically addressing the gap between topical relevance and relational accuracy. While traditional cosine similarity effectively measures semantic direction and shared vocabulary, it often fails to detect when an answer keeps the keywords but alters the underlying structural relationship (e.g., causal, logical, conditional) implied by a query and its context. Drift, based on geometric algebra, quantifies this relational misalignment by comparing the bivectors formed by the query-context and query-answer pairs. A runnable Python demo using the `all-MiniLM-L6-v2` SentenceTransformer model illustrates how Drift identifies answers that are topically similar but structurally unfaithful, such as replacing a causal explanation with generic business platitudes, even when cosine similarity remains high. This Type IV metric complements earlier groundedness checks by targeting answers that are semantically close to the context but not relationally faithful to it.
Key takeaway
For NLP Engineers building grounded QA systems, relying solely on cosine similarity for answer evaluation is insufficient. You should integrate the "Drift" metric, based on geometric algebra, to ensure answers not only stay on topic but also preserve the specific structural relationships (e.g., causal, logical) implied by the query and context. This will help you catch subtle relational failures that traditional similarity metrics miss, leading to more accurate and trustworthy AI responses.
Key insights
Drift uses geometric algebra to measure the relational faithfulness of AI answers, a property distinct from mere topical relevance.
Principles
- Topical relevance differs from relational correctness.
- Meaning involves both direction and structure.
- Bivectors represent oriented planes of relationships.
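The bivector principle above can be made concrete with a short worked identity. This is a sketch under assumed notation: q, c, and a stand for the query, context, and answer embeddings, and the inner product on bivectors is taken to be the Gram-determinant form. The original article may use a different sign or normalization convention.

```latex
% Assumed notation: q, c, a \in \mathbb{R}^n are the query, context, and answer embeddings.
% Components of the wedge product of two vectors:
\[
(u \wedge v)_{ij} = u_i v_j - u_j v_i
\]
% Inner product of two bivectors (Gram-determinant form, an assumed convention):
\[
\langle u \wedge v,\; x \wedge y \rangle = (u \cdot x)(v \cdot y) - (u \cdot y)(v \cdot x)
\]
% Cosine between the query-context plane and the query-answer plane:
\[
\operatorname{plane\_cosine}
= \frac{(q \cdot q)(c \cdot a) - (q \cdot a)(c \cdot q)}
       {\lVert q \wedge c \rVert \, \lVert q \wedge a \rVert},
\qquad
\lVert u \wedge v \rVert^2 = \lVert u \rVert^2 \lVert v \rVert^2 - (u \cdot v)^2
\]
```

Under this convention, an answer that lies in the same oriented plane as the query and context gives a plane cosine of 1, while an answer that leaves that plane pushes the value toward 0 even if it shares a large component with the query.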
Method
Embed query, context, and answer as vectors. Compute bivectors for (query, context) and (query, answer). Calculate `plane_cosine` between these bivectors, then `drift = 1 - plane_cosine`.
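The method above can be sketched in a few lines of NumPy. The names `plane_cosine` and `drift` come from the summary; the helper `bivector_inner`, the `eps` guard, and the choice to return 0.0 for degenerate planes are assumptions made for illustration, not necessarily what the original demo does.

```python
import numpy as np


def bivector_inner(a, b, c, d):
    """Inner product of the bivectors a^b and c^d (Gram-determinant form).

    Assumed convention: <a^b, c^d> = (a.c)(b.d) - (a.d)(b.c).
    """
    return np.dot(a, c) * np.dot(b, d) - np.dot(a, d) * np.dot(b, c)


def plane_cosine(query, context, answer, eps=1e-12):
    """Cosine between the (query, context) plane and the (query, answer) plane."""
    num = bivector_inner(query, context, query, answer)
    # Squared bivector norms; clamp tiny negatives caused by rounding.
    norm_qc = np.sqrt(max(bivector_inner(query, context, query, context), 0.0))
    norm_qa = np.sqrt(max(bivector_inner(query, answer, query, answer), 0.0))
    if norm_qc < eps or norm_qa < eps:
        # Degenerate case (vectors nearly parallel, so no plane is defined).
        # Returning 0.0 here is an assumption of this sketch.
        return 0.0
    return float(num / (norm_qc * norm_qa))


def drift(query, context, answer):
    """drift = 1 - plane_cosine, as described in the method."""
    return 1.0 - plane_cosine(query, context, answer)
```

With toy vectors `q = [1, 0, 0]` and `c = [1, 1, 0]`, an answer `[1, 2, 0]` stays in the query-context plane and yields a drift of 0, while `[1, 0, 1]` leaves that plane and yields a drift of 1, even though its ordinary cosine with the context is 0.5.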
In practice
- Use `plane_cosine` to detect relational misalignment.
- Apply `drift` to evaluate grounded QA systems.
- Use the `all-MiniLM-L6-v2` SentenceTransformer model for embeddings.
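One way to apply these points is to report cosine similarity and drift side by side and flag answers that look on-topic but have left the query-context plane. The sketch below is hypothetical: the function `evaluate`, the `embed` callable, the use of context-answer cosine, and the thresholds `cos_min` and `drift_max` are all assumptions for illustration and are not taken from the original article.

```python
import numpy as np


def cosine(u, v):
    """Ordinary cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def plane_cosine(q, c, a):
    """Cosine between the q^c and q^a planes (Gram-determinant form, assumed)."""
    num = np.dot(q, q) * np.dot(c, a) - np.dot(q, a) * np.dot(c, q)
    n_qc = np.dot(q, q) * np.dot(c, c) - np.dot(q, c) ** 2
    n_qa = np.dot(q, q) * np.dot(a, a) - np.dot(q, a) ** 2
    denom = np.sqrt(max(n_qc, 0.0) * max(n_qa, 0.0))
    # Degenerate planes return 0.0; this is a choice made for the sketch.
    return float(num / denom) if denom > 1e-12 else 0.0


def evaluate(embed, query, context, answer, cos_min=0.5, drift_max=0.5):
    """Score one answer.

    `embed` is any callable mapping a list of strings to an array of vectors.
    The thresholds are placeholder values and would need tuning on real data.
    """
    q, c, a = (np.asarray(v, dtype=float) for v in embed([query, context, answer]))
    report = {
        "cosine": cosine(c, a),                   # topical closeness to the context
        "drift": 1.0 - plane_cosine(q, c, a),     # relational misalignment
    }
    # Flag the pattern of interest: topically close, relationally off.
    report["flag"] = report["cosine"] >= cos_min and report["drift"] > drift_max
    return report
```

To use real embeddings, `embed` could be the `encode` method of `SentenceTransformer("all-MiniLM-L6-v2")` from the `sentence-transformers` package, which accepts a list of strings and returns an array with one row per input.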
Topics
- Geometric Algebra
- Grounded QA
- Semantic Drift
- Cosine Similarity
- Bivectors
Best for: NLP Engineer, Research Scientist, AI Engineer, Machine Learning Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Agus’s Substack.