When LLMs Answer the Wrong Question

2026-01-11 · Source: Agus’s Substack · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, short

Summary

This article introduces "Drift," a new metric for evaluating the groundedness of AI-generated answers. It targets the gap between topical relevance and relational accuracy. Traditional cosine similarity measures semantic direction and shared vocabulary well, but it often fails to detect an answer that keeps the keywords while altering the underlying structural relationship (e.g., causal, logical, conditional) implied by a query and its context. Drift, based on geometric algebra, quantifies this relational misalignment by comparing the bivectors formed by the query-context and query-answer pairs. A runnable Python demo using the `all-MiniLM-L6-v2` sentence-transformers model shows Drift flagging answers that are topically similar but structurally unfaithful, such as replacing a causal explanation with generic business platitudes, even while cosine similarity stays high. As a Type IV metric, Drift complements earlier groundedness checks by targeting answers that are semantically close but not relationally faithful.

Key takeaway

For NLP engineers building grounded QA systems, cosine similarity alone is not enough to evaluate answers. Add the geometric-algebra "Drift" metric alongside it so that answers not only stay on topic but also preserve the specific structural relationship (e.g., causal, logical) implied by the query and context. This catches subtle relational failures that similarity scores miss and makes the system's answers more trustworthy.

Key insights

Drift uses geometric algebra to measure the relational faithfulness of AI answers, as distinct from mere topical relevance.

Principles

Topical relevance differs from relational correctness.
Meaning involves both direction and structure.
Bivectors represent oriented planes of relationships.
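The "oriented planes" idea rests on a few standard geometric-algebra identities, summarized here as textbook background rather than formulas quoted from the article. The last two lines show that, because both bivectors share the query q, the plane cosine reduces to the ordinary cosine between the parts of the context and the answer that are orthogonal to the query (c_⊥ and a_⊥).

```latex
\begin{aligned}
a \wedge b &= -\,b \wedge a, \qquad a \wedge a = 0 \\
\lVert a \wedge b \rVert^2 &= \lVert a \rVert^2 \lVert b \rVert^2 - (a \cdot b)^2
  = \lVert a \rVert^2 \lVert b \rVert^2 \sin^2 \theta_{ab} \\
\langle a \wedge b,\ c \wedge d \rangle &= (a \cdot c)(b \cdot d) - (a \cdot d)(b \cdot c) \\
\text{plane\_cosine}(q, c, a)
  &= \frac{\langle q \wedge c,\ q \wedge a \rangle}{\lVert q \wedge c \rVert \, \lVert q \wedge a \rVert}
   = \frac{\lVert q \rVert^2 (c \cdot a) - (q \cdot c)(q \cdot a)}
          {\lVert q \wedge c \rVert \, \lVert q \wedge a \rVert}
   = \cos(c_\perp, a_\perp) \\
c_\perp &= c - \frac{q \cdot c}{\lVert q \rVert^2}\, q, \qquad
a_\perp = a - \frac{q \cdot a}{\lVert q \rVert^2}\, q, \qquad
\text{drift} = 1 - \text{plane\_cosine}
\end{aligned}
```

The magnitude of a ∧ b is the area of the parallelogram spanned by a and b, and swapping the factors flips the orientation. Under this signed definition the plane cosine ranges over [-1, 1] and drift over [0, 2]; whether the article folds orientation away (for example with an absolute value) is not stated in this summary.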

Method

Embed the query, context, and answer as vectors. Form the bivectors query ∧ context and query ∧ answer, compute `plane_cosine` (the normalized inner product of the two bivectors), and report `drift = 1 - plane_cosine`.
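A minimal sketch of the method in plain Python, assuming the standard bivector inner product (a 2×2 Gram determinant) and a signed plane cosine; the article's own implementation may normalize differently. Function names other than `plane_cosine` and `drift` are hypothetical, and the vectors are hand-made 3-D toys rather than real embeddings.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    """Ordinary cosine similarity: compares directions only."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def bivector_inner(a: Vector, b: Vector, c: Vector, d: Vector) -> float:
    """<a^b, c^d> as a 2x2 Gram determinant, without building the bivectors."""
    return dot(a, c) * dot(b, d) - dot(a, d) * dot(b, c)


def plane_cosine(q: Vector, c: Vector, a: Vector, eps: float = 1e-12) -> float:
    """Cosine between the oriented planes q^c and q^a, clamped to [-1, 1]."""
    norm_qc = math.sqrt(max(bivector_inner(q, c, q, c), 0.0))
    norm_qa = math.sqrt(max(bivector_inner(q, a, q, a), 0.0))
    if norm_qc < eps or norm_qa < eps:
        # A vector parallel to q spans no plane with it, so the angle is undefined.
        raise ValueError("degenerate bivector: vector is (nearly) parallel to q")
    value = bivector_inner(q, c, q, a) / (norm_qc * norm_qa)
    return max(-1.0, min(1.0, value))  # guard against float round-off


def drift(q: Vector, c: Vector, a: Vector) -> float:
    """Drift = 1 - plane_cosine: 0 = same oriented plane, 1 = orthogonal, 2 = reversed."""
    return 1.0 - plane_cosine(q, c, a)


# Hypothetical 3-D toy vectors (not real embeddings): the context tilts the
# query toward the y-axis, while the "drifted" answer tilts it toward z instead.
q = [1.0, 0.0, 0.0]
context = [1.0, 1.0, 0.0]
faithful = [1.0, 0.9, 0.0]
drifted = [1.0, 0.0, 1.0]
reversed_answer = [1.0, -1.0, 0.0]

for name, ans in [("faithful", faithful), ("drifted", drifted), ("reversed", reversed_answer)]:
    print(f"{name:9s} cos(q,a)={cosine(q, ans):.3f}  drift={drift(q, context, ans):.3f}")
```

Running it prints cos(q,a) ≈ 0.743, 0.707, 0.707 and drift ≈ 0, 1, 2 for the faithful, drifted, and reversed answers. The drifted answer is exactly as close to the query by cosine as the context is, yet it leaves the query-context plane, which is the failure mode the article describes. Working with Gram determinants keeps the cost at a handful of dot products instead of materializing an n×n bivector for 384-dimensional MiniLM embeddings.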

In practice

Use `plane_cosine` to detect relational misalignment.
Apply `drift` to evaluate grounded QA systems.
Integrate `all-MiniLM-L6-v2` for embeddings.
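To wire these pieces into a QA evaluation loop, the sketch below scores an answer with both cosine similarity and drift and flags answers that are on topic yet relationally drifted. The embedder is injected as a callable; `load_minilm_embedder` wraps `all-MiniLM-L6-v2` through `SentenceTransformer(...).encode` from sentence-transformers, imported lazily so the scoring code runs without it. The names `score_answer`, `AnswerScores`, and `load_minilm_embedder`, and the 0.5 thresholds, are hypothetical illustrations rather than values from the article; calibrate thresholds on your own labeled data.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]
# An embedder maps a batch of strings to a batch of equal-length vectors.
Embedder = Callable[[List[str]], List[List[float]]]


def _dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def _plane_cosine(q: Vector, c: Vector, a: Vector, eps: float = 1e-12) -> float:
    # Gram-determinant inner products of the bivectors q^c and q^a.
    qq, qc, qa = _dot(q, q), _dot(q, c), _dot(q, a)
    cross = qq * _dot(c, a) - qa * qc
    norm_qc = math.sqrt(max(qq * _dot(c, c) - qc * qc, 0.0))
    norm_qa = math.sqrt(max(qq * _dot(a, a) - qa * qa, 0.0))
    if norm_qc < eps or norm_qa < eps:
        raise ValueError("degenerate plane: a vector is parallel to the query")
    return max(-1.0, min(1.0, cross / (norm_qc * norm_qa)))


def load_minilm_embedder() -> Embedder:
    """Hypothetical helper wrapping the all-MiniLM-L6-v2 model (lazy import)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return lambda texts: model.encode(texts).tolist()


@dataclass
class AnswerScores:
    cosine_qa: float  # topical closeness of the answer to the query
    drift: float      # 1 - plane_cosine; higher means more relational drift
    flagged: bool     # on topic but relationally unfaithful


def score_answer(query: str, context: str, answer: str, embed: Embedder,
                 cos_floor: float = 0.5, drift_ceiling: float = 0.5) -> AnswerScores:
    """Score one answer; the default thresholds are illustrative assumptions."""
    q, c, a = embed([query, context, answer])
    cos_qa = _dot(q, a) / math.sqrt(_dot(q, q) * _dot(a, a))
    drift = 1.0 - _plane_cosine(q, c, a)
    flagged = cos_qa >= cos_floor and drift > drift_ceiling
    return AnswerScores(cos_qa, drift, flagged)


# Real run (downloads the model on first use; strings are your own data):
# embed = load_minilm_embedder()
# print(score_answer(my_query, my_context, candidate_answer, embed))
```

Injecting the embedder keeps the metric testable with fixed vectors and lets you swap embedding models without touching the scoring logic.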

Topics

Geometric Algebra
Grounded QA
Semantic Drift
Cosine Similarity
Bivectors

Best for: NLP Engineer, Research Scientist, AI Engineer, Machine Learning Engineer, AI Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by Agus’s Substack.