SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing · Depth: Expert, extended

Summary

SKG-Eval is a quasi-deterministic, interpretable evaluation framework for multi-turn dialogue systems. It targets cross-turn failures that existing methods often miss: contradiction, topic drift, and entity inconsistency. The framework models a dialogue as an evolving Semantic Knowledge Graph (SKG), updating entities, relations, and commitments at each turn through structured triple extraction. From this graph it computes three signals (local relevance, historical consistency, and logical coherence), fuses them with a regime-adaptive mechanism, and aggregates the result into a length-invariant session score via recency-weighted trend analysis. On benchmarks such as MT-Bench and MultiChallenge, SKG-Eval achieves higher correlation with human judgments and significantly improves recall of long-range inconsistencies, particularly in extended conversations where other evaluators degrade. It also provides explicit contradiction certificates and deterministic scores, making evaluation reproducible and auditable.
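
To make the idea of an incrementally updated graph with contradiction certificates concrete, here is a minimal sketch. It is not the paper's implementation: the names `Triple` and `DialogueGraph` are hypothetical, triples are assumed to arrive already extracted, and every relation is assumed to be single-valued, so a new object for a known (subject, relation) pair counts as a contradiction.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """One extracted statement, tagged with the turn it came from (hypothetical type)."""
    subject: str
    relation: str
    obj: str
    turn: int


@dataclass
class DialogueGraph:
    """Toy dialogue state: maps (subject, relation) to the triple currently held."""
    facts: dict = field(default_factory=dict)

    def update(self, triples):
        """Merge one turn's triples and return a certificate per conflict.

        A certificate is the pair (earlier triple, conflicting triple), so a
        reviewer can see exactly which turns disagree.
        Assumption: relations are single-valued; the newest statement wins.
        """
        certificates = []
        for t in triples:
            key = (t.subject, t.relation)
            prior = self.facts.get(key)
            if prior is not None and prior.obj != t.obj:
                certificates.append((prior, t))
            self.facts[key] = t
        return certificates


graph = DialogueGraph()
graph.update([Triple("user", "lives_in", "Paris", turn=1)])
graph.update([Triple("user", "has_pet", "cat", turn=3)])
conflicts = graph.update([Triple("user", "lives_in", "Berlin", turn=7)])

for earlier, later in conflicts:
    print(f"turn {earlier.turn}: {earlier.obj!r} vs turn {later.turn}: {later.obj!r}")
```

Because the state persists across turns, the conflict between turn 1 and turn 7 is found no matter how many turns separate them, which a turn-isolated evaluator cannot do by construction.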

Key takeaway

Research scientists developing or evaluating multi-turn dialogue systems should consider adopting SKG-Eval where turn-isolated or LLM-as-a-judge evaluation falls short. The framework is better at detecting long-range inconsistencies such as contradictions and semantic drift, and its results are auditable and deterministic. Integrating it can make model development more robust by surfacing failure modes that implicit, black-box evaluators often miss, especially in extended conversations.

Key insights

SKG-Eval uses evolving knowledge graphs and geometric reasoning for stateful, interpretable multi-turn dialogue evaluation.

Method

SKG-Eval incrementally builds a Semantic Knowledge Graph, extracts local relevance, historical consistency, and logical coherence signals, then fuses and aggregates them into a session score with recency weighting.
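
The fusion and aggregation steps can be sketched as follows. This summary does not give the paper's formulas, so everything here is an assumption: `fuse` uses fixed weights as a stand-in for the regime-adaptive mechanism, and `session_score` uses a normalized exponential decay as a stand-in for the recency-weighted trend analysis. The function names and the `decay` parameter are hypothetical.

```python
def fuse(relevance, consistency, coherence, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three per-turn signals into one turn score.

    Assumption: a fixed convex combination. The paper's regime-adaptive
    fusion presumably varies the weights; that logic is not reproduced here.
    """
    w_rel, w_con, w_coh = weights
    return w_rel * relevance + w_con * consistency + w_coh * coherence


def session_score(turn_scores, decay=0.8):
    """Recency-weighted mean of per-turn scores.

    The most recent turn gets weight 1, the one before it `decay`, then
    `decay ** 2`, and so on. Dividing by the weight sum keeps the result on
    the same scale as the inputs for any dialogue length.
    """
    if not turn_scores:
        raise ValueError("turn_scores must not be empty")
    n = len(turn_scores)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * s for w, s in zip(weights, turn_scores)) / sum(weights)


# Per-turn (relevance, consistency, coherence) signals; illustrative values only.
signals = [(0.9, 1.0, 0.9), (0.8, 1.0, 0.9), (0.9, 0.2, 0.4)]
turn_scores = [fuse(*s) for s in signals]
print(session_score(turn_scores))
```

Normalizing by the weight sum is what makes this sketch insensitive to length: a dialogue that scores 1.0 on every turn gets a session score of 1.0 whether it has three turns or thirty, while a late drop in consistency pulls the score down more than an early one.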

Best for: Research Scientist, AI Scientist, NLP Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.