Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models

Source: cs.AI updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, short

Summary

A study benchmarked three frontier large language models (LLMs), Claude Opus 4.7, Claude Sonnet 4.6, and GPT-5.4, on 100 natural-language questions over the Cleaned Contoso Retail Dataset in ClickHouse. It tested the effect of supplying explicit business semantics, delivered as a 4 KB hand-authored markdown document, alongside the database schema. The semantic layer raised accuracy by 17 to 23 percentage points for every model. With the document, the three models were statistically indistinguishable at 67.7% to 68.7% accuracy; without it, they were again statistically indistinguishable at 45.5% to 50.5%. Explicit business semantics, rather than model choice within a tier, therefore account for the accuracy gap: the document changes the nature of the task the LLM is asked to perform.

Key takeaway

AI architects and NLP engineers building natural-language interfaces for analytical databases should focus first on the semantic layer. In this benchmark, providing explicit business semantics, such as a markdown document detailing measures and conventions, did far more to improve accuracy and reduce hallucination than choosing among the three frontier models. Prioritize comprehensive semantic documentation over model selection.
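
As a rough illustration of what "schema plus semantic document" could look like in practice, the sketch below assembles a prompt with and without a semantic layer. Everything in it is an assumption made for illustration: the `sales` table, the measure definitions, the prompt wording, and the `build_prompt` helper are hypothetical and are not taken from the study or from the Contoso dataset.

```python
from typing import Optional

# Hypothetical schema, invented for illustration (not the study's schema).
SCHEMA = """CREATE TABLE sales (
    order_date Date,
    store_key UInt32,
    quantity UInt32,
    unit_price Decimal(18, 2),
    unit_cost Decimal(18, 2)
) ENGINE = MergeTree ORDER BY order_date;"""

# Hypothetical semantic document: measures and conventions in markdown.
SEMANTICS = """## Measures
- Revenue: sum(quantity * unit_price)
- Gross margin: sum(quantity * (unit_price - unit_cost))

## Conventions
- "Last year" means the most recent complete calendar year in the data.
"""


def build_prompt(question: str, schema: str, semantics: Optional[str] = None) -> str:
    """Assemble a single-shot prompt; the semantic section is optional."""
    parts = ["You write ClickHouse SQL.", "# Schema", schema]
    if semantics:
        # The only difference between the two conditions is this section.
        parts += ["# Business semantics", semantics]
    parts += ["# Question", question]
    return "\n\n".join(parts)


question = "What was gross margin last year?"
baseline_prompt = build_prompt(question, SCHEMA)
semantic_prompt = build_prompt(question, SCHEMA, SEMANTICS)
```

Without the semantic section, the model has to guess how "gross margin" and "last year" map onto columns; with it, those definitions are stated in the prompt.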

Key insights

Explicit business semantics significantly improve LLM accuracy and reduce hallucination in natural-language data querying.

Method

The study benchmarked three LLMs on 100 natural-language questions over a retail dataset, using a paired single-shot protocol that compares each model's performance with and without a 4 KB semantic markdown document.
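
A paired design means the same questions are scored under both conditions, so the comparison rests on the questions whose outcome flips. The sketch below shows one common way to analyze such data, an exact McNemar test. This is an assumption: the summary does not say which statistical test the study used, the `mcnemar_exact` helper is hypothetical, and the ten outcomes are synthetic values invented for the example, not results from the benchmark.

```python
from math import comb


def mcnemar_exact(baseline: list[bool], treated: list[bool]) -> tuple[int, int, float]:
    """Exact two-sided McNemar test on paired correct/incorrect outcomes.

    Returns (gained, lost, p_value), where `gained` counts questions that were
    wrong at baseline and right with the treatment, and `lost` the reverse.
    """
    if len(baseline) != len(treated):
        raise ValueError("paired outcome lists must have the same length")
    gained = sum(1 for b, t in zip(baseline, treated) if not b and t)
    lost = sum(1 for b, t in zip(baseline, treated) if b and not t)
    n = gained + lost  # only discordant pairs carry information
    if n == 0:
        return gained, lost, 1.0
    k = min(gained, lost)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return gained, lost, min(1.0, 2 * tail)


# Synthetic outcomes for ten questions (True = answered correctly).
schema_only = [True] * 4 + [False] * 6
with_semantics = [True] * 3 + [False] + [True] * 5 + [False]

gained, lost, p_value = mcnemar_exact(schema_only, with_semantics)
```

Questions that are right or wrong under both conditions drop out of the test, which is why pairing can detect an effect with fewer questions than comparing two independent accuracy figures.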

Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.