When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, quick

Summary

Large language models (LLMs) frequently make data referencing errors (DREs) on table tasks: they cite table values incorrectly or omit them, even when they understand the table's structure. This research provides the first systematic evaluation of DREs and finds them in every model tested, from 1.7B to 20B parameters. A key finding is that using data referencing as a critic, through critic-based filtering and rejection sampling, improves answer accuracy by up to 12.0%. The authors also developed a lightweight 4B-parameter critic model that reaches an average F1 score of 78.2% in detecting both in-distribution and out-of-distribution DREs and proves effective at assisting larger models during inference.
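To make the F1 figure concrete, here is a minimal sketch of how F1 is typically computed for a binary DRE detector. It assumes the positive class is "this answer contains a DRE" and that the reported average is a simple mean over evaluation sets. Both are assumptions, not details confirmed by the summary. `detection_f1` and `average_f1` are hypothetical helper names.

```python
def detection_f1(predicted: list[bool], gold: list[bool]) -> float:
    """F1 for the positive class 'answer contains a DRE' (assumed labeling)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    tp = sum(p and g for p, g in zip(predicted, gold))        # DRE flagged, DRE present
    fp = sum(p and not g for p, g in zip(predicted, gold))    # flagged, but answer was fine
    fn = sum((not p) and g for p, g in zip(predicted, gold))  # DRE missed by the critic
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(scores: list[float]) -> float:
    """Simple mean over evaluation sets (e.g. in- and out-of-distribution); an assumption."""
    return sum(scores) / len(scores)
```

For example, predictions `[True, True, False, False]` against gold labels `[True, False, True, False]` give one true positive, one false positive and one false negative. Precision and recall are therefore both 0.5, and F1 is 0.5.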

Key takeaway

For machine learning engineers deploying LLMs on tabular data tasks, recognizing and addressing data referencing errors (DREs) is crucial for reliability. Consider adding a lightweight critic model to validate LLM outputs. This approach can improve answer accuracy by up to 12.0% and makes the system more trustworthy. Detecting DREs before answers are returned keeps incorrect table values from propagating downstream.

Key insights

LLMs make data referencing errors in table tasks, which a critic model can effectively mitigate.

Method

Systematically evaluate DREs across models of different sizes. Use a data referencing critic to filter candidate answers and to drive rejection sampling. Train a lightweight critic model to detect DREs.
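The following minimal sketch shows the two ways a critic can gate answers: filtering a batch of candidates, and rejection sampling until a candidate passes. The critic here, `toy_reference_critic`, is a hypothetical stand-in heuristic. It only checks whether every number cited in an answer appears literally in the table. It is not the trained 4B critic from the research, and `filter_candidates`, `rejection_sample`, and the `generate` callable are illustrative names rather than an existing API.

```python
import re
from typing import Callable, Optional

# Hypothetical table format: each row maps column name -> cell text.
Table = list[dict[str, str]]
Critic = Callable[[str, Table], bool]

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def toy_reference_critic(answer: str, table: Table) -> bool:
    """Stand-in critic: accept only if every number cited in the answer
    appears verbatim in some table cell (illustrative heuristic only)."""
    table_numbers = {
        num for row in table for cell in row.values() for num in _NUMBER.findall(cell)
    }
    return all(num in table_numbers for num in _NUMBER.findall(answer))


def filter_candidates(candidates: list[str], table: Table, critic: Critic) -> list[str]:
    """Critic-based filtering: keep only candidates the critic accepts."""
    return [c for c in candidates if critic(c, table)]


def rejection_sample(
    generate: Callable[[], str], table: Table, critic: Critic, max_tries: int = 8
) -> Optional[str]:
    """Rejection sampling: draw answers until the critic accepts one,
    or give up (return None) after max_tries attempts."""
    for _ in range(max_tries):
        answer = generate()
        if critic(answer, table):
            return answer
    return None
```

With a table of `[{"city": "Oslo", "population": "709000"}]`, the critic accepts "Oslo has 709000 residents." and rejects "Oslo has 790000 residents.". A generator that first produces the wrong figure and then the right one is resolved by `rejection_sample` on its second draw. Because this heuristic accepts only literal table values, it would also reject correct answers that contain derived numbers such as sums or ratios. It would also accept answers that cite no numbers at all. For these reasons it serves only as a placeholder for a model-based critic.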

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.