When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors
Summary
Large language models (LLMs) frequently make data referencing errors (DREs) on table tasks: they cite table values incorrectly or omit them, even when they appear to understand the table's structure. This research provides the first systematic evaluation of DREs and finds them in every model tested, from 1.7B to 20B parameters. A key finding is that using data referencing as a critic, through critic-based filtering and rejection sampling, improves answer accuracy by up to 12.0%. The authors also trained a lightweight 4B-parameter critic model that reaches an average F1 score of 78.2% in detecting both in-distribution and out-of-distribution DREs, and it proves effective at assisting larger models during inference.
Key takeaway
For machine learning engineers deploying LLMs on tabular data tasks, recognizing and addressing data referencing errors (DREs) is crucial for reliability. Consider integrating a lightweight critic model to validate LLM outputs: in the study, this approach improved answer accuracy by up to 12.0%, and it can make the overall system more trustworthy. Detecting DREs proactively helps stop incorrect values from propagating downstream.
Key insights
LLMs make data referencing errors in table tasks, which a critic model can effectively mitigate.
Principles
- Data referencing errors are prevalent across LLM sizes.
- Critic-based validation improves LLM accuracy.
Method
Systematically evaluate DREs across models. Incorporate a data referencing critic for filtering and rejection sampling. Train a lightweight critic model for detection.
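The filtering and rejection-sampling step can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `generate` stands in for sampling one answer from an LLM, `critic` stands in for a DRE detector that returns True when an answer looks free of referencing errors, and the names `rejection_sample`, `critic_filter`, and `max_samples` are hypothetical.

```python
import itertools
from typing import Callable, List, Optional


def rejection_sample(
    generate: Callable[[], str],
    critic: Callable[[str], bool],
    max_samples: int = 8,
) -> Optional[str]:
    """Draw candidate answers until the critic accepts one.

    Returns the first accepted candidate, or None if the sampling budget
    is exhausted without an accepted answer.
    """
    for _ in range(max_samples):
        candidate = generate()
        if critic(candidate):
            return candidate
    return None


def critic_filter(candidates: List[str], critic: Callable[[str], bool]) -> List[str]:
    """Keep only the candidates the critic judges free of referencing errors."""
    return [c for c in candidates if critic(c)]


# Toy demo (hypothetical): a stand-in "model" that alternates between a wrong
# and a right answer, and a stand-in critic that accepts only answers citing 42.
answers = itertools.cycle(["The total is 41.", "The total is 42."])
result = rejection_sample(lambda: next(answers), lambda a: "42" in a)
print(result)  # The total is 42.
```

Filtering scores a fixed batch of candidates, while rejection sampling keeps drawing until one passes. The `max_samples` budget caps inference cost, and returning None lets the caller decide on a fallback, such as keeping the unfiltered answer.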
In practice
- Implement a critic to detect DREs in LLM table outputs.
- Use critic feedback to refine LLM answers.
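As a rough illustration of detecting DREs and turning the result into feedback, the sketch below uses a simple rule-based check in place of a learned critic. It is hypothetical and much weaker than the 4B critic model described in the summary: `unsupported_values` only flags numbers in an answer that appear nowhere in the table, so it misses omitted values and will wrongly flag legitimately computed figures such as sums or averages. The table format (a list of string rows), the function names, and the wording of `feedback_prompt` are all assumptions.

```python
import re
from typing import List, Sequence, Set

# Matches plain integers and decimals; thousands separators such as "1,200"
# are not handled by this simplified pattern.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def table_values(rows: Sequence[Sequence[str]]) -> Set[float]:
    """Collect every numeric value found in the table's cells as floats."""
    values: Set[float] = set()
    for row in rows:
        for cell in row:
            for match in NUMBER.findall(str(cell)):
                values.add(float(match))
    return values


def unsupported_values(answer: str, rows: Sequence[Sequence[str]]) -> List[str]:
    """Return numbers cited in the answer that appear nowhere in the table."""
    known = table_values(rows)
    return [m for m in NUMBER.findall(answer) if float(m) not in known]


def feedback_prompt(answer: str, rows: Sequence[Sequence[str]]) -> str:
    """Turn the check's findings into a (hypothetical) revision instruction.

    Returns an empty string when no unsupported values were found.
    """
    missing = unsupported_values(answer, rows)
    if not missing:
        return ""
    return (
        "These values do not appear in the table: "
        + ", ".join(missing)
        + ". Re-read the table and revise the answer."
    )


table = [["region", "sales"], ["North", "120"], ["South", "95"]]
answer = "North sold 120 units and South sold 59."
print(unsupported_values(answer, table))  # ['59']
print(feedback_prompt(answer, table))
```

In a refinement loop, a non-empty feedback string would be appended to the model's next prompt; an empty string means the check found nothing to flag, not that the answer is correct.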
Topics
- Large Language Models
- Data Referencing Errors
- Tabular Data
- Critic Models
- LLM Evaluation
- Rejection Sampling
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.