Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

RedirectQA is a new dataset for analyzing non-verbatim memorization in large language models (LLMs), with a focus on how an entity's surface form affects factual recall. Earlier work queried each entity by a single canonical name. RedirectQA instead uses Wikipedia redirect information to link Wikidata factual triples to many surface forms of the same entity, including aliases, abbreviations, spelling variants, and common errors. Across the 13 LLMs evaluated, whether a model answers correctly often changes when only the entity's surface form changes. This inconsistency differs by category: models are more robust to minor orthographic changes than to larger lexical variation such as aliases. A frequency analysis shows that both entity-level and surface-level frequency correlate with accuracy, and that entity frequency often affects accuracy beyond what surface frequency accounts for. Together, these results suggest that factual memorization is neither purely surface-specific nor fully surface-invariant.
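The following minimal Python sketch shows one way to quantify "outcomes change based solely on the surface form." For each surface-form category, it counts how often a fact's correctness under a variant differs from its correctness under the canonical name. The record layout, the category names, and the function `inconsistency_by_category` are hypothetical. This is not necessarily the metric used in the paper.

```python
from collections import defaultdict


def inconsistency_by_category(results):
    """Per surface-form category, the fraction of (fact, variant) pairs whose
    correctness differs from the same fact asked with its canonical name.

    `results` is an iterable of (fact_id, category, correct) tuples; the
    canonical-name outcome for each fact uses the category "canonical".
    """
    canonical = {}                # fact_id -> bool outcome under the canonical name
    variants = defaultdict(list)  # category -> [(fact_id, bool outcome), ...]
    for fact_id, category, correct in results:
        if category == "canonical":
            canonical[fact_id] = correct
        else:
            variants[category].append((fact_id, correct))

    rates = {}
    for category, rows in variants.items():
        # Only compare variants whose fact also has a canonical-name result.
        paired = [(canonical[f], c) for f, c in rows if f in canonical]
        if paired:
            flips = sum(1 for base, c in paired if base != c)
            rates[category] = flips / len(paired)
    return rates


# Toy outcomes for two facts (illustrative only, not RedirectQA data).
results = [
    ("f1", "canonical", True), ("f1", "spelling_variant", True), ("f1", "alias", False),
    ("f2", "canonical", False), ("f2", "spelling_variant", False), ("f2", "alias", True),
]
print(inconsistency_by_category(results))
# {'spelling_variant': 0.0, 'alias': 1.0}
```

Comparing each variant against the canonical-name outcome, rather than against raw accuracy, isolates flips caused by the surface form itself. A fact the model never knew still counts as consistent if every form is wrong.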

Key takeaway

Research scientists evaluating LLM reliability should include diverse entity surface forms, such as aliases and common errors, in their QA datasets. Relying only on canonical names can hide inconsistencies in factual recall, because models differ in how robust they are to lexical versus orthographic variation. Extending evaluations with a RedirectQA-style methodology should give a more accurate picture of what an LLM actually knows and where its recall breaks down.
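The sketch below illustrates why canonical-only evaluation can hide inconsistency. It builds one question per surface form from a template, then compares the accuracy a canonical-only benchmark would report with a stricter score that requires every surface form to be answered correctly. The template, the form labels, and both helper functions are hypothetical. The paper's actual prompts and scoring may differ.

```python
def make_questions(template, surface_forms):
    """Fill a hypothetical relation template once per surface form."""
    return {form: template.format(subject=form) for form in surface_forms}


def canonical_vs_robust_accuracy(outcomes):
    """Compare canonical-only accuracy with accuracy across all surface forms.

    `outcomes` maps fact_id -> {form_label: bool}; each inner dict is assumed
    to contain a "canonical" key holding the canonical-name result.
    """
    if not outcomes:
        return 0.0, 0.0
    n = len(outcomes)
    # Accuracy a canonical-only benchmark would report.
    canonical_acc = sum(o["canonical"] for o in outcomes.values()) / n
    # Stricter view: a fact counts only if every surface form is answered correctly.
    robust_acc = sum(all(o.values()) for o in outcomes.values()) / n
    return canonical_acc, robust_acc


questions = make_questions(
    "{subject} is headquartered in which city?",
    ["United Nations", "UN", "U.N."],
)
print(questions["UN"])  # UN is headquartered in which city?

# Toy outcomes (illustrative only): both facts look correct by canonical name.
outcomes = {
    "f1": {"canonical": True, "alias": True, "abbreviation": True},
    "f2": {"canonical": True, "alias": False, "abbreviation": True},
}
print(canonical_vs_robust_accuracy(outcomes))  # (1.0, 0.5)
```

In this toy case, a canonical-only benchmark reports perfect accuracy. However, only half the facts survive every surface form, which is the kind of gap the takeaway warns about.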

Key insights

LLM factual recall is sensitive to entity surface forms, highlighting the need for diverse evaluation methods.

Principles

Method

RedirectQA uses Wikipedia redirects to associate Wikidata triples with categorized entity surface forms for evaluating LLM non-verbatim memorization.
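The Python sketch below gives a rough illustration of this join. It groups redirect titles by the Wikidata entity that their target article maps to, then attaches them to triples about that entity. It assumes three things: the redirects have already been extracted, each is labelled with a surface-form category, and a title-to-QID mapping is available. Wikipedia's redirect category templates, such as "R from misspelling", are one plausible source of those labels. The class and function names are hypothetical, and the paper's pipeline may filter or categorize differently.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Redirect:
    source_title: str  # redirect page title, i.e. one surface form of the entity
    target_title: str  # title of the article the redirect points to
    category: str      # assumed pre-labelled, e.g. "alias", "abbreviation", "misspelling"


@dataclass
class FactItem:
    subject_qid: str
    relation: str
    obj: str
    canonical_name: str
    surface_forms: list = field(default_factory=list)  # [(form, category), ...]


def build_items(triples, title_to_qid, redirects):
    """Attach redirect-derived surface forms to the subject of each triple."""
    qid_to_title = {qid: title for title, qid in title_to_qid.items()}

    # Group surface forms by the Wikidata entity their target article maps to.
    forms_by_qid = defaultdict(list)
    for r in redirects:
        qid = title_to_qid.get(r.target_title)
        if qid is not None:  # drop redirects whose target has no linked entity
            forms_by_qid[qid].append((r.source_title, r.category))

    items = []
    for subject, relation, obj in triples:
        forms = forms_by_qid.get(subject)
        if not forms:
            continue  # no variant forms, so nothing to compare with the canonical name
        items.append(FactItem(subject, relation, obj, qid_to_title[subject], list(forms)))
    return items


# Hypothetical toy inputs, not drawn from Wikipedia or RedirectQA.
title_to_qid = {"United Nations": "Q_UN"}
redirects = [
    Redirect("UN", "United Nations", "abbreviation"),
    Redirect("Untied Nations", "United Nations", "misspelling"),
    Redirect("Orphan", "Missing Article", "alias"),
]
triples = [("Q_UN", "headquarters location", "New York City")]
item = build_items(triples, title_to_qid, redirects)[0]
print(item.canonical_name, item.surface_forms)
# United Nations [('UN', 'abbreviation'), ('Untied Nations', 'misspelling')]
```

Keying the grouping on the entity ID rather than on the article title keeps every surface form tied to the same Wikidata facts. This is what makes per-form answers directly comparable.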

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.