Revisiting Non-Verbatim Memorization in Large Language Models: The Role of Entity Surface Forms
Summary
RedirectQA is a new dataset for analyzing non-verbatim memorization in large language models (LLMs), focusing on how an entity's surface form affects factual recall. Prior evaluations queried each entity by a single canonical name. RedirectQA instead uses Wikipedia redirect information to link Wikidata factual triples to multiple surface forms, including aliases, abbreviations, spelling variants, and common errors. The researchers evaluated 13 LLMs on the dataset and found that predictions frequently change when nothing but the entity's surface form changes. The degree of inconsistency depends on the category: models are more robust to minor orthographic variation than to larger lexical variation such as aliases. Frequency analysis shows that both entity-level and surface-level frequency correlate with accuracy, and that entity frequency often has an effect beyond surface frequency. Together, these results suggest that factual memorization is neither purely surface-specific nor fully surface-invariant.
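One way to quantify the surface-form sensitivity described above is a per-category flip rate. For each fact, it compares whether the model is correct on the canonical name against whether it is correct on each variant. This sketch assumes every query has already been scored as correct or incorrect. The input layout, the "canonical" label, and the category names are all hypothetical, and the summary does not say that the paper uses this exact metric.

```python
from collections import defaultdict


def flip_rate_by_category(results):
    """Return, per variant category, the share of variant queries whose
    correctness differs from the canonical-name query for the same fact.

    results: fact_id -> list of (category, is_correct) pairs. The pair whose
    category is "canonical" (a hypothetical label) is the reference.
    """
    flips = defaultdict(int)
    totals = defaultdict(int)
    for rows in results.values():
        reference = next((ok for cat, ok in rows if cat == "canonical"), None)
        if reference is None:
            continue  # no canonical query for this fact, so nothing to compare
        for cat, ok in rows:
            if cat == "canonical":
                continue
            totals[cat] += 1
            flips[cat] += int(ok != reference)
    return {cat: flips[cat] / totals[cat] for cat in totals}
```

A flip rate near zero for a category means the model treats that kind of variant like the canonical name. A high rate means recall depends on the exact form used.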
Key takeaway
For research scientists evaluating LLM reliability, you should incorporate diverse entity surface forms, including aliases and common errors, into your QA datasets. Relying solely on canonical names can hide inconsistencies in factual recall, because models are not equally robust to every kind of lexical or orthographic variation. Adopting a RedirectQA-style methodology gives a more accurate assessment of an LLM's factual knowledge and its limitations.
Key insights
LLM factual recall is sensitive to entity surface forms, highlighting the need for diverse evaluation methods.
Principles
- Factual memorization is neither purely surface-specific nor fully surface-invariant.
- Entity frequency impacts recall beyond surface frequency.
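Stratification is one way to test whether entity frequency matters beyond surface frequency. Examples are bucketed by how often the surface form occurs, and within each bucket you check whether accuracy still rises with entity frequency. This sketch shows one such check and is not the paper's analysis. The record layout and the power-of-two binning are assumptions, and the frequency counts would have to come from a corpus statistic that this summary does not specify.

```python
from collections import defaultdict


def log2_bin(count: int) -> int:
    """Power-of-two bucket: 0 for a zero count, else floor(log2(count)) + 1."""
    return count.bit_length()


def accuracy_by_strata(records):
    """records: iterable of (entity_freq, surface_freq, is_correct) tuples.

    Returns {(surface_bin, entity_bin): accuracy}. Holding surface_bin fixed
    and reading across entity_bin shows whether entity frequency still tracks
    accuracy once surface frequency is controlled for.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for entity_freq, surface_freq, is_correct in records:
        key = (log2_bin(surface_freq), log2_bin(entity_freq))
        counts[key] += 1
        hits[key] += int(is_correct)
    return {key: hits[key] / counts[key] for key in sorted(counts)}
```

Stratification is cruder than a regression with both frequencies as predictors. Its advantage is that the "beyond surface frequency" comparison can be read directly from a small table.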
Method
RedirectQA uses Wikipedia redirects to associate Wikidata triples with categorized entity surface forms for evaluating LLM non-verbatim memorization.
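The method can be sketched as a join. Each Wikidata triple's subject is mapped to its Wikipedia article, every redirect to that article becomes an alternative surface form, and each form is inserted into a question template. This sketch assumes the lookups are already loaded into dictionaries. The data layout, the per-relation templates, and the category labels are hypothetical, and this summary does not describe how RedirectQA actually assigns redirect categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject_qid: str  # Wikidata item ID of the subject
    relation: str     # relation label used to pick a question template
    obj: str          # gold answer string


@dataclass(frozen=True)
class Item:
    question: str
    answer: str
    surface_form: str
    category: str


def build_items(triples, canonical_titles, redirects, templates):
    """Expand each triple into one question per surface form.

    canonical_titles: qid -> Wikipedia article title (hypothetical lookup).
    redirects: article title -> list of (redirect_title, category) pairs.
    templates: relation -> question template with a {subject} slot.
    """
    items = []
    for t in triples:
        title = canonical_titles.get(t.subject_qid)
        template = templates.get(t.relation)
        if title is None or template is None:
            continue  # skip triples we cannot map to an article or a template
        forms = [(title, "canonical")] + redirects.get(title, [])
        for form, category in forms:
            items.append(Item(template.format(subject=form), t.obj, form, category))
    return items
```

Because each item keeps its category, accuracy can later be compared across categories for the same underlying fact.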
In practice
- Test LLMs with diverse entity surface forms.
- Include abbreviations and aliases in test sets.
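To put these practices into effect, a test harness can query a model with every surface form and report accuracy per category. This sketch treats the model as any callable from prompt to text. The dictionary keys, the normalization rules, and the substring-match scoring are assumptions for illustration, not the paper's evaluation protocol.

```python
import re
import string
from collections import defaultdict
from typing import Callable, Iterable


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def accuracy_by_category(items: Iterable[dict], model: Callable[[str], str]) -> dict:
    """items: dicts with hypothetical keys "question", "answer", "category".

    A prediction counts as correct if the normalized gold answer appears
    inside the normalized model output.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for item in items:
        prediction = model(item["question"])
        totals[item["category"]] += 1
        hits[item["category"]] += int(normalize(item["answer"]) in normalize(prediction))
    return {cat: hits[cat] / totals[cat] for cat in totals}
```

Substring matching is lenient and can produce false positives when a short answer happens to occur inside a longer word. A stricter matcher, such as exact match against a list of answer aliases, may suit some relations better.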
Topics
- Large Language Models
- Non-Verbatim Memorization
- Entity Surface Forms
- RedirectQA Dataset
- Factual Knowledge Evaluation
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.