Explicit Evidence Grounding via Structured Inline Citation Generation

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

The FullCite framework generates structured inline citations for large language model (LLM) outputs, with the goal of more factual and faithful generation in high-stakes domains. Unlike prior methods, FullCite links each generated claim to both its source document and the precise supporting evidence span. The framework offers three strategies: prompt-based generation, constrained decoding using a citation grammar, and posthoc span alignment, which reconstructs citations by finding the most similar snippet (Jaccard similarity 0.7). Evaluated on the ASQA, BioASQ, and ExpertQA benchmarks with Qwen3-8B and Gemma3-12b-it, FullCite shows that LLMs identify relevant documents well (high Doc-F1) but struggle to localize the exact evidence span (low Snippet-F1). The posthoc strategy improved Snippet-F1 substantially, for instance from 12.80 to 61.87 for Qwen3-8B on ASQA. The study also identified two failure patterns: primacy bias, where 81.8% of BioASQ citations targeted only the first two of five context documents, and citation omission on binary yes/no questions.
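The posthoc span alignment step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not say how the 0.7 value is applied, so the code assumes it is a minimum-similarity threshold, and it assumes candidates are sentences compared as sets of lowercased word tokens. The names `tokens`, `jaccard`, and `align_span` are hypothetical.

```python
import re
from typing import Optional


def tokens(text: str) -> set:
    """Lowercased word tokens as a set (assumed tokenization, not from the paper)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def align_span(generated_snippet: str, documents: dict,
               threshold: float = 0.7) -> Optional[tuple]:
    """Find the source sentence most similar to a generated snippet.

    Returns (doc_id, sentence, score) for the best candidate, or None when
    the best score is below `threshold` (assumption: 0.7 acts as a cutoff).
    """
    target = tokens(generated_snippet)
    best = None
    for doc_id, text in documents.items():
        # Assumption: candidate spans are sentences split on end punctuation.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            score = jaccard(target, tokens(sentence))
            if best is None or score > best[2]:
                best = (doc_id, sentence, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


docs = {
    "d1": "Aspirin reduces fever. It was first synthesized in 1897.",
    "d2": "Ibuprofen is an anti-inflammatory drug.",
}
match = align_span("aspirin reduces the fever", docs)
```

Here the paraphrased snippet shares three of four distinct tokens with the first sentence of `d1`, so it is replaced by that verbatim sentence; a snippet with no sufficiently similar sentence yields `None`, leaving the caller to decide whether to drop the citation.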

Key takeaway

For NLP Engineers and AI Scientists developing RAG-based QA systems, FullCite's findings highlight the need to prioritize precise evidence span identification over document-level retrieval alone. Consider implementing posthoc span alignment, such as the Jaccard-similarity approach, to improve the accuracy of verbatim evidence citations. Also account for positional biases such as primacy bias and "lost-in-the-middle", as well as citation omission on yes/no questions, and design your attribution mechanisms to mitigate them explicitly so that outputs are more faithful and transparent.
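One way to check your own system for the positional bias mentioned above is to measure what share of citations point at the first few context positions. The sketch below is illustrative only: the function name `positional_share`, the 1-indexed position convention, and the sample data are assumptions, not part of FullCite.

```python
from collections import Counter


def positional_share(cited_positions: list, first_k: int = 2) -> float:
    """Fraction of citations that target the first `first_k` context slots.

    `cited_positions` holds the 1-indexed position of each cited document
    within the prompt context (hypothetical logging format).
    """
    if not cited_positions:
        return 0.0
    hits = sum(1 for position in cited_positions if position <= first_k)
    return hits / len(cited_positions)


# Made-up example: ten citations collected over a five-document context.
sample = [1, 1, 2, 1, 3, 2, 1, 5, 2, 1]
share = positional_share(sample, first_k=2)
histogram = Counter(sample)  # citations per context position

# If citations were spread evenly over five documents, the first two
# positions would receive 2 / 5 of them.
uniform_baseline = 2 / 5
```

A share well above the uniform baseline, as with the 81.8% reported for BioASQ, suggests the model favors early context positions; shuffling document order between runs is one common way to test whether the effect is positional rather than content-driven.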

Key insights

FullCite improves LLM attribution by jointly generating document and evidence-span citations, and it reveals that LLMs struggle with precise span localization.

Method

FullCite uses prompt-based generation, constrained decoding via a finite-state automaton, or posthoc span alignment (Jaccard similarity 0.7) to generate structured inline citations linking claims to documents and verbatim evidence snippets.
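To make the idea of constrained decoding via a finite-state automaton concrete, here is a small character-level sketch. The citation syntax `[<digits>:"<snippet>"]`, the state names, and the functions `step`, `accepts`, and `allowed_next` are all hypothetical; the summary does not give FullCite's actual grammar. A real decoder would mask tokenizer vocabulary entries rather than characters, and this sketch does not enforce that the snippet is copied verbatim from the cited document.

```python
import string
from typing import Optional

DIGITS = set(string.digits)


def step(state: str, ch: str) -> Optional[str]:
    """Return the next automaton state, or None if `ch` is not allowed."""
    if state == "TEXT":            # free-form answer text
        if ch == "[":
            return "DOC_START"     # a citation opens
        return None if ch == "]" else "TEXT"
    if state == "DOC_START":       # at least one digit of the document id
        return "DOC" if ch in DIGITS else None
    if state == "DOC":             # more digits, or ':' to end the id
        if ch in DIGITS:
            return "DOC"
        return "QUOTE_OPEN" if ch == ":" else None
    if state == "QUOTE_OPEN":      # snippet must start with a double quote
        return "SNIPPET" if ch == '"' else None
    if state == "SNIPPET":         # any characters until the closing quote
        return "QUOTE_CLOSE" if ch == '"' else "SNIPPET"
    if state == "QUOTE_CLOSE":     # only ']' may follow the closing quote
        return "TEXT" if ch == "]" else None
    return None


def accepts(text: str) -> bool:
    """True if `text` is well-formed: every citation is opened and closed."""
    state = "TEXT"
    for ch in text:
        state = step(state, ch)
        if state is None:
            return False
    return state == "TEXT"


def allowed_next(state: str, candidates: list) -> list:
    """Decoding mask: keep only the candidates the automaton permits."""
    return [c for c in candidates if step(state, c) is not None]


ok = accepts('Aspirin reduces fever [1:"reduces fever"].')
mask = allowed_next("DOC_START", ["1", "a", "["])
```

During generation, `allowed_next` plays the role of the logit mask: after the model emits `[`, only digits survive, so a malformed citation cannot be produced.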

Best for: Research Scientist, AI Engineer, Machine Learning Engineer, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.