Explicit Evidence Grounding via Structured Inline Citation Generation
Summary
FullCite is a framework for factual and faithful AI generation built on structured inline citations. Unlike prior methods, it links each generated claim to both its source document and the specific evidence that supports it. It offers three strategies for generating inline citations: prompt-based generation, constrained decoding with a citation grammar, and post-hoc span alignment. The framework was evaluated on three question answering benchmarks (ASQA, BioASQ, and ExpertQA) along three dimensions: document-level correctness, evidence span identification, and claim-citation faithfulness. The results show that Large Language Models (LLMs) identify relevant source documents effectively but consistently struggle to pinpoint the precise supporting spans within those documents. Closing this gap in evidence span identification is therefore a priority for faithful attributed question answering.
Key takeaway
NLP engineers building factual generation or question answering systems should prioritize robust evidence span identification. LLMs find relevant documents effectively, but their difficulty with precise span attribution means such systems risk generating claims that the cited evidence does not support. Consider techniques such as constrained decoding or post-hoc alignment to improve citation granularity and keep output verifiably grounded.
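As a rough illustration of post-hoc alignment, the sketch below matches a generated claim to the source sentence with the highest token overlap and returns that sentence's character span. The function names, the Jaccard scoring, and the naive sentence splitter are assumptions made for this example; the summary does not describe how FullCite's aligner actually works.

```python
import re


def tokens(text):
    """Lowercased word/number tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sentence_spans(doc):
    """Yield (start, end) character offsets of sentences (naive split on . ! ?)."""
    for m in re.finditer(r"[^.!?]+[.!?]?", doc):
        raw = m.group()
        stripped = raw.strip()
        if stripped:
            start = m.start() + (len(raw) - len(raw.lstrip()))
            yield start, start + len(stripped)


def align_claim(claim, docs):
    """Return (score, doc_id, start, end) for the sentence most similar to the claim.

    Similarity is Jaccard overlap of token sets, a stand-in for a real aligner.
    """
    claim_toks = tokens(claim)
    best = None
    for doc_id, doc in docs.items():
        for start, end in sentence_spans(doc):
            sent_toks = tokens(doc[start:end])
            union = claim_toks | sent_toks
            score = len(claim_toks & sent_toks) / len(union) if union else 0.0
            if best is None or score > best[0]:
                best = (score, doc_id, start, end)
    return best


docs = {
    "D1": "Paris is the capital of France. It hosts the Louvre.",
    "D2": "Berlin is the capital of Germany.",
}
score, doc_id, start, end = align_claim("The capital of France is Paris", docs)
print(doc_id, docs[doc_id][start:end])
```

Lexical overlap is the simplest possible scorer; an entailment model or embedding similarity would be a natural substitute where paraphrased claims are common.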
Key insights
LLMs identify relevant source documents well but struggle to pinpoint the precise evidence spans needed for faithful inline citations.
Principles
- Factual AI requires structured inline citations.
- Attribution needs both the source document and the specific supporting evidence.
- Precise span identification is a key challenge.
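One way to make these principles concrete is a citation record that carries both a document ID and a character span. The bracket syntax `[D1:0-29]` and the names `Citation`, `parse_citations`, and `resolve` are hypothetical, chosen for this sketch; the summary does not specify FullCite's actual citation format.

```python
import re
from dataclasses import dataclass

# Hypothetical inline syntax (an assumption): [<doc_id>:<start>-<end>]
CITATION_RE = re.compile(r"\[(?P<doc>\w+):(?P<start>\d+)-(?P<end>\d+)\]")


@dataclass(frozen=True)
class Citation:
    doc_id: str
    start: int  # character offset into the source document, inclusive
    end: int    # character offset, exclusive


def parse_citations(text):
    """Extract every structured citation found in a generated answer."""
    return [
        Citation(m["doc"], int(m["start"]), int(m["end"]))
        for m in CITATION_RE.finditer(text)
    ]


def resolve(citation, docs):
    """Return the cited evidence text so a reader or a checker can inspect it."""
    return docs[citation.doc_id][citation.start:citation.end]


docs = {"D1": "Aspirin inhibits COX enzymes. It was first synthesized in 1897."}
answer = "Aspirin blocks COX enzymes [D1:0-29]."
for cite in parse_citations(answer):
    print(cite.doc_id, "->", resolve(cite, docs))
```

Storing offsets rather than quoted text keeps the citation short and makes the evidence mechanically checkable against the source.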
Method
FullCite generates structured inline citations with three strategies: prompt-based generation, constrained decoding with a citation grammar, and post-hoc span alignment. Each links claims to their source documents and supporting evidence.
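To show what constrained decoding with a citation grammar can look like, here is a toy sketch: at each step, tokens the grammar forbids are masked out and the highest-scoring legal token is chosen. The grammar (`[` doc ID `:` number `-` number `]`), the vocabulary, and the logits are all invented for illustration and are not FullCite's grammar or a real model's output.

```python
def make_grammar(doc_ids):
    """Map each state to (token predicate, next state) for the toy citation grammar."""
    return {
        "open":  (lambda t: t == "[", "doc"),
        "doc":   (lambda t: t in doc_ids, "colon"),  # only retrieved documents
        "colon": (lambda t: t == ":", "start"),
        "start": (str.isdigit, "dash"),
        "dash":  (lambda t: t == "-", "end"),
        "end":   (str.isdigit, "close"),
        "close": (lambda t: t == "]", "done"),
    }


def constrained_greedy(step_logits, vocab, doc_ids):
    """Greedy decoding restricted to tokens the grammar allows at each step."""
    grammar = make_grammar(doc_ids)
    state, out = "open", []
    for logits in step_logits:
        accepts, next_state = grammar[state]
        legal = [i for i, tok in enumerate(vocab) if accepts(tok)]
        if not legal:
            raise ValueError(f"no legal token in state {state!r}")
        best = max(legal, key=lambda i: logits[i])
        out.append(vocab[best])
        state = next_state
        if state == "done":
            break
    return "".join(out)


vocab = ["[", "]", ":", "-", "D1", "D9", "12", "40", "the"]
step_logits = [
    [0, 0, 0, 0, 0, 0, 0, 0, 5],  # model prefers "the"; grammar forces "["
    [0, 0, 0, 0, 1, 5, 0, 0, 0],  # model prefers unknown "D9"; only "D1" is legal
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 3, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 3, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(constrained_greedy(step_logits, vocab, {"D1"}))  # prints [D1:12-40]
```

The grammar guarantees a well-formed citation that names a retrieved document, but it cannot guarantee that the chosen offsets are the right evidence, which is the weakness the evaluation highlights.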
In practice
- Evaluate citation quality at the span level, not only at the document level.
- Implement constrained decoding for attribution.
- Focus research on precise evidence extraction.
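A minimal sketch of span-level evaluation, assuming each citation is a `(doc_id, start, end)` tuple with an exclusive end offset. The character-overlap F1 used here is one common choice and an assumption of this example; the summary does not state which metrics the original evaluation used.

```python
def span_f1(pred, gold):
    """Character-overlap F1 between two (doc_id, start, end) citations."""
    if pred[0] != gold[0]:
        return 0.0  # wrong document: no credit for the span
    overlap = max(0, min(pred[2], gold[2]) - max(pred[1], gold[1]))
    if overlap == 0:
        return 0.0
    precision = overlap / (pred[2] - pred[1])
    recall = overlap / (gold[2] - gold[1])
    return 2 * precision * recall / (precision + recall)


def evaluate(preds, golds):
    """Report document-level accuracy and mean span F1 over paired citations."""
    n = len(golds)
    doc_accuracy = sum(p[0] == g[0] for p, g in zip(preds, golds)) / n
    mean_span_f1 = sum(span_f1(p, g) for p, g in zip(preds, golds)) / n
    return {"doc_accuracy": doc_accuracy, "span_f1": mean_span_f1}


preds = [("D1", 0, 10), ("D2", 5, 15)]
golds = [("D1", 0, 10), ("D2", 10, 20)]
print(evaluate(preds, golds))  # {'doc_accuracy': 1.0, 'span_f1': 0.75}
```

Reporting the two numbers side by side exposes the failure mode described above: a system can cite the correct document every time while still missing part of the supporting span.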
Topics
- Evidence Grounding
- Inline Citation Generation
- Large Language Models
- Question Answering
- Factual Generation
- Span Identification
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.