Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLM Decisions via Attribution Guidance
Summary
A new training-free method, "Faithfulness Serum," improves the epistemic faithfulness of textual explanations generated by Large Language Models (LLMs). LLMs have transformed NLP, but their black-box nature limits adoption in domains that demand transparency. Prior research focused on producing subjectively convincing rationales, while their epistemic faithfulness (whether an explanation reflects the model's actual internal evidence) remained unverified. This work first assesses existing LLM-generated explanations using counterfactuals and finds that they are frequently unfaithful. The proposed method then guides explanation generation through attention-level interventions informed by token-level heatmaps from a faithful attribution technique. The authors report significant gains in epistemic faithfulness across various LLMs, benchmarks, and prompts.
Key takeaway
Research scientists developing or deploying LLMs in sensitive applications should prioritize epistemic faithfulness over subjective convincingness in explanations. The "Faithfulness Serum" method offers a practical, training-free way to make LLM explanations more reliable, which in turn makes models more transparent and trustworthy. Consider integrating attribution-guided generation so that explanations reflect the evidence the model actually used.
Key insights
LLM-generated explanations often lack epistemic faithfulness, but a new method improves it via attention-level guidance.
Principles
- A subjectively convincing explanation is not necessarily an epistemically faithful one.
- Attribution heatmaps can guide explanation generation.
Method
The method enhances faithfulness by guiding explanation generation through attention-level interventions, informed by token-level heatmaps extracted via a faithful attribution method, without requiring additional training.
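To make the idea of a heatmap-informed, attention-level intervention concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the summary does not specify the exact intervention, so the multiplicative reweighting rule, the `strength` parameter, the clipping of negative scores, and the function names `normalize` and `guide_attention` are all assumptions made for illustration. It operates on a single attention row over the input tokens.

```python
from typing import List


def normalize(scores: List[float]) -> List[float]:
    """Scale non-negative scores so they sum to 1 (uniform if all are zero)."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def guide_attention(attn: List[float], heatmap: List[float],
                    strength: float = 1.0) -> List[float]:
    """Reweight one attention row toward tokens with high attribution.

    Hypothetical rule (an assumption, not the paper's formula):
        new_i is proportional to attn_i * (1 + strength * heat_i)
    where heat is the heatmap clipped at zero and normalized to sum to 1.
    """
    if len(attn) != len(heatmap):
        raise ValueError("attention row and heatmap must have equal length")
    heat = normalize([max(h, 0.0) for h in heatmap])
    boosted = [a * (1.0 + strength * h) for a, h in zip(attn, heat)]
    return normalize(boosted)


# Uniform attention over four input tokens; attribution points at token 2.
attn_row = [0.25, 0.25, 0.25, 0.25]
token_heatmap = [0.0, 0.0, 1.0, 0.0]
guided = guide_attention(attn_row, token_heatmap, strength=1.0)
```

In this toy setup, attention mass shifts toward the token the heatmap marks as important, and `strength=0.0` leaves the row unchanged. No weights are updated, which is the sense in which such an intervention is training-free.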
In practice
- Assess LLM explanation faithfulness via counterfactuals.
- Use token-level heatmaps for explanation guidance.
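The counterfactual assessment in the first bullet can be illustrated with a small sketch. This is one simple reading of a counterfactual check, assumed here for illustration and not the paper's evaluation protocol: mask the tokens an explanation cites and see whether the model's decision changes. The keyword classifier `toy_predict` is a hypothetical stand-in for an LLM decision, and `mask_tokens` and `decision_flips` are illustrative names.

```python
from typing import Callable, List, Sequence


def mask_tokens(tokens: Sequence[str], indices: Sequence[int],
                mask: str = "[MASK]") -> List[str]:
    """Return a copy of tokens with the given positions replaced by a mask."""
    hidden = set(indices)
    return [mask if i in hidden else tok for i, tok in enumerate(tokens)]


def decision_flips(predict: Callable[[Sequence[str]], str],
                   tokens: Sequence[str], cited: Sequence[int]) -> bool:
    """True if masking the tokens cited by an explanation changes the decision.

    If an explanation cites evidence the model really relied on, removing
    that evidence should tend to change the output.
    """
    return predict(tokens) != predict(mask_tokens(tokens, cited))


def toy_predict(tokens: Sequence[str]) -> str:
    """Hypothetical stand-in for an LLM decision: keyword-based sentiment."""
    return "positive" if "great" in tokens else "negative"


review = ["the", "plot", "was", "great", "overall"]
cites_real_evidence = decision_flips(toy_predict, review, cited=[3])  # cites "great"
cites_irrelevant = decision_flips(toy_predict, review, cited=[1])     # cites "plot"
```

Here the explanation citing "great" passes the check because masking it flips the decision, while the one citing "plot" does not. A real evaluation would need many inputs and a more careful counterfactual construction than single-token masking.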
Topics
- Large Language Models
- Explainable AI
- Epistemic Faithfulness
- Attribution Guidance
- Attention-level Interventions
Best for: Research Scientist, AI Scientist, NLP Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.