Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLM Decisions via Attribution Guidance

2026-04-15 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

This paper introduces "Faithfulness Serum," a training-free method for improving the epistemic faithfulness of textual explanations generated by Large Language Models (LLMs). LLMs have transformed NLP, but their black-box nature limits adoption in domains that demand transparency. Prior research focused on producing subjectively convincing rationales, while their epistemic faithfulness (whether an explanation reflects the model's actual internal evidence) remained unverified. The work first assesses existing LLM-generated explanations using counterfactuals and finds that they are frequently unfaithful. The proposed method then guides explanation generation through attention-level interventions informed by token-level heatmaps derived from a faithful attribution technique. The authors report significant gains in epistemic faithfulness across various LLMs, benchmarks, and prompts.

Key takeaway

If you are a research scientist developing or deploying LLMs in sensitive applications, prioritize epistemic faithfulness over subjective convincingness in explanations. "Faithfulness Serum" offers a practical, training-free way to make LLM explanations more reliable, which in turn makes your models more trustworthy and transparent. Consider integrating attribution-guided generation so that explanations reflect the evidence behind the model's decisions.

Key insights

LLM-generated explanations often lack epistemic faithfulness, but a new method improves it via attention-level guidance.

Principles

Subjective convincingness does not imply epistemic faithfulness.
Attribution heatmaps can guide explanation generation.

Method

The method enhances faithfulness by guiding explanation generation through attention-level interventions, informed by token-level heatmaps extracted via a faithful attribution method, without requiring additional training.
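
To make the idea of an attention-level intervention concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not specify how the heatmap enters the attention computation, so this sketch assumes one plausible choice, adding a scaled, normalized relevance score as a bias to the attention logits before the softmax. The names `normalize_heatmap`, `guide_attention`, and the strength parameter `alpha` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalize_heatmap(scores):
    """Clip negative relevance to zero and rescale so the scores sum to 1.

    Assumption: only positive relevance should attract attention.
    Falls back to a uniform distribution if nothing is positive.
    """
    clipped = [max(s, 0.0) for s in scores]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(scores)] * len(scores)
    return [c / total for c in clipped]


def guide_attention(attn_logits, heatmap, alpha=2.0):
    """Bias one query's attention logits toward high-attribution tokens.

    attn_logits: pre-softmax attention scores over the input tokens.
    heatmap:     token-level relevance scores from an attribution method.
    alpha:       hypothetical guidance strength; 0 leaves attention unchanged.
    """
    weights = normalize_heatmap(heatmap)
    biased = [logit + alpha * w for logit, w in zip(attn_logits, weights)]
    return softmax(biased)
```

With uniform logits and a heatmap that concentrates on one token, `guide_attention` shifts probability mass toward that token, and setting `alpha=0.0` recovers the unmodified attention distribution. In a real model the same bias would have to be applied per head and per layer, a design choice this sketch leaves open.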

In practice

Assess LLM explanation faithfulness via counterfactuals.
Use token-level heatmaps for explanation guidance.
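
The counterfactual check in the first point can be sketched as follows. This is an illustrative protocol under stated assumptions, not the paper's evaluation procedure: it assumes an explanation can be reduced to a set of cited input tokens, and it counts the explanation as supported when masking those tokens changes the model's decision. The helper names and `toy_predict` (a keyword rule standing in for an LLM decision) are hypothetical.

```python
def mask_tokens(tokens, cited, mask="[MASK]"):
    """Replace every token the explanation cites with a mask symbol."""
    cited_set = {c.lower() for c in cited}
    return [mask if t.lower() in cited_set else t for t in tokens]


def explanation_is_supported(predict, tokens, cited):
    """Return True if masking the cited tokens changes the prediction.

    Assumption: if the cited tokens really drove the decision,
    removing them should flip the model's output.
    """
    original = predict(tokens)
    counterfactual = predict(mask_tokens(tokens, cited))
    return original != counterfactual


def faithfulness_rate(predict, examples):
    """Fraction of (tokens, cited) pairs whose explanation is supported."""
    if not examples:
        return 0.0
    hits = sum(explanation_is_supported(predict, t, c) for t, c in examples)
    return hits / len(examples)


def toy_predict(tokens):
    """Hypothetical stand-in for an LLM decision: a keyword sentiment rule."""
    lowered = [t.lower() for t in tokens]
    return "positive" if "great" in lowered else "negative"
```

For the input `["The", "plot", "was", "great"]`, an explanation citing "great" passes the check because masking it flips the toy prediction, while an explanation citing "plot" fails. A real evaluation would replace `toy_predict` with calls to the model under study and would need a principled way to extract cited evidence from free-text explanations.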

Topics

Large Language Models
Explainable AI
Epistemic Faithfulness
Attribution Guidance
Attention-level Interventions

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.