CREDENCE: Claim Reduction for Decomposition & Enhanced Credibility -- Semantic Metrics and Convergence Analysis
Summary
Credence is a new framework for decomposing compound sentences into atomic, verifiable claims, a step that is crucial for automated fact-checking. It targets two limitations of prior methods. First, they scored decompositions with token-overlap (Jaccard) metrics, which underestimate the quality of paraphrastic claims. Second, they offered no formal termination analysis for their repair loops. Credence introduces Semantic-F1, a fidelity metric based on cosine similarity between BGE-large embeddings, which improves downstream fact-checking accuracy by 15 to 32 percentage points over Jaccard-F1. The framework also provides convergence theorems that formally characterize rule-based repair as monotone and finitely terminating, and LLM-based self-repair as non-monotone and therefore in need of an early-exit guard. Across three benchmarks (SocialClaimSplit, WikiSplitBench, ClaimDecompBench) and four decomposer models ranging from 3.8B to 12B parameters, Credence performs robustly: rule-based repair reduces the Atomicity Violation Rate by 47-100% without degrading fidelity.
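To make the Jaccard-F1 versus Semantic-F1 contrast concrete, here is a minimal Python sketch. It assumes both metrics share the same greedy soft-matching F1: each predicted claim is credited with its best-matching reference claim for precision, and each reference claim with its best-matching prediction for recall. The summary does not state Credence's exact matching scheme, so treat that scheme and every function name here as illustrative. The `embed` argument stands in for a BGE-large encoder (in a real pipeline it could wrap a BGE-large checkpoint loaded through a library such as sentence-transformers). `toy_embed` is a hypothetical concept-counting embedding that exists only so the example runs without downloading a model.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap (lowercased, whitespace-split)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def soft_f1(pred: Sequence[str], ref: Sequence[str],
            sim: Callable[[str, str], float]) -> float:
    """Greedy soft F1: every claim is scored by its best match on the other side."""
    if not pred or not ref:
        return 0.0
    precision = sum(max(sim(p, r) for r in ref) for p in pred) / len(pred)
    recall = sum(max(sim(p, r) for p in pred) for r in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def jaccard_f1(pred: Sequence[str], ref: Sequence[str]) -> float:
    return soft_f1(pred, ref, jaccard)


def semantic_f1(pred: Sequence[str], ref: Sequence[str],
                embed: Callable[[str], Vector]) -> float:
    # Assumption: `embed` would wrap a BGE-large encoder in practice.
    return soft_f1(pred, ref, lambda a, b: cosine(embed(a), embed(b)))


# Hypothetical toy embedding: counts a few concept words and maps the
# paraphrases "born" and "birthplace" to the same dimension.
CONCEPTS = {"born": 0, "birthplace": 0, "paris": 1, "france": 2}


def toy_embed(text: str) -> List[float]:
    vec = [0.0] * 3
    for tok in text.lower().split():
        if tok in CONCEPTS:
            vec[CONCEPTS[tok]] += 1.0
    return vec


pred = ["Her birthplace is Paris"]
ref = ["She was born in Paris"]
print(jaccard_f1(pred, ref))              # 0.125
print(semantic_f1(pred, ref, toy_embed))  # ≈ 1.0
```

The two claims mean the same thing but share only one token, so Jaccard-F1 scores the paraphrase as a near miss while the embedding-based score treats it as a match. This is the underestimation effect the summary attributes to token-overlap metrics.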
Key takeaway
For NLP engineers building automated fact-checking systems: if you score claim decomposition with token-overlap metrics such as Jaccard-F1, you may be significantly underestimating decomposition quality, especially for paraphrastic claims. Integrate a semantic similarity metric, such as Credence's Semantic-F1, to assess decomposition fidelity more accurately. When designing iterative repair pipelines, formally characterize their convergence properties, and add early-exit guards to LLM-based self-repair so that it terminates reliably.
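An early-exit guard for a non-monotone repair loop can be sketched as a loop that keeps the best state seen so far and stops once a repair step fails to improve on it. This is a minimal Python sketch under stated assumptions. `repair_step` stands in for one LLM self-repair call and `violations` for an atomicity checker; both are hypothetical callables. The `max_iters`/`patience` policy is an illustrative choice, not Credence's published guard.

```python
from typing import Callable, List, Tuple

Claims = List[str]


def self_repair_with_guard(
    claims: Claims,
    repair_step: Callable[[Claims], Claims],  # hypothetical: one LLM self-repair call
    violations: Callable[[Claims], int],      # hypothetical: counts atomicity violations
    max_iters: int = 5,
    patience: int = 1,
) -> Tuple[Claims, int]:
    """Run a possibly non-monotone repair loop and return (best_state, steps_taken).

    Stops when violations reach zero, when `patience` consecutive steps fail to
    beat the best score so far, or after `max_iters` steps (hard cap).
    """
    best, best_score = claims, violations(claims)
    current, stale, steps = claims, 0, 0
    while steps < max_iters and best_score > 0:
        current = repair_step(current)
        steps += 1
        score = violations(current)
        if score < best_score:
            best, best_score, stale = current, score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # early exit: keep the best-so-far, not the latest output
    return best, steps


def count_and(claims: Claims) -> int:
    """Toy violation counter: number of ' and ' conjunctions across all claims."""
    return sum(c.count(" and ") for c in claims)


# Usage with a scripted, non-monotone "LLM" (hypothetical outputs).
script = iter([["x", "y and z"], ["x and q and r and s"], ["x", "y", "z"]])
best, steps = self_repair_with_guard(["x and y and z"], lambda cs: next(script), count_and)
print(best, steps)  # ['x', 'y and z'] 2
```

The second scripted step makes things worse, so the guard stops and returns the improved first output. It never sees the third output, which would have fixed everything. That is the trade-off `patience` controls: a larger value tolerates temporary regressions at the cost of more LLM calls, while `max_iters` still bounds the loop regardless of what the model returns.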
Key insights
Credence improves automated fact-checking by semantically evaluating decomposed claims and formalizing repair loop convergence.
Principles
- Semantic similarity metrics are crucial for paraphrastic claim evaluation.
- Formal termination analysis is vital for iterative repair pipelines.
- LLM-based self-repair requires early-exit mechanisms.
Method
Credence measures claim fidelity with Semantic-F1, a metric based on cosine similarity between BGE-large embeddings, and combines it with rule-based or LLM-based repair loops whose convergence and effect on atomicity are analyzed formally.
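The following minimal Python sketch shows what a rule-based repair loop with a termination argument might look like. The splitting rule (break a claim at " and ") is a hypothetical stand-in, because the summary does not list Credence's actual rules. `avr` is a simple reading of Atomicity Violation Rate as the share of claims that break that rule. The sketch illustrates the shape of the convergence argument. Every step removes exactly one conjunction, so the total conjunction count is a non-negative integer that strictly decreases. That bounds the number of steps, and the AVR falls at every step.

```python
from typing import List, Tuple

CONJ = " and "  # hypothetical rule: a claim containing " and " is non-atomic


def is_violation(claim: str) -> bool:
    return CONJ in claim


def avr(claims: List[str]) -> float:
    """Atomicity Violation Rate: fraction of claims that break the toy rule."""
    return sum(is_violation(c) for c in claims) / len(claims) if claims else 0.0


def rule_repair_step(claims: List[str]) -> List[str]:
    """Split the first violating claim once, at its first conjunction.

    The left part contains no conjunction and the right part keeps all the
    others, so the total number of conjunctions drops by exactly one.
    """
    for i, claim in enumerate(claims):
        if is_violation(claim):
            left, right = claim.split(CONJ, 1)
            return claims[:i] + [left, right] + claims[i + 1:]
    return claims  # fixpoint: nothing left to repair


def rule_repair(claims: List[str]) -> Tuple[List[str], int]:
    """Apply the rule until no violation remains; return (claims, steps).

    Potential function: total conjunction count. It is a non-negative integer
    that decreases by one per step, so the loop runs exactly that many times.
    The number of violating claims never increases while the claim count grows
    by one, so AVR strictly decreases at every step.
    """
    steps = 0
    while any(is_violation(c) for c in claims):
        claims = rule_repair_step(claims)
        steps += 1
    return claims, steps


claims = ["Paris is in France and Berlin is in Germany and Rome is in Italy",
          "Water boils at 100 C"]
repaired, steps = rule_repair(claims)
print(avr(claims), avr(repaired), steps)  # 0.5 0.0 2
```

Compare this with an LLM rewrite step, which offers no such potential function: its output can add violations as easily as remove them. That difference is why the summary pairs rule-based repair with a termination guarantee and LLM self-repair with an early-exit guard.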
In practice
- Use Semantic-F1 for claim decomposition evaluation.
- Implement early-exit guards for LLM-based repair.
- Apply rule-based repair to reduce atomicity violations.
Topics
- Claim Decomposition
- Automated Fact-Checking
- Semantic-F1 Metric
- LLM Repair Pipelines
- Evaluation Benchmarks
- Convergence Analysis
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.