Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
Summary
The paper introduces an operational criterion for interpretable text representations and a method, LLM-assisted Feature Discovery (LFD), for building representations that are both predictive and auditable. Traditional discriminative representations often rely on anonymous embeddings, while existing LLM-assisted methods struggle to produce features that are reproducible and distinct from the target label. The proposed criterion has two parts: conceptual clarity, measured by chance-adjusted agreement (Cohen's κ) among independent annotators, and label disentanglement, which requires that a feature not simply paraphrase the prediction target. LFD iteratively proposes features from contrastive text pairs, screens them by cross-LLM Cohen's κ, and selects them by predictive gain. Across ten text-classification tasks spanning seven corpora, LFD achieved predictive performance comparable to a strong baseline while yielding clearer and less label-entangled features. Human audits with 232 raters found higher human-human and human-LLM agreement for LFD features, which raters also judged to be less label-leaking.
Key takeaway
NLP engineers building interpretable text classification models should prioritize features that show high annotator agreement and are clearly disentangled from the target label. The LLM-assisted Feature Discovery (LFD) method offers one way to generate and validate features that are both predictive and auditable, which can make models easier to audit and to trust.
Key insights
Interpretable text representations require features with high annotator agreement and clear disentanglement from prediction targets for practical auditability.
Principles
- Conceptual clarity is measured by chance-adjusted annotator agreement.
- Features must be disentangled from the target label.
- Iterative feature discovery improves interpretability.
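For reference, the chance-adjusted agreement statistic behind the first principle, Cohen's κ for two annotators, has the standard form below. The notation is textbook notation rather than the paper's own: p_o is the observed fraction of items on which annotators A and B agree, and p_e is the agreement expected by chance from each annotator's category frequencies. The numbers in the worked line are made up for illustration.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = \sum_{k} p_{A,k}\, p_{B,k}

% Illustrative numbers only: p_o = 0.875, p_e = 0.5
\kappa = \frac{0.875 - 0.5}{1 - 0.5} = 0.75
```

A value of 0 means agreement no better than chance, and 1 means perfect agreement.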
Method
LLM-assisted Feature Discovery (LFD) iteratively proposes lexical/semantic features from contrastive text pairs, screens candidates via cross-LLM Cohen's κ, and selects features based on residual predictive gain.
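The screening and selection steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the two annotation columns stand in for two LLM annotators, the thresholds `kappa_min` and `gain_min` are hypothetical, the feature names and data are invented, and "residual predictive gain" is approximated by the in-sample accuracy gain of a simple majority-vote lookup table. The feature-proposal step, which needs an LLM, is not shown.

```python
from collections import Counter, defaultdict

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of categorical annotations."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty annotation lists of equal length")
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both annotators use one identical category;
        # treated as 0 here (an assumption) so constant features are screened out.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)

def table_accuracy(columns, labels):
    """In-sample accuracy of a majority-vote lookup table over feature columns."""
    groups = defaultdict(Counter)
    for i, y in enumerate(labels):
        groups[tuple(col[i] for col in columns)][y] += 1
    return sum(max(c.values()) for c in groups.values()) / len(labels)

def select_features(candidates, labels, kappa_min=0.6, gain_min=0.01):
    """candidates maps a feature name to (annotator_a_values, annotator_b_values)."""
    # Step 1: keep only features the two annotators apply consistently.
    screened = {name: a for name, (a, b) in candidates.items()
                if cohen_kappa(a, b) >= kappa_min}
    # Step 2: greedily add the feature with the largest accuracy gain
    # over the features already chosen; stop when the gain is too small.
    chosen = []
    score = table_accuracy([], labels)  # majority-class baseline
    while True:
        best_name, best_score = None, score
        for name, col in screened.items():
            if name in chosen:
                continue
            s = table_accuracy([screened[c] for c in chosen] + [col], labels)
            if s > best_score:
                best_name, best_score = name, s
        if best_name is None or best_score - score < gain_min:
            return chosen
        chosen.append(best_name)
        score = best_score

# Invented toy data: eight texts, binary labels, three candidate features.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
candidates = {
    "has_question":   ([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]),
    "mentions_price": ([0, 1, 0, 1, 0, 0, 0, 0], [0, 1, 0, 1, 0, 0, 1, 0]),
    "vague_tone":     ([1, 1, 1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 1, 0, 1, 0]),
}
print(select_features(candidates, labels))  # ['has_question', 'mentions_price']
```

In this toy run, `vague_tone` is dropped at the agreement screen (κ = 0) even though one annotator's column matches the labels exactly, which is the point of screening before selection. A real pipeline would measure gain on held-out data with a proper classifier.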
In practice
- Use Cohen's κ to check that independent annotators apply a feature definition reliably.
- Design features distinct from target labels.
- Employ LLMs for iterative feature generation.
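For the second point, a crude automated pre-check can flag candidate features that nearly coincide with the label before any human review. The sketch below is an assumption for illustration only: the summary describes label leakage as something raters judged, not as this statistic, and the function names, the `ceiling` value, and the data are hypothetical.

```python
def label_overlap(feature, labels):
    """Share of items where a binary feature equals the binary label or its inverse."""
    if not labels or len(feature) != len(labels):
        raise ValueError("need non-empty feature and label lists of equal length")
    match = sum(f == y for f, y in zip(feature, labels)) / len(labels)
    # An inverted copy of the label leaks just as much as a direct copy.
    return max(match, 1.0 - match)

def flag_label_paraphrases(features, labels, ceiling=0.95):
    """Return names of features whose overlap with the label reaches the ceiling."""
    return [name for name, col in features.items()
            if label_overlap(col, labels) >= ceiling]

# Invented toy data for a binary sentiment-style task.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_features = {
    "is_positive_review": [1, 1, 1, 1, 0, 0, 0, 0],
    "mentions_price":     [0, 1, 0, 1, 0, 0, 0, 0],
}
print(flag_label_paraphrases(toy_features, toy_labels))  # ['is_positive_review']
```

High overlap is only a prompt for review: a flagged feature may be a legitimately strong signal, and on imbalanced data a near-constant feature can also score high, so the wording of the feature definition still needs a human read.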
Topics
- Interpretable AI
- Text Classification
- LLM-assisted Feature Discovery
- Feature Engineering
- Label Disentanglement
- Cohen's Kappa
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.