Interpretable Discriminative Text Representations via Agreement and Label Disentanglement

2026-05-20 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, medium

Summary

The paper "Interpretable Discriminative Text Representations via Agreement and Label Disentanglement" introduces an operational criterion for interpretable text features and a method, LLM-assisted Feature Discovery (LFD), for building text representations that are both predictive and auditable. Traditional discriminative representations often rely on anonymous embedding dimensions, while existing LLM-assisted methods struggle to produce features that are reproducible and distinct from the target label. The proposed criterion has two parts: conceptual clarity, measured by chance-adjusted agreement (Cohen's κ) among independent annotators, and label disentanglement, which requires that features do not simply paraphrase the prediction target. LFD iteratively proposes features from contrastive text pairs, screens them using cross-LLM Cohen's κ, and selects them based on predictive gain. Across ten text-classification tasks spanning seven corpora, LFD achieved predictive performance comparable to a strong baseline while yielding clearer and less label-entangled features. Human audits with 232 raters confirmed higher human-human and human-LLM agreement for LFD features, which were also judged to leak less label information.

Key takeaway

If you are developing interpretable text classification models, prioritize features that show high chance-adjusted annotator agreement and clear disentanglement from the target label. The LLM-assisted Feature Discovery (LFD) method can help you generate and validate features that are both predictive and auditable, which supports practical auditability and makes a model's behavior easier to inspect.

Key insights

To be auditable in practice, interpretable text representations need features that independent annotators apply consistently and that are clearly disentangled from the prediction target.

Principles

Conceptual clarity is measured by chance-adjusted agreement (Cohen's κ) among independent annotators.
Features must be disentangled from the target label.
Iterative feature discovery improves interpretability.
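
For reference, the chance-adjusted agreement named in the first principle is conventionally defined as follows. This is the standard two-annotator form of Cohen's κ, not something specific to the paper; the symbols p_o, p_e, and p_{ik} are notation chosen here, and the counts in the worked example are invented for illustration.

```latex
% p_o: observed fraction of items on which the two annotators agree
% p_{ik}: fraction of items that annotator i assigns to category k
\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = \sum_{k} p_{1k}\, p_{2k}
\]

% Illustrative counts: 100 texts, one binary feature. Both annotators say
% "present" on 40 texts, both say "absent" on 40, and they disagree on 20
% (10 each way), so each annotator says "present" on 50 texts.
\[
p_o = \frac{40 + 40}{100} = 0.80,
\qquad
p_e = 0.5 \cdot 0.5 + 0.5 \cdot 0.5 = 0.50,
\qquad
\kappa = \frac{0.80 - 0.50}{1 - 0.50} = 0.60
\]
```

In this example a raw agreement of 80% shrinks to κ = 0.60 once the agreement expected by chance is discounted, which is why raw agreement alone can overstate how clearly a feature is defined.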

Method

LLM-assisted Feature Discovery (LFD) iteratively proposes lexical/semantic features from contrastive text pairs, screens candidates via cross-LLM Cohen's κ, and selects features based on residual predictive gain.
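
The screening and selection steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the proposal step is replaced by a fixed dictionary of hand-written candidates, each candidate carries two annotation passes standing in for two LLM annotators, and "residual predictive gain" is approximated by the training-accuracy improvement of a simple lookup classifier. The function names, the thresholds `kappa_min` and `gain_min`, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def cohen_kappa(a, b):
    """Chance-adjusted agreement between two equal-length annotation lists."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    # Kappa is undefined when chance agreement is 1. Treating it as 0 is an
    # assumption made here so that constant features fail the screen.
    return 0.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def lookup_accuracy(rows, labels):
    """Training accuracy of a majority-label-per-feature-pattern classifier."""
    by_pattern = defaultdict(Counter)
    for row, y in zip(rows, labels):
        by_pattern[row][y] += 1
    return sum(max(c.values()) for c in by_pattern.values()) / len(labels)


def select_features(candidates, labels, kappa_min=0.6, gain_min=0.01):
    """candidates maps a feature name to (annotator_a_values, annotator_b_values)."""
    # Screening: keep features whose two annotation passes agree well enough.
    screened = {
        name: a for name, (a, b) in candidates.items()
        if cohen_kappa(a, b) >= kappa_min
    }

    def accuracy(names):
        rows = [tuple(screened[f][i] for f in names) for i in range(len(labels))]
        return lookup_accuracy(rows, labels)

    selected, current = [], accuracy([])
    while True:
        # Residual gain: accuracy added on top of the features already chosen.
        gains = {f: accuracy(selected + [f]) - current
                 for f in screened if f not in selected}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] < gain_min:
            break
        selected.append(best)
        current = accuracy(selected)
    return selected


# Toy data: eight texts, binary target, three hypothetical candidate features.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
candidates = {
    "mentions_refund": ([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]),
    "polite_tone": ([1, 0, 1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 1, 0, 0, 1]),
    "has_order_number": ([0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0]),
}
print(select_features(candidates, labels))
# -> ['mentions_refund', 'has_order_number']
```

Here "polite_tone" is dropped at the screening step because its two annotation passes disagree (κ = -0.5), while the other two features pass and are added in order of the accuracy each contributes beyond what is already selected. A real pipeline would measure gain on held-out data and regenerate candidates each round.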

In practice

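Use Cohen's κ to check that independent annotators apply each feature definition consistently.
Design features that are distinct from the target label rather than paraphrases of it.
Use LLMs to propose and refine candidate features iteratively.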
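
One crude way to act on the second recommendation is to surface features that reproduce the label almost perfectly on their own, so a human can check whether they merely restate it. The sketch below is a statistical proxy assumed here for illustration; the source describes label leakage as judged by human raters and does not specify this check. The function names, the 0.9 threshold, and the toy features are hypothetical.

```python
from collections import Counter, defaultdict


def label_leak_score(feature_values, labels):
    """Share of the headroom above the majority-class baseline that the
    feature recovers when used alone (0 = no better, 1 = perfect)."""
    n = len(labels)
    baseline = max(Counter(labels).values()) / n
    by_value = defaultdict(Counter)
    for v, y in zip(feature_values, labels):
        by_value[v][y] += 1
    # Accuracy of predicting the majority label for each feature value.
    alone = sum(max(c.values()) for c in by_value.values()) / n
    # With a single label class there is no headroom; return 0 by convention.
    return 0.0 if baseline == 1.0 else (alone - baseline) / (1.0 - baseline)


def flag_for_review(features, labels, threshold=0.9):
    """Names of features whose solo score meets the (assumed) threshold."""
    return [name for name, values in features.items()
            if label_leak_score(values, labels) >= threshold]


# Toy data: the first feature is effectively a restatement of the label.
labels = ["complaint"] * 4 + ["praise"] * 4
features = {
    "expresses_dissatisfaction": [1, 1, 1, 1, 0, 0, 0, 0],
    "mentions_refund": [1, 1, 1, 0, 0, 0, 0, 0],
}
print(flag_for_review(features, labels))
# -> ['expresses_dissatisfaction']
```

A high score does not prove that a feature paraphrases the target, since a genuinely distinct feature can also be highly predictive. The check only narrows down which definitions deserve a closer manual read.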

Topics

Interpretable AI
Text Classification
LLM-assisted Feature Discovery
Feature Engineering
Label Disentanglement
Cohen's Kappa

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.