Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

Macro is a preference alignment framework for improving multilingual self-generated counterfactual explanations (SCEs) from large language models (LLMs). It targets two problems: LLMs struggle to generate valid SCEs in non-English languages, and explanation validity trades off against minimality. Macro applies Direct Preference Optimization (DPO), using a composite scoring function to build preference pairs so that the validity-minimality balance becomes a measurable training signal. In experiments with four LLMs and seven typologically diverse languages, Macro improved validity by 12.55% on average over the chain-of-thought baseline without compromising minimality. It also avoided the severe minimality violations seen with translation-based baselines and outperformed supervised fine-tuning on both metrics, supporting the case for explicit preference optimization. Macro further increases cross-lingual perturbation alignment and reduces common generation errors.

Key takeaway

For machine learning engineers building multilingual LLM explanation systems, Macro suggests that Direct Preference Optimization can ease the persistent validity-minimality trade-off. Consider preference alignment, particularly DPO with a carefully designed composite scoring function, to improve the quality and cross-lingual consistency of counterfactual explanations. This may yield more reliable insight into black-box LLM behavior across diverse languages.

Key insights

Preference alignment, specifically DPO, effectively balances validity and minimality in multilingual counterfactual explanation generation for LLMs.
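
For readers less familiar with DPO, the standard per-pair objective can be sketched in a few lines of Python. This is the generic DPO loss, not code from the Macro paper; the function name `dpo_loss` and the value `beta=0.1` are illustrative assumptions, and the inputs are summed token log-probabilities of each response under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Generic DPO loss for one preference pair (beta is an assumed value)."""
    # Implicit rewards: how much more likely the policy makes each response
    # than the frozen reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # Loss is -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls as the policy raises the likelihood of the preferred explanation relative to the rejected one, measured against the reference model.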

Method

Macro applies Direct Preference Optimization (DPO) to multilingual SCE generation. It constructs preference pairs using a composite scoring function that translates the validity-minimality trade-off into measurable signals for alignment.
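
A minimal sketch of how a composite score could turn candidate counterfactuals into preference pairs is shown here. The summary does not give Macro's actual scoring function, so everything in it is an assumption made for illustration: the `Candidate` fields, the weight `alpha`, the use of an edit ratio as the minimality proxy, and the `margin` filter.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Candidate:
    text: str          # candidate counterfactual
    flipped: bool      # validity: did the model's prediction change?
    edit_ratio: float  # assumed minimality proxy: fraction of tokens edited, in [0, 1]


def composite_score(c: Candidate, alpha: float = 0.5) -> float:
    """Hypothetical score: weighted mix of validity and minimality."""
    validity = 1.0 if c.flipped else 0.0
    minimality = 1.0 - c.edit_ratio
    return alpha * validity + (1.0 - alpha) * minimality


def build_preference_pairs(prompt: str,
                           candidates: list[Candidate],
                           alpha: float = 0.5,
                           margin: float = 0.1) -> list[dict]:
    """Pair up candidates whose scores differ by at least `margin`."""
    scored = [(composite_score(c, alpha), c) for c in candidates]
    pairs = []
    for (score_a, a), (score_b, b) in combinations(scored, 2):
        if abs(score_a - score_b) < margin:
            continue  # skip near-ties, which give a weak preference signal
        chosen, rejected = (a, b) if score_a > score_b else (b, a)
        pairs.append({"prompt": prompt,
                      "chosen": chosen.text,
                      "rejected": rejected.text})
    return pairs


candidates = [
    Candidate("A", flipped=True, edit_ratio=0.2),   # valid and small edit
    Candidate("B", flipped=False, edit_ratio=0.1),  # minimal but invalid
    Candidate("C", flipped=True, edit_ratio=0.8),   # valid but heavy edit
]
pairs = build_preference_pairs("Rewrite so the label changes.", candidates)
```

Under these assumed weights, the valid, lightly edited candidate is preferred over both the invalid one and the heavily edited one, which is the kind of signal the validity-minimality balance calls for.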

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.