Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization

2026-05-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

Macro is a novel preference alignment framework designed to enhance multilingual self-generated counterfactual explanations (SCEs) for large language models (LLMs). Addressing the challenges of generating valid SCEs in non-English languages and the inherent trade-off between explanation validity and minimality, Macro applies Direct Preference Optimization (DPO). It utilizes a composite scoring function to create preference pairs, effectively translating the validity-minimality balance into measurable signals. Across experiments involving four LLMs and seven typologically diverse languages, Macro demonstrated a 12.55% average improvement in validity compared to the chain-of-thought baseline, without compromising minimality. It also avoided the severe minimality violations seen with translation-based baselines and outperformed supervised fine-tuning on both metrics, confirming the importance of explicit preference optimization. Macro further increases cross-lingual perturbation alignment and reduces common generation errors.

Key takeaway

For machine learning engineers developing multilingual LLM explanation systems, Macro's results suggest that Direct Preference Optimization can ease the validity-minimality trade-off that chain-of-thought prompting, translation-based baselines, and supervised fine-tuning handle less well. Consider integrating a preference alignment framework, in particular DPO with a carefully designed composite scoring function, to improve the validity and cross-lingual consistency of your counterfactual explanations. This approach can yield more reliable insights into black-box LLM behavior across diverse languages.

Key insights

Preference alignment, specifically DPO, effectively balances validity and minimality in multilingual counterfactual explanation generation for LLMs.

Principles

Explicit preference optimization balances explanation trade-offs.
Composite scoring functions can translate complex trade-offs into preferences.
Preference optimization can increase cross-lingual perturbation alignment and reduce common generation errors.

Method

Macro applies Direct Preference Optimization (DPO) to multilingual SCE generation. It constructs preference pairs using a composite scoring function that translates the validity-minimality trade-off into measurable signals for alignment.
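As a rough illustration of how a composite score might turn the validity-minimality trade-off into preference pairs, here is a minimal sketch. Everything in it is an assumption made for illustration, not the paper's actual scoring function: the names `Candidate`, `composite_score`, and `build_preference_pairs`, the weight `alpha`, the `margin` threshold, whitespace tokenization, and the use of token-level edit distance as the minimality measure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A counterfactual candidate (hypothetical container, not from the paper)."""
    text: str
    flips_label: bool  # validity: did the model's prediction change?


def token_edit_distance(a, b):
    """Levenshtein distance over token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def minimality(original, counterfactual):
    """1.0 means no edits; values near 0.0 mean the text was mostly rewritten."""
    src, cf = original.split(), counterfactual.split()
    longest = max(len(src), len(cf), 1)
    return 1.0 - token_edit_distance(src, cf) / longest


def composite_score(original, cand, alpha=0.7):
    """Assumed form: weighted sum of validity (0 or 1) and minimality."""
    validity = 1.0 if cand.flips_label else 0.0
    return alpha * validity + (1.0 - alpha) * minimality(original, cand.text)


def build_preference_pairs(original, candidates, margin=0.1):
    """Return (chosen, rejected) text pairs whose score gap is at least `margin`."""
    scored = sorted(
        ((composite_score(original, c), c) for c in candidates),
        key=lambda item: item[0],
        reverse=True,
    )
    pairs = []
    for i, (hi_score, hi) in enumerate(scored):
        for lo_score, lo in scored[i + 1:]:
            if hi_score - lo_score >= margin:
                pairs.append((hi.text, lo.text))
    return pairs


if __name__ == "__main__":
    original = "the movie was good"
    candidates = [
        Candidate("the movie was bad", flips_label=True),
        Candidate("the film was truly awful and boring", flips_label=True),
        Candidate("the movie was good indeed", flips_label=False),
    ]
    for chosen, rejected in build_preference_pairs(original, candidates):
        print(f"chosen={chosen!r}  rejected={rejected!r}")
```

In this toy setup, a valid one-token edit outranks a valid but heavily rewritten candidate, and both outrank a small edit that fails to flip the label. The `margin` filter drops near-ties so that only clearly separated pairs become training signal; whether Macro uses such a filter is not stated in the summary.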

In practice

Apply DPO for balancing conflicting LLM generation objectives.
Design composite scoring functions for complex preference signals.
Evaluate explanation quality across diverse languages and LLMs.
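For readers applying DPO to their own preference pairs, the per-pair objective can be sketched in a few lines. This is the standard DPO loss, not Macro-specific code: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of each full response under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    loss = -log(sigmoid(beta * (chosen_log_ratio - rejected_log_ratio))),
    where each log ratio is the policy log-prob minus the reference log-prob.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_log_ratio - rejected_log_ratio)
    # Numerically stable -log(sigmoid(x)), i.e. softplus(-x).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-11.0, -11.0, -11.0, -11.0))
    # Policy has shifted probability toward the chosen response: lower loss.
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference more than it raises the rejected one's. `beta` controls how strongly deviations from the reference are penalized. In practice this would be computed over batches with a tensor library rather than on scalars.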

Topics

Multilingual LLMs
Counterfactual Explanations
Direct Preference Optimization
Explainable AI
Model Alignment
Natural Language Generation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.