Where to Place the Query? Unveiling and Mitigating Positional Bias in In-Context Learning for Diffusion LLMs via Decoding Dynamics
Summary
Research into In-Context Learning (ICL) for Diffusion Large Language Models (dLLMs) such as LLaDA-8B-Base and Dream-7B-Base shows that query position is a first-order variable. Unlike autoregressive (AR) LLMs, dLLMs use bidirectional attention, so the query can be placed anywhere in the context, yet current practice still defaults to AR-style trailing queries. Empirical analysis shows that positional variance affects generation quality about as much as the semantic quality of the examples, with a relative importance ratio r=1.236 on GSM8K. The optimal placement is task-dependent: sequential reasoning (GSM8K) favors a trailing query, while global perception (Sudoku) favors a prefix query. The paper attributes this sensitivity to a spatial "Recency Effect" and to task-dependent "Decoding Trajectories." To mitigate it, the paper proposes Average Confidence (C̄), a novel metric that tracks iterative decoding, and Auto-ICL, a training-free adaptive routing strategy. Auto-ICL chooses the query placement dynamically and robustly approaches oracle performance on GSM8K, MATH, MBPP, Sudoku, and Countdown, at marginal added inference latency.
Key takeaway
Machine Learning Engineers optimizing In-Context Learning for Diffusion LLMs should move beyond static, AR-style trailing query placement. Routing the query according to the task can significantly improve performance, especially for global-perception tasks like Sudoku, which benefit from prefix placement, and under constrained generation budgets. The Auto-ICL strategy uses Average Confidence (C̄) to select the query placement adaptively, approaching oracle performance across diverse reasoning and perception tasks at marginal added latency.
Key insights
Query placement is a first-order variable in dLLM In-Context Learning, impacting performance as much as example selection.
Principles
- dLLM query position is a first-order variable.
- Optimal query placement is task-dependent.
- Recency Effect governs attention flow.
Method
Auto-ICL dynamically routes queries by enumerating candidate positions, running dLLM decoding passes to compute Average Confidence (C̄), and selecting the placement maximizing generation stability.
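The routing loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: this summary does not give the exact definition of C̄, so the sketch assumes it is the mean confidence of all tokens committed across decoding steps. The names `average_confidence`, `auto_icl_route`, and `fake_decode` are hypothetical, and the decoder is a stub rather than a real LLaDA or Dream API.

```python
from typing import Callable, Dict, List, Tuple

# Assumed trace format: one inner list per decoding step, holding the
# confidence of each token committed at that step.
StepConfidences = List[List[float]]


def average_confidence(trace: StepConfidences) -> float:
    """Mean confidence over every token committed during decoding.

    Assumption: this is only a stand-in for the paper's C-bar.
    """
    values = [c for step in trace for c in step]
    if not values:
        raise ValueError("decoding trace contains no confidences")
    return sum(values) / len(values)


def auto_icl_route(
    candidates: Dict[str, str],
    decode: Callable[[str], StepConfidences],
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate prompt layout and return the most confident one."""
    scores = {
        name: average_confidence(decode(prompt))
        for name, prompt in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


def fake_decode(prompt: str) -> StepConfidences:
    """Hypothetical stub decoder: pretends prefix-query prompts decode more stably."""
    if prompt.startswith("Q:"):
        return [[0.9, 0.8], [0.7]]
    return [[0.6, 0.5], [0.4]]


candidates = {
    "prefix": "Q: solve the puzzle\nExample 1\nExample 2",
    "trailing": "Example 1\nExample 2\nQ: solve the puzzle",
}
best, scores = auto_icl_route(candidates, fake_decode)
print(best, scores)
```

In a real setting, `decode` would wrap a short dLLM decoding pass that records per-step token confidences. The extra cost is one such pass per candidate placement.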
In practice
- Evaluate prefix, middle, and trailing query positions.
- Use Average Confidence (C̄) as a signal of dLLM generation stability when comparing placements.
- Prioritize trailing for sequential tasks (e.g., GSM8K).
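To evaluate the three placements listed above, the candidate prompts first have to be assembled. The sketch below shows one simple way to do that. The helper name `build_layouts`, the separator, and the choice to put the "middle" query halfway through the examples are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List


def build_layouts(examples: List[str], query: str, sep: str = "\n\n") -> Dict[str, str]:
    """Return prompts with the query placed before, between, and after the examples."""
    mid = len(examples) // 2  # assumed split point for the "middle" layout
    orders = {
        "prefix": [query] + examples,
        "middle": examples[:mid] + [query] + examples[mid:],
        "trailing": examples + [query],
    }
    return {name: sep.join(parts) for name, parts in orders.items()}


layouts = build_layouts(["A", "B"], "Q", sep="|")
print(layouts)
```

Each resulting prompt can then be decoded and scored, and the highest-scoring layout kept for the final generation.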
Topics
- Diffusion LLMs
- In-Context Learning
- Query Placement Optimization
- Positional Bias Mitigation
- Average Confidence Metric
- Decoding Dynamics
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.