Attention Calibration for Position-Fair Dense Information Retrieval
Summary
Dense retrieval models suffer from positional bias: retrieval effectiveness drops when the relevant information appears later in a passage. The researchers adapted inference-time attention calibration and extended it with a strength coefficient lambda, aiming to mitigate this bias without retraining and without sacrificing overall effectiveness. Across three embedding models on SQuAD-PosQ and FineWeb-PosQ, they found that partial calibration frequently outperforms full calibration. A single configuration (B=128, lambda=0.5, 50% layer depth) significantly improved the harmonic mean of nDCG@10 across positional groups on FineWeb-PosQ for all models, without per-model tuning. The same configuration also transferred to PosIR, a dataset spanning 10 languages and 31 domains, where it reduced the Position Sensitivity Index in all 16 length-quartile × model × retrieval-setting combinations while preserving or improving aggregate nDCG@10. The extended codebase is available at https://github.com/impresso/fair-sentence-transformers.
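The fairness metric mentioned above, the harmonic mean of nDCG@10 across positional groups, can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it assumes binary relevance labels, and the group names ("beginning", "middle", "end") are hypothetical stand-ins for however the benchmarks bucket answer positions. Because the harmonic mean is pulled down by its smallest input, a model that only retrieves well when the answer appears early scores lower than its plain average would suggest.

```python
import math
from statistics import harmonic_mean


def ndcg_at_k(ranked_relevances, num_relevant, k=10):
    """nDCG@k for one query with binary relevance (1 = relevant, 0 = not)."""
    # Discounted gain of the actual ranking, truncated at k.
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevances[:k]))
    # Ideal ranking places all relevant passages at the top.
    idcg = sum(1 / math.log2(i + 2) for i in range(min(num_relevant, k)))
    return dcg / idcg if idcg > 0 else 0.0


def positional_fairness(group_scores):
    """Harmonic mean of the mean nDCG@10 of each positional group (hypothetical grouping)."""
    per_group = [sum(scores) / len(scores) for scores in group_scores.values()]
    return harmonic_mean(per_group)


# Toy example: one query per group, one relevant passage each.
scores = {
    "beginning": [ndcg_at_k([1, 0, 0], num_relevant=1)],   # relevant passage ranked 1st
    "middle": [ndcg_at_k([0, 1, 0], num_relevant=1)],      # ranked 2nd
    "end": [ndcg_at_k([0, 0, 0, 1], num_relevant=1)],      # ranked 4th
}
print(round(positional_fairness(scores), 3))  # prints 0.611
```

For comparison, the arithmetic mean of the same three group scores is about 0.687, so the harmonic mean penalizes the weak "end" group more heavily.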
Key takeaway
Machine Learning Engineers optimizing dense retrieval systems should consider applying inference-time attention calibration to mitigate positional bias. A configuration of B=128, lambda=0.5, and 50% layer depth offers a robust starting point: it improved positional fairness (the harmonic mean of nDCG@10 across positional groups) without per-model tuning and transferred across diverse languages and domains. Because it improves fairness while preserving or improving overall effectiveness and requires no retraining, it is a low-cost optimization for production systems.
Key insights
Inference-time attention calibration, especially partial application, effectively reduces positional bias in dense retrieval models without retraining.
Principles
- Positional bias degrades dense retrieval.
- Partial calibration often outperforms full calibration.
- Inference-time adjustments improve fairness.
Method
Extend inference-time attention calibration with a strength coefficient lambda. Tune the basket size B, the set of calibrated layers, and lambda to balance positional fairness against retrieval effectiveness.
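The role of lambda can be illustrated with a generic subtract-and-renormalize form of attention calibration. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a position-only attention baseline has already been estimated (for example, by averaging attention over inputs where content should not matter), and it operates on a bare array of attention weights rather than hooking into a real model's forward pass. The basket size B is not modeled because the summary does not define its role.

```python
import numpy as np


def calibrate_attention(attn, positional_baseline, lam=0.5, eps=1e-12):
    """Partially remove a position-only baseline from attention weights.

    attn, positional_baseline: arrays of shape (..., seq_len); each row along the
    last axis is an attention distribution over key positions.
    lam: calibration strength; 0 leaves attn unchanged, 1 subtracts the full baseline.
    """
    # Subtract a lambda-scaled share of the attention each position receives from
    # its position alone (the baseline is an assumed, precomputed input).
    adjusted = np.clip(attn - lam * positional_baseline, 0.0, None)
    row_sums = adjusted.sum(axis=-1, keepdims=True)
    # Renormalize each row into a valid distribution; fall back to the raw
    # weights if a row was clipped to all zeros.
    return np.where(row_sums > eps, adjusted / np.maximum(row_sums, eps), attn)


attn = np.array([0.5, 0.3, 0.2])        # raw weights favor the first position
baseline = np.array([0.6, 0.3, 0.1])    # hypothetical position-only baseline
print(calibrate_attention(attn, baseline, lam=0.5))  # partial: [0.4 0.3 0.3]
print(calibrate_attention(attn, baseline, lam=1.0))  # full:    [0. 0. 1.]
```

In this toy case, lam=0.5 shifts some weight from the first position toward later ones, while lam=1.0 zeroes out two positions entirely, showing how full calibration can distort the distribution far more than a partial setting.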
In practice
- Apply B=128, lambda=0.5, 50% layer depth.
- Works for -pooled and last-token-pooled models.
- Explore codebase at https://github.com/impresso/fair-sentence-transformers.
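The recommended setting can be captured as a small configuration object. This sketch is hypothetical: the class and method names are illustrative and are not taken from the fair-sentence-transformers codebase, and it assumes that "50% layer depth" means calibrating the lowest half of the transformer layers, which the summary does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationConfig:
    """Hypothetical container for the recommended calibration setting."""

    basket_size: int = 128    # B from the summary; its exact role is not modeled here
    lam: float = 0.5          # calibration strength lambda
    layer_depth: float = 0.5  # fraction of layers to calibrate (assumed: the lowest ones)

    def calibrated_layers(self, num_layers: int) -> list:
        """Indices of the layers to calibrate under the lowest-layers assumption."""
        if not 0.0 <= self.layer_depth <= 1.0:
            raise ValueError("layer_depth must be in [0, 1]")
        cutoff = int(num_layers * self.layer_depth)
        return list(range(cutoff))


cfg = CalibrationConfig()
print(cfg.calibrated_layers(12))  # prints [0, 1, 2, 3, 4, 5]
```

Keeping the depth as a fraction rather than a fixed layer count is what lets one setting apply to models of different sizes, in line with the summary's point that no per-model tuning was needed.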
Topics
- Dense Information Retrieval
- Positional Bias
- Attention Calibration
- Inference Optimization
- Retrieval Fairness
- Sentence Transformers
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.