Do Neural Retrievers Prefer Certain Documents? Evidence of Learned Relevance Priors

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Information Retrieval) · Depth: Expert, quick

Summary

Supervised bi-encoder neural retrievers implicitly learn a document-level relevance prior: a query-independent signal, picked up from annotated training data and encoded in their representation space. The researchers estimated this prior by training simple classifiers on frozen document embeddings, and did so for three state-of-the-art retrievers across multiple IR benchmarks. The findings indicate that these retrievers encode generalizable and consistent relevance priors, producing a "findability gap": documents with lower priors are systematically harder to retrieve, even when they are relevant. The effect is weaker in BM25. LLM-based explanations suggest that judged-relevant documents tend to be comprehensive, self-contained summaries of mainstream topics, while niche or technical content often goes unjudged. Retrievers internalize this bias and rank documents with the favored features higher, independently of actual relevance.

Key takeaway

If you are an information retrieval engineer developing neural retrievers, account for learned relevance priors that bias retrieval toward mainstream, comprehensive documents. This bias can cause a system to systematically overlook genuinely relevant niche or highly technical content. Examine your training data for such implicit preferences, and consider augmenting the retriever with methods that are less susceptible to these priors so that relevant documents stay findable regardless of style or topic.
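
One way to act on the last suggestion is to fuse the dense ranking with a lexical ranking such as BM25, which the summary reports as less affected. The sketch below uses reciprocal rank fusion; that technique, the constant `k=60`, and the toy document IDs are assumptions chosen for illustration and do not come from the source. Whether fusion actually narrows the gap on a given collection would have to be measured.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank is
    1-based); documents are returned by descending total score, with ties
    broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings for one query: the dense retriever favors the
# broad documents, while BM25 puts the niche document first.
dense_ranking = ["overview", "survey", "niche_spec"]
bm25_ranking = ["niche_spec", "overview", "changelog"]

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)
```

In this toy case the niche document moves from third place in the dense ranking to second place in the fused one, because the lexical ranker vouches for it.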

Key insights

Supervised neural retrievers learn implicit document preferences from their training data, creating a "findability gap" for documents with low learned priors, such as niche or technical content.
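
A findability gap of this kind could be quantified roughly as follows. The sketch assumes that, for every relevant document in an evaluation set, you already have a prior score and a flag saying whether the retriever returned it in the top k. The median split, the function name `findability_gap`, and the toy records are illustrative assumptions, not the paper's protocol or results.

```python
from statistics import median


def findability_gap(records):
    """Compare recall of relevant documents with high vs. low prior scores.

    records: list of (prior_score, retrieved_in_top_k) pairs, one per
    relevant document. Documents are split at the median prior score.
    Returns (recall_high, recall_low, recall_high - recall_low).
    """
    cut = median(score for score, _ in records)
    high = [hit for score, hit in records if score >= cut]
    low = [hit for score, hit in records if score < cut]
    if not high or not low:
        raise ValueError("need documents on both sides of the median prior")
    recall_high = sum(high) / len(high)
    recall_low = sum(low) / len(low)
    return recall_high, recall_low, recall_high - recall_low


# Hand-made toy records, purely for illustration.
records = [
    (0.9, True), (0.8, True), (0.7, True), (0.6, False),
    (0.4, True), (0.3, False), (0.2, False), (0.1, False),
]
recall_high, recall_low, gap = findability_gap(records)
print(recall_high, recall_low, gap)
```

A positive gap means relevant documents with low priors are retrieved less often than relevant documents with high priors.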

Method

The prior is estimated by training simple classifiers on frozen document embeddings, so the retriever itself is never updated. The analysis covers three state-of-the-art retrievers across multiple IR benchmarks.
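
A minimal sketch of such a probe, assuming the document embeddings have already been computed by a frozen retriever and each document carries a binary label for whether it was judged relevant in the training annotations. The synthetic embeddings, the logistic-regression probe trained by plain gradient descent, and the names `fit_prior_probe` and `prior_scores` are all illustrative assumptions; the source only says "simple classifiers".

```python
import numpy as np


def fit_prior_probe(emb, labels, lr=0.5, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe on frozen embeddings.

    Only the probe weights (w, b) are learned; the embeddings are inputs
    and are never modified.
    """
    n, d = emb.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(emb @ w + b)))
        w -= lr * (emb.T @ (p - labels) / n + l2 * w)
        b -= lr * float(np.mean(p - labels))
    return w, b


def prior_scores(emb, w, b):
    """Query-independent score in (0, 1) for each document embedding."""
    return 1.0 / (1.0 + np.exp(-(emb @ w + b)))


# Synthetic stand-in for frozen document embeddings (assumption: 16 dims).
rng = np.random.default_rng(0)
dim = 16
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
emb = rng.normal(size=(2000, dim))
# Hypothetical labels: documents further along `direction` are more
# likely to have been judged relevant, plus some noise.
labels = (emb @ direction + 0.5 * rng.normal(size=2000) > 0).astype(float)

train, test = slice(0, 1500), slice(1500, 2000)
w, b = fit_prior_probe(emb[train], labels[train])
predictions = prior_scores(emb[test], w, b) > 0.5
held_out_acc = float(np.mean(predictions == (labels[test] == 1.0)))
print(f"held-out accuracy: {held_out_acc:.3f}")
```

If a probe like this beats chance on held-out documents, the embedding space carries a query-independent signal about which documents tend to be judged relevant, which is the sense of "prior" used in the summary.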

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.