Do Neural Retrievers Prefer Certain Documents? Evidence of Learned Relevance Priors

2026-06-01 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Information Retrieval · Depth: Expert, quick

Summary

Supervised bi-encoder neural retrievers implicitly learn a document-level relevance prior: a query-independent signal that is picked up from annotated training data and encoded in their representation space. The researchers estimate this prior by training simple classifiers on frozen document embeddings, and they examine three state-of-the-art retrievers across multiple IR benchmarks. The findings indicate that these retrievers encode consistent relevance priors that generalize, creating a "findability gap": documents with lower priors are systematically harder to retrieve, even when they are relevant. The effect is weaker in BM25. LLM-based explanations reveal that judged-relevant documents tend to be comprehensive, self-contained summaries of mainstream topics, while niche or technical content is often unjudged. Retrievers internalize this bias and rank documents with the favored features higher, independently of actual relevance.

Key takeaway

If you are an information retrieval engineer developing neural retrievers, account for learned relevance priors that bias retrieval toward mainstream, comprehensive documents. This bias can cause your systems to systematically overlook genuinely relevant niche or highly technical content. Examine your training data for such implicit preferences, and consider augmenting your retrieval systems with methods that are less susceptible to these priors so that relevant results are not missed.
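One possible reading of "augmenting" is to fuse the neural ranking with a lexical ranking such as BM25, where the summary reports a weaker effect. The sketch below uses reciprocal rank fusion, which is a technique chosen here for illustration and is not something the source prescribes. The function name, the document IDs, and the constant k=60 are all assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    documents are returned by descending fused score (ties broken by ID).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings for a single query.
dense_ranking = ["d1", "d2", "d3"]   # from a neural retriever
bm25_ranking = ["d4", "d1", "d2"]    # from a lexical retriever

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)
```

In this toy case the document that only the lexical ranking surfaces ("d4") still enters the fused list. Whether fusion actually narrows the findability gap on real data would need to be measured.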

Key insights

Supervised neural retrievers learn implicit document preferences from training data, creating a "findability gap" for certain document types.

Principles

Supervised neural retrievers encode query-independent relevance priors.
These priors lead to a "findability gap" for documents with lower prior scores.
Training data annotation protocols can introduce implicit document preferences.

Method

The prior was estimated by training simple classifiers on frozen document embeddings. The analysis covered three state-of-the-art retrievers across multiple IR benchmarks.
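A minimal sketch of such a probe, under stated assumptions: the summary does not say which classifiers were used, so plain logistic regression stands in here, and the embeddings are synthetic stand-ins for frozen retriever outputs. All names (train_prior_probe, prior_score, fake_embedding) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_prior_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit logistic regression on fixed embeddings with batch gradient descent.

    The embeddings are never updated: only the probe weights are learned,
    so the probe can only read out what the embedding space already encodes.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def prior_score(embedding, w, b):
    """Query-independent score: probability that a document is judged relevant."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, embedding)) + b)


# Synthetic stand-in for frozen document embeddings (assumption): documents
# judged relevant are shifted along the first coordinate.
rng = random.Random(0)
DIM = 8


def fake_embedding(shift):
    return [rng.gauss(shift if j == 0 else 0.0, 1.0) for j in range(DIM)]


embeddings = [fake_embedding(+1.5) for _ in range(150)]
embeddings += [fake_embedding(-1.5) for _ in range(150)]
labels = [1] * 150 + [0] * 150  # 1 = judged relevant for some query

w, b = train_prior_probe(embeddings, labels)
scores = [prior_score(x, w, b) for x in embeddings]
accuracy = sum((s > 0.5) == (y == 1) for s, y in zip(scores, labels)) / len(labels)
print(f"probe accuracy on synthetic data: {accuracy:.2f}")
```

On real data the probe would be trained on one set of documents and evaluated on held-out ones. Accuracy well above chance there would suggest that the embedding space carries a relevance signal that does not depend on any query.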

In practice

Analyze training data for implicit document preferences.
Evaluate retriever performance on diverse document types.
Consider alternative retrieval methods for niche content.
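To make the evaluation step concrete, the sketch below buckets relevant documents by an estimated prior score and compares recall@k across buckets. It is an illustrative measurement rather than the paper's protocol: the record format, the function name, and the example numbers are all assumptions.

```python
def recall_at_k_by_prior(records, k=10, n_buckets=2):
    """Recall@k for relevant documents, grouped by prior score.

    records: list of (prior_score, rank) pairs, one per relevant
    (query, document) pair. rank is the 1-based position at which the
    retriever returned the document, or None if it was not retrieved.
    Returns one recall value per bucket, from lowest to highest prior.
    """
    if len(records) < n_buckets:
        raise ValueError("need at least one record per bucket")
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    recalls = []
    for i in range(n_buckets):
        bucket = ordered[i * n // n_buckets:(i + 1) * n // n_buckets]
        hits = sum(1 for _, rank in bucket if rank is not None and rank <= k)
        recalls.append(hits / len(bucket))
    return recalls


# Hypothetical (prior_score, rank) pairs for relevant documents.
records = [
    (0.90, 1), (0.80, 3), (0.85, 7), (0.70, 12),
    (0.20, 15), (0.30, 4), (0.10, None), (0.25, 40),
]

low_prior_recall, high_prior_recall = recall_at_k_by_prior(records, k=10)
gap = high_prior_recall - low_prior_recall
print(low_prior_recall, high_prior_recall, gap)
```

A persistent positive gap on real judgments would be consistent with the findability gap described above. Running the same measurement on BM25 rankings gives a baseline for comparison.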

Topics

Neural Retrievers
Relevance Priors
Document Bias
Information Retrieval
Bi-encoder Models
Training Data Bias

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.