Is BM25 Enough for Agentic Deep Research?, Replacing Decoders with MLPs in Generative Recommendation, and More!

2025-01-31 · Source: Top Information Retrieval Papers of the Week · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Robotics & Autonomous Systems · Depth: Expert, long

Summary

This week's intelligence brief highlights ten research papers and two benchmarks in information retrieval and recommender systems. Key advancements include PI-SERINI, a minimal search agent that pairs BM25 lexical retrieval with capable LLMs for deep research, achieving 83.1% accuracy on BrowseComp-Plus while cutting costs 3.3x to 10x. Jina AI introduced jina-embeddings-v5-omni, multimodal embedding models (0.95B "nano", 1.57B "small") that extend text-only backbones to images, video, and audio without retraining. Guo et al. presented SID-MLP, a distillation framework that replaces Transformer decoders with MLPs for generative recommendation, yielding an 8.74x throughput speedup and 95.7% peak-memory reduction. IBM Research released Granite Embedding Multilingual R2 models, supporting 200+ languages with a 32,768-token context window. New benchmarks, ASTRA-QA and TRIVIA+, address abstract question answering and LLM hallucination detection, respectively.

Key takeaway

AI architects evaluating retrieval-augmented generation (RAG) systems should note that simple lexical retrieval (BM25) paired with capable LLMs can reach high accuracy at significantly lower cost, which challenges the assumption that complex dense retrievers are necessary. Teams should investigate agentic search frameworks such as PI-SERINI to improve both accuracy and inference cost, especially when working over large document collections.

Key insights

Lexical retrieval paired with capable LLMs can outperform dense retrievers, and multimodal embeddings can be built by extending frozen text backbones.

Principles

Simple heuristics can expose benchmark shortcuts.
Multimodal embeddings can be created without backbone retraining.
RAG failures stem from premature question-to-answer routing.

Method

PI-SERINI uses a minimal search agent with BM25, caching, and selective text pulling. Jina-embeddings-v5-omni bolts pretrained encoders onto a frozen text model, training only thin projector layers.
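The BM25, caching, and selective-pulling pattern described above can be sketched in a few lines. This is a toy illustration, not PI-SERINI's actual code: the corpus, the `search` and `read` tool names, the snippet length, and the `K1`/`B` values are all assumptions made for the example, and "selective text pulling" is interpreted here as returning short snippets from search and fetching full text only on request.

```python
import math
from collections import Counter
from functools import lru_cache

# Hypothetical toy corpus; ids and texts are illustrative only.
DOCS = {
    "d1": "bm25 is a lexical ranking function used by search engines",
    "d2": "dense retrievers embed queries and documents into vectors",
    "d3": "a search agent can call bm25 repeatedly and read only what it needs",
}

K1, B = 0.9, 0.4  # assumed generic values, not the settings from the paper

TOKENS = {doc_id: text.split() for doc_id, text in DOCS.items()}
AVG_LEN = sum(len(t) for t in TOKENS.values()) / len(TOKENS)
DOC_FREQ = Counter(term for t in TOKENS.values() for term in set(t))


def idf(term: str) -> float:
    """Smoothed inverse document frequency (always non-negative)."""
    n = DOC_FREQ.get(term, 0)
    return math.log(1 + (len(TOKENS) - n + 0.5) / (n + 0.5))


def bm25(query_terms, doc_id: str) -> float:
    """BM25 score of one document for a tokenized query."""
    tf = Counter(TOKENS[doc_id])
    norm = K1 * (1 - B + B * len(TOKENS[doc_id]) / AVG_LEN)
    return sum(
        idf(t) * tf[t] * (K1 + 1) / (tf[t] + norm)
        for t in query_terms
        if tf[t]
    )


@lru_cache(maxsize=1024)
def search(query: str, k: int = 2):
    """Search tool: returns (doc_id, score, snippet); repeated queries hit the cache."""
    terms = tuple(query.lower().split())
    scored = [(d, bm25(terms, d)) for d in TOKENS]
    ranked = sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])[:k]
    return tuple((d, round(score, 3), DOCS[d][:40]) for d, score in ranked)


def read(doc_id: str) -> str:
    """Read tool: pull full text only for documents the agent chooses to open."""
    return DOCS[doc_id]
```

In an agent loop, the LLM would call `search` with reformulated queries, inspect the snippets, and call `read` only on the few documents worth opening, which keeps the number of tokens sent to the model low.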

In practice

Tune BM25 for long, noisy documents (e.g., k1=25, b=1).
Replace Transformer decoders with distilled MLP heads (SID-MLP) in generative recommendation; the reported throughput speedup is 8.74x.
Employ program synthesis for multi-hop RAG to improve robustness.
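To see what the BM25 settings in the first item change, it helps to isolate the term-frequency component of the standard BM25 formula. The sketch below omits the IDF factor and uses made-up document lengths; the function name `tf_weight` and the comparison values `k1=0.9, b=0.4` are assumptions chosen for illustration.

```python
def tf_weight(tf: int, doc_len: int, avg_len: float, k1: float, b: float) -> float:
    """Term-frequency component of BM25 (IDF factor omitted)."""
    # b controls length normalization: b=1 scales fully with doc_len / avg_len.
    norm = k1 * (1 - b + b * doc_len / avg_len)
    # k1 controls saturation: the weight approaches k1 + 1 as tf grows.
    return tf * (k1 + 1) / (tf + norm)


if __name__ == "__main__":
    for k1, b in [(0.9, 0.4), (25.0, 1.0)]:
        weights = [round(tf_weight(tf, 1000, 1000, k1, b), 2) for tf in (1, 5, 20)]
        print(f"k1={k1}, b={b}: weights for tf=1,5,20 -> {weights}")
```

With a small `k1`, the weight saturates quickly below `k1 + 1`, so repeated mentions of a term add little. With a large `k1`, repeated mentions keep adding weight, and `b=1` applies full length normalization, so a document twice the average length is penalized more strongly for the same term count.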

Topics

Agentic Search
Generative Recommendation
Multimodal Embeddings
Retrieval-Augmented Generation
Efficient Embeddings

Best for: AI Architect, AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Top Information Retrieval Papers of the Week.