RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering

2026-06-01 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

RASER (Recoverability-Aware Selective Escalation Router) is a system that lowers token costs in multi-hop question answering (QA) by escalating retrieval only when needed. Conventional multi-hop QA pipelines spend heavily on LLM tokens because they repeatedly call the model for question decomposition or iterative retrieval, even though many questions can be answered by a single one-shot RAG pass. RASER runs one-shot RAG first, then uses six features from that pass to decide whether to stop or to escalate to a more expensive retrieval action, either PRUNE or iterative retrieval (IRCoT). The decision itself requires no additional LLM calls. Across six LLMs and three multi-hop QA benchmarks, RASER achieves F1 scores competitive with state-of-the-art baselines while consuming only 41-49% of the tokens of an "always-prune" strategy, and fewer tokens than the other iterative and decomposition baselines.

Key takeaway

NLP engineers building multi-hop question-answering systems should consider a selective escalation router like RASER. The approach identifies questions that one-shot RAG already answers and escalates only the rest, which keeps F1 competitive while cutting token expenditure by roughly 51-59% relative to an always-prune strategy (RASER uses 41-49% of its tokens).

Key insights

RASER reduces multi-hop QA token costs by selectively escalating retrieval based on one-shot RAG features.

Principles

Many multi-hop questions are solvable by one-shot RAG.
Selective escalation optimizes LLM budget use.
Explicit cost-accuracy trade-offs improve retrieval efficiency.

Method

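RASER first runs one-shot RAG, then uses six features from that run to choose among three actions: stop and keep the one-shot answer, escalate to PRUNE, or escalate to iterative retrieval (IRCoT). The routing decision itself requires no extra LLM calls.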
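The routing step described above can be sketched as a small scoring function. This is a minimal illustration, not RASER's actual router: the summary does not name the six features or describe the decision model, so the linear weights, the relative costs, and the cost_weight parameter below are all hypothetical placeholders. What the sketch does show is the general shape of the idea: a cheap, LLM-free function maps six numbers from the one-shot pass to one of three actions while trading expected gain against token cost.

```python
from enum import Enum
from typing import Dict, Sequence


class Action(Enum):
    STOP = "stop"    # keep the one-shot RAG answer
    PRUNE = "prune"  # escalate to PRUNE
    IRCOT = "ircot"  # escalate to iterative retrieval (IRCoT)


NUM_FEATURES = 6  # the summary states six features but does not name them

# Hypothetical linear weights per action (assumption, not from the source).
# A real router would learn these from labeled one-shot RAG runs.
WEIGHTS: Dict[Action, Sequence[float]] = {
    Action.STOP: (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    Action.PRUNE: (0.5, 0.0, 0.0, 0.0, 0.0, 0.0),
    Action.IRCOT: (0.0, 0.9, 0.0, 0.0, 0.0, 0.0),
}

# Hypothetical relative token cost of each action (assumption).
COST: Dict[Action, float] = {
    Action.STOP: 0.0,
    Action.PRUNE: 1.0,
    Action.IRCOT: 2.0,
}


def route(features: Sequence[float], cost_weight: float = 0.1) -> Action:
    """Pick the action with the best estimated gain minus weighted cost.

    No LLM call is made here: the decision is plain arithmetic over the
    six features extracted from the one-shot RAG pass.
    """
    if len(features) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} features, got {len(features)}")

    def score(action: Action) -> float:
        gain = sum(w * x for w, x in zip(WEIGHTS[action], features))
        return gain - cost_weight * COST[action]

    # Ties resolve to the earliest-defined action, so STOP wins when nothing helps.
    return max(Action, key=score)
```

Raising cost_weight makes the router more reluctant to escalate, which is one simple way to expose the cost-accuracy trade-off as a single tunable knob.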

In practice

Integrate RASER to optimize LLM token usage in multi-hop QA.
Analyze one-shot RAG performance to identify questions needing escalation.
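One way to start the analysis suggested above is to score one-shot RAG answers against gold answers on a labeled development set and flag the questions that fall short. The sketch below assumes a simplified token-level F1 (lowercasing and whitespace splitting only, without the punctuation and article stripping that benchmark scripts usually apply) and a hypothetical threshold of 0.5. The record layout and function names are illustrative, not taken from RASER. Gold answers exist only offline, so this is a diagnostic for building or evaluating a router, not something available at inference time.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def token_f1(prediction: str, gold: str) -> float:
    """Simplified token-level F1 between a predicted and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def needs_escalation(
    records: Iterable[Tuple[str, str, str]], threshold: float = 0.5
) -> List[str]:
    """Return ids of questions whose one-shot answer scores below threshold.

    Each record is (question_id, one_shot_answer, gold_answer).
    """
    return [qid for qid, pred, gold in records if token_f1(pred, gold) < threshold]


if __name__ == "__main__":
    dev_set = [
        ("q1", "Paris", "Paris"),
        ("q2", "London", "Paris"),
    ]
    print(needs_escalation(dev_set))
```

The share of flagged questions gives a rough upper bound on how often escalation is worth paying for on that dataset.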

Topics

Multi-hop QA
Retrieval-Augmented Generation
LLM Cost Optimization
Token Efficiency
RASER
Iterative Retrieval

Best for: AI Engineer, Machine Learning Engineer, CTO, AI Scientist, NLP Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.