Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment

2026-02-18 · Source: Apple Machine Learning Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, quick

Summary

A new unified framework reformulates Query Auto-Completion (QAC) as end-to-end list generation. It integrates Retrieval-Augmented Generation (RAG) with multi-objective Direct Preference Optimization (DPO) to overcome the limits of traditional retrieve-and-rank pipelines, which struggle with long-tail coverage, and of purely generative methods, which risk hallucination. Key innovations include multi-objective optimization for list generation, a methodology that combines RAG with learned and rule-based verifiers to produce synthetic data, and a hybrid serving architecture for efficient production deployment. Evaluated on a large-scale commercial search platform, the framework improved preference scores by +0.40 to +0.69 in human evaluation and, in online experiments, reduced keystrokes by 5.44% and increased suggestion adoption by 3.46%.

Key takeaway

For AI engineers building search and recommendation systems, this framework offers a production-validated way to improve QAC. Consider a RAG-powered, end-to-end generative approach with multi-objective DPO to improve suggestion quality and user engagement. In the authors' online experiments it reduced keystrokes by 5.44% and increased suggestion adoption by 3.46%; results on other systems will depend on traffic, catalog, and baseline.

Key insights

QAC can be effectively reframed as end-to-end list generation using RAG and multi-objective DPO.
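
To make "end-to-end list generation" concrete, here is a minimal sketch in which retrieved candidates are placed in the prompt and the model returns the whole suggestion list in one pass. The function names, the prompt wording, and the rule that a suggestion must extend the typed prefix are all assumptions for illustration; the summary does not describe the authors' prompt or parsing logic. `retrieve` and `generate` are hypothetical callables standing in for a retriever and a language model.

```python
from typing import Callable, List


def build_prompt(prefix: str, retrieved: List[str], k: int) -> str:
    """Compose a list-generation prompt grounded in retrieved candidates."""
    context = "\n".join(f"- {c}" for c in retrieved)
    return (
        f"Prefix: {prefix}\n"
        f"Retrieved candidates:\n{context}\n"
        f"Write the {k} best completions, one per line."
    )


def parse_list(raw: str, prefix: str, k: int) -> List[str]:
    """Keep unique, non-empty lines that extend the prefix, in model order."""
    seen = set()
    out: List[str] = []
    for line in raw.splitlines():
        s = line.strip().lstrip("-").strip()  # tolerate "- item" bullets
        # Assumed rule: a completion must start with what the user typed.
        if not s or s in seen or not s.lower().startswith(prefix.lower()):
            continue
        seen.add(s)
        out.append(s)
        if len(out) == k:
            break
    return out


def complete(
    prefix: str,
    retrieve: Callable[[str], List[str]],  # hypothetical retriever
    generate: Callable[[str], str],        # hypothetical LLM call
    k: int = 5,
) -> List[str]:
    """Retrieve, prompt, generate, and parse a ranked suggestion list."""
    retrieved = retrieve(prefix)
    raw = generate(build_prompt(prefix, retrieved, k))
    return parse_list(raw, prefix, k)
```

Because the model emits the full list, ordering is decided during generation rather than by a separate ranking stage, which is the sense in which ranking and generation are unified.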

Principles

Combine RAG with multi-objective DPO for QAC.
Use learned and rule-based verifiers for data quality.
Employ iterative critique-revision for synthetic data.
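
The verifier and critique-revision principles above can be sketched as a small loop: verifiers report issues with a draft list, and a reviser rewrites the draft until no issues remain or a round limit is hit. This is an illustrative sketch, not the authors' pipeline. The specific rules in `rule_verifier`, the `revise` callable, and the `max_rounds` parameter are assumptions; a learned verifier would be another callable with the same signature that wraps a model.

```python
from typing import Callable, List, Tuple

# A verifier takes (prefix, suggestions) and returns a list of issue strings.
Verifier = Callable[[str, List[str]], List[str]]
# A reviser takes (prefix, suggestions, issues) and returns a revised list.
Reviser = Callable[[str, List[str], List[str]], List[str]]


def rule_verifier(prefix: str, suggestions: List[str]) -> List[str]:
    """Assumed example rules: no duplicates, every item extends the prefix."""
    issues: List[str] = []
    if len(set(suggestions)) != len(suggestions):
        issues.append("duplicate suggestions")
    for s in suggestions:
        if not s.lower().startswith(prefix.lower()):
            issues.append(f"'{s}' does not extend the prefix")
    return issues


def collect_issues(
    prefix: str, suggestions: List[str], verifiers: List[Verifier]
) -> List[str]:
    """Run every verifier and concatenate their issues."""
    return [i for v in verifiers for i in v(prefix, suggestions)]


def critique_revise(
    prefix: str,
    draft: List[str],
    verifiers: List[Verifier],
    revise: Reviser,
    max_rounds: int = 3,
) -> Tuple[List[str], bool]:
    """Return (final list, passed). Unpassed examples can be discarded."""
    current = draft
    for _ in range(max_rounds):
        issues = collect_issues(prefix, current, verifiers)
        if not issues:
            return current, True
        current = revise(prefix, current, issues)
    return current, not collect_issues(prefix, current, verifiers)
```

Returning a pass flag alongside the list lets a data pipeline keep only examples that every verifier accepts.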

Method

Reformulate QAC as end-to-end list generation, integrating RAG, multi-objective DPO, and verifiers. Utilize iterative critique-revision for high-quality synthetic data generation, and deploy with a hybrid serving architecture.
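
For readers unfamiliar with DPO, the sketch below shows the standard per-pair DPO loss together with one possible way to build preference pairs from several objectives. The loss formula is the usual DPO objective. The weighted-sum scalarization, the objective names, and the weights are assumptions for illustration only; the summary does not say how the authors combine objectives.

```python
import math
from typing import Dict, Tuple

Scores = Dict[str, float]


def scalarize(scores: Scores, weights: Dict[str, float]) -> float:
    """Assumed combination rule: weighted sum of per-objective scores."""
    return sum(w * scores[name] for name, w in weights.items())


def pick_pair(
    a: Tuple[str, Scores], b: Tuple[str, Scores], weights: Dict[str, float]
) -> Tuple[str, str]:
    """Return (chosen, rejected) list ids by scalarized score; ties favor a."""
    if scalarize(a[1], weights) >= scalarize(b[1], weights):
        return a[0], b[0]
    return b[0], a[0]


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * margin)."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # -log(sigmoid(m)) equals softplus(-m); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The log-probabilities here are sequence-level values for an entire generated list under the trained policy and a frozen reference model. The loss falls below log 2 once the policy prefers the chosen list more strongly than the reference does.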

In practice

Apply RAG to improve long-tail QAC coverage.
Implement DPO for multi-objective QAC alignment.
Design hybrid architectures for low-latency QAC.
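
The summary does not describe how the hybrid serving architecture is built. As one common low-latency pattern, and purely as an assumption, the sketch below serves frequent prefixes from a precomputed cache, calls the generator for everything else, and falls back to a plain retriever if generation fails or returns nothing. The class name and all callables are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class HybridCompleter:
    """Cache first, then generator, then retrieval fallback (assumed design)."""

    def __init__(
        self,
        cache: Dict[str, List[str]],            # precomputed lists for head prefixes
        generate: Callable[[str], List[str]],   # hypothetical online generator
        fallback: Callable[[str], List[str]],   # hypothetical retriever
    ) -> None:
        self.cache = cache
        self.generate = generate
        self.fallback = fallback

    def complete(self, prefix: str, k: int = 5) -> Tuple[List[str], str]:
        """Return (suggestions, source), where source names the path taken."""
        key = prefix.strip().lower()
        if key in self.cache:
            return self.cache[key][:k], "cache"
        try:
            generated = self.generate(key)
        except Exception:
            generated = []  # treat any generator failure as a miss
        if generated:
            return generated[:k], "generator"
        return self.fallback(key)[:k], "fallback"
```

Returning the source label alongside the suggestions makes it straightforward to log how much traffic each path handles.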

Topics

Query Auto-Completion
Retrieval-Augmented Generation
Direct Preference Optimization
Multi-objective Optimization
End-to-End Generation

Best for: AI Engineer, NLP Engineer, AI Scientist, AI Researcher, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.