Generative Retrieval via Diffusion Transformer with Metric-Ordered Sequence Training and Hybrid-Policy Preference Optimization

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

A new framework, MO-DiT+HPPO, addresses the challenge of pattern-preserving attribute retrieval in continuous generative retrieval systems. The task is to find items that satisfy a target attribute while keeping a fine-grained pattern shared by a seed set, a goal on which traditional embedding-based methods often fail. MO-DiT+HPPO is a staged framework comprising raw-sequence pretraining, multi-domain metric-ordered continuation pretraining, tail-centroid fine-tuning, and Hybrid-Policy Preference Optimization (HPPO). Metric-ordered training converts sparse online retrieval labels into in-pattern trajectories, teaching the model how to improve retrieval metrics across domains. HPPO then aligns the generated query distribution with the true online objective using a hybrid candidate pool and reference-anchored preference optimization, and it incorporates a Pareto pair filter so that attribute metrics improve without compromising pattern purity. Evaluations across four attribute domains show that MO-DiT improves the intersection metric over existing generative retrievers, and HPPO adds significant further gains on seven of eight domain-split cells.
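To make "metric-ordered" concrete, here is a minimal Python sketch of how sparse per-query labels might be turned into in-pattern training trajectories. It assumes each logged query is stored as a continuous embedding with a pattern identifier and a single scalar online metric where higher is better; `LoggedQuery`, `metric_ordered_sequence`, and `next_step_pairs` are hypothetical names, and the paper's actual selection and ordering rules may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LoggedQuery:
    """Hypothetical log record: one continuous query and its online metric."""
    pattern_id: str         # seed-set pattern the query was issued for
    embedding: List[float]  # continuous query vector sent to the retriever
    metric: float           # scalar online metric, assumed higher is better


def metric_ordered_sequence(
    queries: Sequence[LoggedQuery], pattern_id: str, max_len: int
) -> List[LoggedQuery]:
    """Build one in-pattern trajectory sorted by ascending metric.

    Only queries for the given pattern are kept; if there are more than
    max_len, the highest-metric tail is retained.
    """
    if max_len <= 0:
        return []
    in_pattern = [q for q in queries if q.pattern_id == pattern_id]
    ordered = sorted(in_pattern, key=lambda q: q.metric)
    return ordered[-max_len:]


def next_step_pairs(
    trajectory: Sequence[LoggedQuery],
) -> List[Tuple[List[List[float]], List[float]]]:
    """Split a trajectory into (prefix embeddings, next embedding) training pairs."""
    return [
        ([q.embedding for q in trajectory[:i]], trajectory[i].embedding)
        for i in range(1, len(trajectory))
    ]


logs = [
    LoggedQuery("floral", [0.1, 0.2], 0.30),
    LoggedQuery("floral", [0.0, 0.4], 0.10),
    LoggedQuery("floral", [0.3, 0.1], 0.50),
    LoggedQuery("striped", [0.9, 0.9], 0.90),
]
traj = metric_ordered_sequence(logs, "floral", max_len=2)
print([q.metric for q in traj])  # [0.3, 0.5]
steps = next_step_pairs(traj)
```

Sorting within a single pattern is what keeps the trajectory in-pattern: every step comes from the same seed set, so the sequence model sees metric improvements rather than pattern drift.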

Key takeaway

For Machine Learning Engineers developing generative retrieval systems: if you are struggling to balance pattern preservation with attribute density, consider the MO-DiT+HPPO framework. It offers a structured way to improve intersection metrics through metric-ordered sequence training and hybrid-policy preference optimization. Its Pareto pair filter is worth integrating in particular, since it is designed to keep attribute gains from coming at the expense of pattern purity in your retrieval results.
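The Pareto pair filter can be read as a dominance check on preference pairs. The sketch below assumes each candidate carries two scalar scores, an attribute metric and a pattern-purity metric (both higher is better), and keeps a (chosen, rejected) pair only when the chosen candidate is at least as good on both and strictly better on one. `Candidate`, `dominates`, and `pareto_pairs` are hypothetical names, and the exact criterion used in the paper may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical scored query from the preference candidate pool."""
    name: str
    attribute: float  # attribute metric, assumed higher is better
    purity: float     # pattern-purity metric, assumed higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both metrics and strictly better on at least one."""
    no_worse = a.attribute >= b.attribute and a.purity >= b.purity
    strictly_better = a.attribute > b.attribute or a.purity > b.purity
    return no_worse and strictly_better


def pareto_pairs(pool: Sequence[Candidate]) -> List[Tuple[Candidate, Candidate]]:
    """Keep only (chosen, rejected) pairs where chosen Pareto-dominates rejected."""
    return [(a, b) for a in pool for b in pool if dominates(a, b)]


pool = [
    Candidate("q1", attribute=0.6, purity=0.9),
    Candidate("q2", attribute=0.8, purity=0.7),
    Candidate("q3", attribute=0.5, purity=0.6),
]
pairs = pareto_pairs(pool)
print([(c.name, r.name) for c, r in pairs])  # [('q1', 'q3'), ('q2', 'q3')]
```

The pair (q2, q1) is discarded even though q2 has the higher attribute score, because it loses purity; that is exactly the trade-off the filter is meant to block.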

Key insights

A staged generative retrieval framework improves pattern-preserving attribute retrieval by combining metric-ordered training and hybrid preference optimization.

Principles

Method

MO-DiT+HPPO uses raw-sequence pretraining, multi-domain metric-ordered continuation pretraining, tail-centroid fine-tuning, and Hybrid-Policy Preference Optimization (HPPO) with a Pareto pair filter for generative retrieval.
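HPPO's two named ingredients, a hybrid candidate pool and a reference-anchored objective, can be sketched in plain Python. This assumes the pool mixes fresh samples from the current model (on-policy) with candidates from another source such as logged or reference-model queries (off-policy), and that the objective is DPO-style: a logistic loss on the policy's log-likelihood margin measured relative to a frozen reference model. For a diffusion model the per-candidate log-likelihood would in practice be a surrogate (Diffusion-DPO, for instance, works with denoising errors); the summary does not say which one HPPO uses. The function names and the `beta` value are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def hybrid_pool(
    on_policy: Sequence[str],
    off_policy: Sequence[str],
    k_on: int,
    k_off: int,
    seed: int = 0,
) -> List[str]:
    """Draw k_on on-policy and k_off off-policy candidates into one pool (hypothetical)."""
    rng = random.Random(seed)
    return rng.sample(list(on_policy), k_on) + rng.sample(list(off_policy), k_off)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reference_anchored_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO-style loss: reward the policy for widening its preference margin beyond the reference's."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


pool = hybrid_pool(["on_a", "on_b", "on_c"], ["off_x", "off_y"], k_on=2, k_off=1)
print(pool)

# A zero margin gives log(2); a positive margin lowers the loss.
print(reference_anchored_loss(-1.0, -1.0, -1.0, -1.0))
print(reference_anchored_loss(-1.0, -2.0, -1.5, -1.5, beta=0.5))
```

In a full pipeline the pairs fed to the loss would be drawn from the hybrid pool and screened by a filter such as the Pareto pair filter; the reference terms keep the tuned policy anchored so it does not drift far from the pretrained query distribution.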

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.