Generative Retrieval via Diffusion Transformer with Metric-Ordered Sequence Training and Hybrid-Policy Preference Optimization
Summary
A new framework, MO-DiT+HPPO, addresses the challenge of pattern-preserving attribute retrieval in continuous generative retrieval systems. This task requires finding items that satisfy a target attribute while maintaining a fine-grained pattern from a seed set, a goal where traditional embedding-based methods often fail. MO-DiT+HPPO is a staged framework comprising raw-sequence pretraining, multi-domain metric-ordered continuation pretraining, tail-centroid fine-tuning, and Hybrid-Policy Preference Optimization (HPPO). Metric-ordered training converts sparse online retrieval labels into in-pattern trajectories, teaching the model to improve metrics across domains. HPPO then aligns the generated query distribution with the true online objective using a hybrid candidate pool and reference-anchored preference optimization, incorporating a Pareto pair filter to enhance attribute metrics without compromising pattern purity. Evaluations across four attribute domains demonstrate that MO-DiT improves the intersection metric over existing generative retrievers, with HPPO providing further significant gains on seven of eight domain-split cells.
Key takeaway
If you are a Machine Learning Engineer building a generative retrieval system and struggling to balance pattern preservation with attribute density, consider the MO-DiT+HPPO framework. It offers a structured way to improve intersection metrics through metric-ordered sequence training and hybrid-policy preference optimization. Its Pareto pair filter is worth integrating so that attribute gains do not come at the cost of pattern purity in your retrieval results.
Key insights
A staged generative retrieval framework improves pattern-preserving attribute retrieval by combining metric-ordered training and hybrid preference optimization.
Principles
- Pattern-preserving attribute retrieval balances pattern purity and attribute density.
- Metric-ordered training can teach domain-agnostic metric improvement.
- Hybrid-Policy Preference Optimization aligns generated queries with online objectives.
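The summary describes HPPO as "reference-anchored" but does not give its objective. One common way to anchor preference optimization to a reference model is a DPO-style loss, sketched below for a single (chosen, rejected) pair. This is an assumption for illustration, not the paper's confirmed formula: the function name, the `beta` parameter, and the scalar log-probability inputs are all hypothetical. For a diffusion model, exact log-likelihoods are usually not tractable, so a surrogate score would have to stand in for them.

```python
import math


def reference_anchored_preference_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO-style loss for one preference pair (illustrative assumption only).

    Each log-probability is measured relative to a frozen reference model,
    so the policy is rewarded for preferring the chosen query more strongly
    than the reference does, rather than for raw likelihood alone.
    """
    chosen_gain = logp_chosen - ref_logp_chosen
    rejected_gain = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree on both queries, the margin is zero and the loss equals log 2. The loss falls below that value only as the policy shifts probability toward the chosen query relative to the reference.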
Method
MO-DiT+HPPO trains a generative retriever in four stages: raw-sequence pretraining, multi-domain metric-ordered continuation pretraining, tail-centroid fine-tuning, and Hybrid-Policy Preference Optimization (HPPO), which applies a Pareto pair filter.
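The summary does not say how metric-ordered sequences are built. One plausible reading is that logged queries are grouped by pattern and sorted by ascending retrieval metric, so that a continuation model sees trajectories that move toward better results. The sketch below illustrates that reading only. The record fields (`pattern_id`, `query_id`, `metric`) and the `min_len` parameter are hypothetical names, not taken from the paper.

```python
from collections import defaultdict


def build_metric_ordered_sequences(records, min_len=2):
    """Turn sparse per-query labels into in-pattern, metric-ordered sequences.

    `records` is an iterable of dicts with hypothetical keys:
      - "pattern_id": which seed pattern the query belongs to
      - "query_id":   identifier of the logged query
      - "metric":     its observed online retrieval metric
    Returns {pattern_id: [query_id, ...]} sorted from worst to best metric.
    """
    by_pattern = defaultdict(list)
    for rec in records:
        by_pattern[rec["pattern_id"]].append(rec)

    sequences = {}
    for pattern_id, group in by_pattern.items():
        # A single labeled query gives no "improvement" direction to learn.
        if len(group) < min_len:
            continue
        ordered = sorted(group, key=lambda r: r["metric"])
        sequences[pattern_id] = [r["query_id"] for r in ordered]
    return sequences


records = [
    {"pattern_id": "p1", "query_id": "a", "metric": 0.3},
    {"pattern_id": "p1", "query_id": "b", "metric": 0.1},
    {"pattern_id": "p2", "query_id": "c", "metric": 0.9},
]
sequences = build_metric_ordered_sequences(records)
```

Here `p1` yields the sequence `["b", "a"]`, while `p2` is dropped because it has only one labeled query. Keeping each sequence inside one pattern is what makes the trajectory "in-pattern" under this reading.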
In practice
- Use Pareto pair filtering to balance attribute gain and pattern purity.
- Apply metric-ordered training for cross-domain metric improvement.
- Implement HPPO for online objective alignment in generative retrieval.
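The Pareto pair filter is named in the summary but not specified. A minimal sketch, assuming it keeps a preference pair only when the chosen candidate Pareto-dominates the rejected one on attribute score and pattern purity, could look like the following. The `Candidate` type, its field names, and both function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    query_id: str
    attribute_score: float  # how well results satisfy the target attribute
    pattern_purity: float   # how well results stay within the seed pattern


def pareto_dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and better on one."""
    no_worse = (
        a.attribute_score >= b.attribute_score
        and a.pattern_purity >= b.pattern_purity
    )
    strictly_better = (
        a.attribute_score > b.attribute_score
        or a.pattern_purity > b.pattern_purity
    )
    return no_worse and strictly_better


def pareto_pair_filter(
    pairs: List[Tuple[Candidate, Candidate]],
) -> List[Tuple[Candidate, Candidate]]:
    """Keep only (chosen, rejected) pairs where chosen dominates rejected."""
    return [(c, r) for c, r in pairs if pareto_dominates(c, r)]


pairs = [
    # Better attribute score, same purity: kept.
    (Candidate("q1", 0.8, 0.9), Candidate("q2", 0.6, 0.9)),
    # Better attribute score but lower purity: dropped.
    (Candidate("q3", 0.9, 0.5), Candidate("q4", 0.6, 0.9)),
    # Identical on both objectives: dropped, no preference signal.
    (Candidate("q5", 0.7, 0.7), Candidate("q6", 0.7, 0.7)),
]
kept = pareto_pair_filter(pairs)
```

Under this assumption, a pair that trades purity for attribute gain never reaches the preference loss, which is one way to obtain the behavior the summary describes: attribute gains that do not compromise pattern purity.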
Topics
- Generative Retrieval
- Diffusion Transformers
- Preference Optimization
- Metric-Ordered Training
- Pattern Preservation
- Embedding Retrieval
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.