Generative Retrieval via Diffusion Transformer with Metric-Ordered Sequence Training and Hybrid-Policy Preference Optimization
Summary
MO-DiT+HPPO is a framework for pattern-preserving attribute retrieval, a task where the goal is to find items that both satisfy a target attribute and adhere to a fine-grained pattern defined by a seed set. The approach targets the inherent conflict between preserving the pattern and achieving high attribute density. It uses continuous generative retrieval, in which a model reads item embeddings and generates query embeddings that are then used for nearest-neighbor search. Training proceeds in four stages: raw-sequence pretraining, multi-domain metric-ordered continuation pretraining, tail-centroid fine-tuning, and Hybrid-Policy Preference Optimization (HPPO). Metric-ordered training converts sparse online retrieval labels into in-pattern trajectories that point the model toward attribute improvement. HPPO then refines the generated query distribution using a hybrid candidate pool and reference-anchored preference optimization, with a Pareto pair filter so that attribute metrics improve without sacrificing pattern purity. In experiments across four attribute domains, MO-DiT+HPPO improves the intersection metric over a pretrained generative retriever on seven of eight domain-split cells.
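The continuous generative retrieval loop described in the summary (seed item embeddings in, query embedding out, nearest-neighbor search over a catalog) can be sketched in a few lines. This is a minimal illustration, not the paper's model: `generate_query` is a hypothetical stand-in that simply averages the seed embeddings where the real system would run the diffusion transformer, and the catalog, item ids, and cosine-similarity scoring are assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def generate_query(seed_embeddings):
    """Hypothetical placeholder for the generative model.

    Assumption: the mean of the seed embeddings stands in for the
    query embedding that a trained generator would produce.
    """
    n = len(seed_embeddings)
    dim = len(seed_embeddings[0])
    return [sum(e[i] for e in seed_embeddings) / n for i in range(dim)]

def retrieve(query, catalog, k):
    """Return the ids of the k catalog items closest to the query."""
    ranked = sorted(catalog.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

# Toy catalog of 2-d item embeddings (illustrative values only).
catalog = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
seeds = [[1.0, 0.0], [0.8, 0.2]]

query = generate_query(seeds)
top = retrieve(query, catalog, k=2)
print(top)
```

In a production setting the exhaustive sort would typically be replaced by an approximate nearest-neighbor index; the brute-force version is used here only to keep the sketch self-contained.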
Key takeaway
For machine learning engineers building retrieval systems, MO-DiT+HPPO is an option for pattern-preserving attribute retrieval. If you need to find items that match a specific pattern while also satisfying a target attribute, consider its staged approach, particularly metric-ordered training and HPPO. In the reported experiments the framework improves the intersection metric over a pretrained generative retriever on seven of eight domain-split cells, which is relevant in production settings where both pattern fidelity and attribute density matter.
Key insights
MO-DiT+HPPO addresses the pattern-attribute conflict in generative retrieval by combining metric-ordered training with preference optimization of the generated query distribution.
Principles
- Pattern-preserving attribute retrieval balances pattern fidelity and attribute density.
- Metric-ordered training teaches attribute improvement directions across domains.
- Hybrid-Policy Preference Optimization aligns generated queries with online objectives.
Method
MO-DiT+HPPO trains in four stages: raw-sequence pretraining, multi-domain metric-ordered continuation pretraining, tail-centroid fine-tuning, and Hybrid-Policy Preference Optimization (HPPO), which uses a Pareto pair filter.
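One way to read the Pareto pair filter is as a dominance check on candidate queries before they become preference pairs: a pair (chosen, rejected) is kept only when the chosen candidate is at least as good on both attribute score and pattern purity and strictly better on one. The sketch below assumes that reading; the `Candidate` fields, the scores, and the exact dominance rule are illustrative assumptions, since the summary does not spell out the criterion.

```python
from itertools import permutations
from typing import NamedTuple

class Candidate(NamedTuple):
    """A generated query with two hypothetical evaluation scores."""
    name: str
    attribute: float  # assumed: attribute density of retrieved items
    purity: float     # assumed: pattern purity of retrieved items

def dominates(a, b):
    """True if a is no worse than b on both scores and better on one."""
    no_worse = a.attribute >= b.attribute and a.purity >= b.purity
    better = a.attribute > b.attribute or a.purity > b.purity
    return no_worse and better

def pareto_pairs(pool):
    """Build (chosen, rejected) name pairs that pass the dominance check."""
    return [(w.name, l.name)
            for w, l in permutations(pool, 2)
            if dominates(w, l)]

# Illustrative candidate pool (values are made up for the example).
pool = [
    Candidate("q1", attribute=0.60, purity=0.90),
    Candidate("q2", attribute=0.80, purity=0.90),
    Candidate("q3", attribute=0.90, purity=0.50),
]

pairs = pareto_pairs(pool)
print(pairs)
```

Here `q3` has the highest attribute score but is never chosen over `q1` or `q2`, because preferring it would trade away purity. That is the behavior the filter is meant to enforce under this reading.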
In practice
- Apply generative retrieval for fine-grained pattern-preserving item discovery.
- Use metric-ordered training to guide models toward desired attribute improvements.
- Implement HPPO to align generated queries with real-world online metrics.
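To make the metric-ordered idea concrete, the sketch below turns sparse per-item labels into an ordered sequence and splits it into a context and a continuation target. It is an assumption-laden illustration: the function names, the ascending sort direction, the choice to drop unlabeled items, and the fixed context length are all hypothetical, and the paper's actual trajectory construction may differ.

```python
def metric_ordered_sequence(items, labels):
    """Order in-pattern items by an online metric, lowest first.

    Assumption: items without a label are skipped, since the labels
    are described as sparse. Ties are broken by item id.
    """
    labeled = [(labels[i], i) for i in items if i in labels]
    labeled.sort()
    return [item for _, item in labeled]

def split_for_continuation(sequence, context_len):
    """Split an ordered sequence into (context, continuation)."""
    return sequence[:context_len], sequence[context_len:]

# Hypothetical in-pattern items; only some have an online metric label.
items = ["x1", "x2", "x3", "x4"]
labels = {"x1": 0.7, "x3": 0.2, "x4": 0.9}

ordered = metric_ordered_sequence(items, labels)
context, continuation = split_for_continuation(ordered, context_len=2)
print(ordered, context, continuation)
```

With this ordering, a model trained to continue the sequence sees contexts that end at lower metric values and targets at higher ones, which is one plausible way an ordering could encode a direction of attribute improvement.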
Topics
- Generative Retrieval
- Diffusion Transformer
- Preference Optimization
- Nearest-Neighbor Search
- Attribute Retrieval
- Machine Learning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.