Generative Retrieval via Diffusion Transformer with Metric-Ordered Sequence Training and Hybrid-Policy Preference Optimization

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

MO-DiT+HPPO is a framework for pattern-preserving attribute retrieval: finding items that satisfy a target attribute while adhering to a fine-grained pattern defined by a seed set. The two goals conflict, since preserving the pattern and achieving high attribute density pull in different directions. The framework uses continuous generative retrieval, in which a model processes item embeddings and generates query embeddings for nearest-neighbor search. Training proceeds in four stages: raw-sequence pretraining, multi-domain metric-ordered continuation pretraining, tail-centroid fine-tuning, and Hybrid-Policy Preference Optimization (HPPO). Metric-ordered training converts sparse online retrieval labels into in-pattern trajectories that guide the model toward attribute improvement. HPPO then refines the generated query distribution using a hybrid candidate pool and reference-anchored preference optimization, with a Pareto pair filter that raises attribute metrics without sacrificing pattern purity. In experiments across four attribute domains, MO-DiT+HPPO improves the intersection metric over a pretrained generative retriever on seven of eight domain-split cells.
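The retrieval loop described above (seed embeddings in, query embedding out, nearest-neighbor search over a catalog) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_query` is a hypothetical stand-in that averages the seed embeddings where the actual system would run the trained diffusion transformer, and `intersection_rate` assumes the intersection metric is the share of retrieved items that are both in-pattern and carry the target attribute, which the summary does not define.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def generate_query(seed_embeddings):
    """Hypothetical placeholder for the generative model.

    Assumption: the real system would produce this vector with a trained
    model; here we simply average the seed embeddings.
    """
    return l2_normalize(seed_embeddings.mean(axis=0))


def nearest_neighbors(query, catalog, k):
    """Return indices of the k catalog items most similar to the query."""
    scores = l2_normalize(catalog) @ query
    return np.argsort(-scores)[:k]


def intersection_rate(retrieved, in_pattern, has_attribute):
    """Assumed metric: fraction of retrieved items that are in-pattern AND have the attribute."""
    hits = in_pattern[retrieved] & has_attribute[retrieved]
    return float(hits.mean())
```

Brute-force cosine search keeps the sketch self-contained; a production system would more likely use an approximate nearest-neighbor index.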

Key takeaway

For machine learning engineers building retrieval systems, MO-DiT+HPPO targets pattern-preserving attribute retrieval. If you need to find items that match a specific pattern while also satisfying a target attribute, consider its staged approach, particularly metric-ordered training and HPPO. The reported gains in the intersection metric make it most relevant to production settings where both pattern fidelity and attribute density are critical.

Key insights

MO-DiT+HPPO resolves pattern-attribute conflicts in generative retrieval via metric-ordered learning and optimized query distribution.
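One possible reading of metric-ordered learning, sketched under loud assumptions: keep only in-pattern items, order them by an attribute score so the sequence moves toward higher attribute values, and use the centroid of the sequence tail as a training target. The summary does not specify the ordering rule or the tail-centroid computation, so the function names, the ascending sort, and the `tail_size` parameter below are all hypothetical.

```python
import numpy as np


def metric_ordered_sequence(in_pattern, attribute_score):
    """Assumed construction: in-pattern item indices sorted by ascending attribute score."""
    idx = np.flatnonzero(in_pattern)
    # Stable sort keeps the original order among items with equal scores.
    return idx[np.argsort(attribute_score[idx], kind="stable")]


def tail_centroid(embeddings, order, tail_size):
    """Assumed target: mean embedding of the last tail_size items in the ordered sequence."""
    tail = order[-tail_size:]
    return embeddings[tail].mean(axis=0)
```

Under this reading, off-pattern items never enter the sequence, so a model trained to continue it would be pushed toward higher attribute scores only along in-pattern examples.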

Method

MO-DiT+HPPO stages include raw-sequence pretraining, multi-domain metric-ordered continuation pretraining, tail-centroid fine-tuning, and Hybrid-Policy Preference Optimization (HPPO) with a Pareto pair filter.
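The Pareto pair filter can be illustrated with a small sketch. It assumes each candidate query has been scored for pattern purity and attribute density, and that a preference pair is kept only when the preferred candidate Pareto-dominates the rejected one (no worse on either score, strictly better on at least one). The `Candidate` fields and this dominance rule are assumptions for illustration; the summary does not give the paper's exact criterion.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Candidate:
    name: str
    purity: float      # assumed score: how well retrieved items stay in-pattern
    attribute: float   # assumed score: attribute density of retrieved items


def dominates(a, b):
    """True if a is no worse than b on both scores and strictly better on one."""
    no_worse = a.purity >= b.purity and a.attribute >= b.attribute
    strictly_better = a.purity > b.purity or a.attribute > b.attribute
    return no_worse and strictly_better


def pareto_pairs(candidates):
    """Return (chosen, rejected) pairs where chosen Pareto-dominates rejected."""
    return [(w, l) for w, l in permutations(candidates, 2) if dominates(w, l)]
```

Pairs in which one candidate trades purity for attribute density are dropped, so under this rule the preference signal never rewards an attribute gain that costs pattern purity.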

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.