The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Discriminator-Guided RL (DRL) is a method for correcting flow matching models. It targets a structural mismatch: standard matching losses align poorly with visual realism and coherent object structure at inference. DRL trains a discriminator in a pretrained representation space to tell real data from base-model samples, then uses its logit as the reward for KL-regularized reinforcement learning, which avoids the need for expensive human preference data. Across SiT, JiT, REPA, and RAE backbones, DRL reduces guidance-free FID (e.g., 9.38 to 2.62 on SiT) and semantic-space FD (e.g., 88.2 to 19.3 on DINOv3 for SiT). It also improves human-preference rewards without being trained on them directly, and yields a better Pareto frontier for alignment and artifact reduction.

Key takeaway

For machine learning engineers working on generative models, especially flow matching models, DRL is a way to improve sample quality and realism without costly human preference data. A discriminator in a pretrained representation space supplies the reward, and the reported gains cover image fidelity, artifact reduction, and semantic coherence. Consider evaluating DRL as a correction stage in your training pipeline.

Key insights

DRL corrects flow matching models by using a discriminator in a pretrained representation space to provide a data-aligned reward, addressing the limitations of the L2 matching loss.
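
To make "data-aligned reward" concrete, here is one standard way to write a KL-regularized objective whose reward is a discriminator logit. The notation is an assumption introduced for illustration and is not taken from the paper: phi is the frozen pretrained encoder, d is the discriminator logit, p_base is the pretrained flow matching model, and beta is the regularization strength.

```latex
\begin{aligned}
r(x) &= d\bigl(\phi(x)\bigr), \\
\max_{p_\theta}\;& \mathbb{E}_{x \sim p_\theta}\bigl[r(x)\bigr]
  \;-\; \beta\,\mathrm{KL}\bigl(p_\theta \,\|\, p_{\mathrm{base}}\bigr), \\
p^\star(x) &\propto p_{\mathrm{base}}(x)\,\exp\bigl(r(x)/\beta\bigr).
\end{aligned}
```

The last line is the well-known closed-form maximizer of this kind of objective. For an idealized discriminator trained on balanced real and generated batches, the logit equals the log density ratio between real and base-model features, so at beta = 1 the tilted model's feature distribution would match that of the data. Whether the paper argues along these lines is not stated in this summary.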

Principles

Method

DRL trains a discriminator in a pretrained representation space to distinguish real data from model samples, then uses its logit as the reward for KL-regularized reinforcement learning.
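
The two steps can be sketched on toy data. This is a minimal illustration under stated assumptions, not the paper's implementation: 2-D Gaussians stand in for frozen-encoder features, the discriminator is plain logistic regression, and the RL fine-tuning step is replaced by reweighting base samples with weights proportional to exp(reward / beta), which is the closed-form optimum of a KL-regularized reward objective. All names (`train_discriminator`, `reward`, `kl_tilt_weights`, `DIM`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 2  # toy feature dimension (assumption)

# Synthetic stand-ins for frozen-encoder features (assumption, not real data):
# "real" features are centred at +1, base-model features at 0.
real_feats = rng.normal(loc=1.0, size=(4000, DIM))
base_feats = rng.normal(loc=0.0, size=(4000, DIM))


def train_discriminator(real, fake, lr=0.5, steps=500):
    """Fit a logistic-regression discriminator (real=1, fake=0) by gradient descent."""
    x = np.concatenate([real, fake])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(real)
        err = p - y                              # gradient of BCE w.r.t. the logit
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def reward(feats, w, b):
    """The reward is the raw discriminator logit, not the sigmoid probability."""
    return feats @ w + b


def kl_tilt_weights(rewards, beta=1.0):
    """Normalised weights proportional to exp(reward / beta)."""
    z = rewards / beta
    z = z - z.max()  # stabilise the exponent before exponentiating
    wts = np.exp(z)
    return wts / wts.sum()


w, b = train_discriminator(real_feats, base_feats)

# Fresh evaluation samples from both toy distributions.
real_eval = rng.normal(loc=1.0, size=(4000, DIM))
base_eval = rng.normal(loc=0.0, size=(4000, DIM))

weights = kl_tilt_weights(reward(base_eval, w, b), beta=1.0)
real_mean = real_eval.mean(axis=0)
base_mean = base_eval.mean(axis=0)
tilted_mean = weights @ base_eval  # mean of base samples under the tilted weights

print("gap to real mean before:", np.linalg.norm(base_mean - real_mean))
print("gap to real mean after: ", np.linalg.norm(tilted_mean - real_mean))
```

In this toy setting the reweighted base samples move toward the real feature mean, which is the behaviour the KL-regularized update aims for. A real pipeline would update the generator's parameters rather than reweight samples, and larger beta keeps the result closer to the base model.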

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.