FAIL: Flow Matching Adversarial Imitation Learning for Image Generation

2026-02-12 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Advanced, quick

Summary

The paper introduces Flow Matching Adversarial Imitation Learning (FAIL), a framework for post-training flow matching models that aligns their output distribution with a high-quality target distribution; the authors show that this post-training problem is mathematically equivalent to imitation learning. Unlike supervised fine-tuning (SFT), FAIL addresses policy drift, and it does so without costly preference pairs or explicit reward modeling. It minimizes the divergence between the policy and the expert through adversarial training and offers two algorithms: FAIL-PD, which uses differentiable ODE solvers to obtain low-variance pathwise gradients, and FAIL-PG, a black-box alternative for discrete or computationally constrained settings. When fine-tuning FLUX with 13,000 demonstrations from Nano Banana Pro, FAIL achieves competitive performance on prompt-following and aesthetic benchmarks. The framework also generalizes to discrete image and video generation and acts as a robust regularizer against reward hacking in reward-based optimization.

Key takeaway

For research scientists developing or fine-tuning generative models, FAIL offers a way to improve alignment without the overhead of collecting preference pairs or training an explicit reward model. Consider FAIL-PD for continuous settings where the ODE solver can be differentiated, and FAIL-PG for discrete or resource-limited scenarios, to improve prompt following and aesthetic quality; FAIL can also be used as a regularizer to prevent reward hacking.

Key insights

FAIL aligns flow matching models with target distributions via adversarial imitation learning, bypassing explicit rewards.

Principles

Post-training a flow matching model is mathematically equivalent to imitation learning.
Adversarial training can minimize the divergence between the policy and the expert.
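For reference, the standard adversarial objective behind this principle is the GAN/GAIL min-max game below. The symbols are our notation, not the paper's: p_E is the expert (target) distribution, pi_theta the policy (the flow model being post-trained), and D a discriminator. The exact divergence FAIL minimizes may differ from this textbook form. For a fixed policy, the optimal discriminator turns the inner maximum into a Jensen-Shannon divergence, so the outer minimization pulls the policy toward the expert.

```latex
\min_{\theta}\max_{D}\;
  \mathbb{E}_{x\sim p_E}\big[\log D(x)\big]
  + \mathbb{E}_{x\sim \pi_\theta}\big[\log\big(1 - D(x)\big)\big]

D^{*}(x) = \frac{p_E(x)}{p_E(x) + \pi_\theta(x)}
\quad\Longrightarrow\quad
\max_{D}(\cdot) = 2\,\mathrm{JS}\big(p_E \,\|\, \pi_\theta\big) - \log 4
```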

Method

FAIL uses adversarial training to minimize policy-expert divergence, with FAIL-PD for pathwise gradients via differentiable ODE solvers and FAIL-PG for black-box or discrete settings.
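To make the pathwise (FAIL-PD style) idea concrete, here is a toy sketch under stated assumptions, not the authors' implementation. A one-dimensional "flow" with a hypothetical linear velocity field theta[0] + theta[1] * x is integrated with explicit Euler, and forward-mode sensitivities carried through every solver step give the gradient of the generator loss with respect to theta. A logistic-regression discriminator stands in for the paper's discriminator. The generator loss -E[logit D(x)] is one common adversarial imitation choice, assumed here. The Gaussian "expert" data is synthetic. All function names are hypothetical.

```python
import numpy as np


def integrate_with_sensitivity(x0, theta, n_steps=20):
    """Explicit-Euler solve of dx/dt = theta[0] + theta[1] * x for t in [0, 1].

    Alongside x it carries forward-mode sensitivities g[..., j] = dx/dtheta[j],
    so the sampler output is differentiable w.r.t. the (hypothetical) parameters.
    """
    dt = 1.0 / n_steps
    x = np.asarray(x0, dtype=float).copy()
    g = np.zeros(x.shape + (2,))
    for _ in range(n_steps):
        # Differentiate x <- x + dt * (theta0 + theta1 * x) using the pre-update x.
        g[..., 0] += dt * (1.0 + theta[1] * g[..., 0])
        g[..., 1] += dt * (x + theta[1] * g[..., 1])
        x = x + dt * (theta[0] + theta[1] * x)
    return x, g


def train_discriminator(w, c, x_expert, x_policy, lr=0.1, steps=50):
    """Logistic regression D(x) = sigmoid(w * x + c); expert labelled 1, policy 0."""
    x = np.concatenate([x_expert, x_policy])
    y = np.concatenate([np.ones_like(x_expert), np.zeros_like(x_policy)])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + c)))
        w -= lr * np.mean((p - y) * x)
        c -= lr * np.mean(p - y)
    return w, c


def generator_loss_and_grad(theta, x0, w, c):
    """Loss -E[logit D(x1)] and its pathwise gradient through the ODE solver."""
    x1, g = integrate_with_sensitivity(x0, theta)
    loss = -np.mean(w * x1 + c)
    # Chain rule: d logit / d x1 = w for this linear discriminator.
    grad = -w * g.mean(axis=0)
    return loss, grad


def train(n_rounds=20, n=2000, lr_gen=0.05, seed=0):
    """Alternate one discriminator fit and one generator step per round."""
    rng = np.random.default_rng(seed)
    x_expert = rng.normal(3.0, 0.5, size=n)  # synthetic stand-in for demonstrations
    theta = np.zeros(2)
    w, c = 0.0, 0.0
    means = []
    for _ in range(n_rounds):
        x0 = rng.normal(size=n)
        x1, _ = integrate_with_sensitivity(x0, theta)
        means.append(float(x1.mean()))
        w, c = train_discriminator(w, c, x_expert, x1)
        _, grad = generator_loss_and_grad(theta, x0, w, c)
        theta = theta - lr_gen * grad
    x1, _ = integrate_with_sensitivity(rng.normal(size=n), theta)
    means.append(float(x1.mean()))
    return theta, means
```

The hand-written sensitivities give the exact derivative of the discretized sampler and can be checked against central finite differences of the loss. In a full-size model, an autodiff framework would backpropagate through the solver instead.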

In practice

Fine-tune FLUX with 13,000 demonstrations from Nano Banana Pro.
Apply to discrete image and video generation.
Mitigate reward hacking in reward-based optimization.
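The discrete setting above is where the black-box FAIL-PG variant is meant to apply. The sketch below assumes that variant behaves like a score-function (REINFORCE) policy gradient with the discriminator's output as the reward; that is our assumption, not a detail this summary confirms. It uses a hypothetical softmax policy over k discrete tokens, a synthetic expert distribution, and a tabular discriminator that best-responds each round with the count-based estimate of log(p_expert(t) / p_policy(t)). The policy update needs only samples and the gradient of their log-probabilities, never a gradient through the sampler.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def discriminator_logits(expert_tokens, policy_tokens, k):
    """Tabular best-response discriminator: logit[t] ~ log p_expert(t) - log p_policy(t).

    Counts are add-one smoothed so tokens missing from one sample stay finite.
    """
    n_e = np.bincount(expert_tokens, minlength=k) + 1.0
    n_p = np.bincount(policy_tokens, minlength=k) + 1.0
    return np.log(n_e / n_e.sum()) - np.log(n_p / n_p.sum())


def train(p_expert, n_rounds=300, n=2000, lr=1.0, seed=0):
    """Adversarial imitation with a REINFORCE update on a softmax policy over k tokens."""
    rng = np.random.default_rng(seed)
    k = len(p_expert)
    expert = rng.choice(k, size=n, p=p_expert)  # fixed synthetic demonstration set
    theta = np.zeros(k)                         # logits of a uniform initial policy
    for _ in range(n_rounds):
        pi = softmax(theta)
        a = rng.choice(k, size=n, p=pi)         # only samples are needed: black-box
        psi = discriminator_logits(expert, a, k)
        reward = psi[a]                         # discriminator logit as the reward
        adv = reward - reward.mean()            # batch-mean baseline reduces variance
        score = np.eye(k)[a] - pi               # grad of log softmax(theta)[a]
        theta = theta + lr * (adv[:, None] * score).mean(axis=0)
    return softmax(theta)
```

For example, train(np.array([0.1, 0.2, 0.3, 0.4])) starts from a uniform policy and moves it toward the expert token frequencies. The same update would work unchanged if the policy were a large discrete image or video tokenizer model, since it only consumes samples and log-probability gradients.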

Topics

Flow Matching
Adversarial Imitation Learning
Image Generation
Video Generation
Reward Hacking

Code references

HansPolo113/FAIL

Best for: Computer Vision Engineer, Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.