SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models

Source: cs.AI updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

SOAR (Self-Correction for Optimal Alignment and Refinement) is a post-training method for diffusion models that addresses "exposure bias": the mismatch between the ground-truth states a model sees during training and the model-generated states it encounters at inference. Standard supervised fine-tuning (SFT) optimizes only on ideal, on-trajectory states, and reinforcement learning (RL) relies on sparse terminal rewards. SOAR instead performs a single stop-gradient rollout to generate off-trajectory states, re-noises those states, and supervises the model to correct back toward the original clean target. The result is dense, reward-free, per-timestep supervision. Evaluated on SD3.5-Medium, SOAR improves GenEval from 0.70 to 0.78 and OCR from 0.64 to 0.67 relative to SFT, and it raises all model-based preference scores. In controlled experiments, SOAR surpassed Flow-GRPO on aesthetic and text-image alignment tasks without using a reward model, which supports its use as a stronger first post-training stage.

Key takeaway

For AI engineers and research scientists working with diffusion models, SOAR is an alternative to standard SFT that targets exposure bias directly and improves generation quality across the reported metrics (GenEval, OCR, and model-based preference scores). Consider running SOAR as the first post-training stage, particularly for tasks that depend on compositional accuracy and text rendering, before applying any targeted reward optimization.

Key insights

SOAR corrects diffusion model exposure bias by providing dense, on-policy, reward-free supervision for off-trajectory states.
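
To make "exposure bias" and "off-trajectory states" concrete, the toy sketch below compares the states a model would see in training with the states it produces itself at inference. Everything in it is an illustrative assumption rather than the paper's setup: it uses a straight-line (rectified-flow style) path x_t = (1 - t) * x0 + t * eps, a hypothetical `imperfect_velocity` function whose prediction is off by a constant 0.1, and a fixed-step Euler sampler.

```python
import numpy as np

def interpolant(x0, eps, t):
    """Ground-truth training state on the straight path x_t = (1 - t) * x0 + t * eps."""
    return (1.0 - t) * x0 + t * eps

def euler_rollout(velocity_fn, x_start, num_steps):
    """Integrate dx/dt = v(x, t) from t = 1 down to t = 0 with fixed Euler steps."""
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    x = x_start.copy()
    states = [x.copy()]
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur)
        states.append(x.copy())
    return ts, states

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)   # toy "clean" sample
eps = rng.normal(size=4)  # toy noise sample

def imperfect_velocity(x, t):
    # Hypothetical model: the true velocity (eps - x0) plus a constant error.
    return (eps - x0) + 0.1

# Sample with the imperfect model, starting from pure noise at t = 1.
ts, states = euler_rollout(imperfect_velocity, eps, num_steps=10)

# Distance between each model-generated state and the training state at the same t.
drift = [float(np.linalg.norm(s - interpolant(x0, eps, t))) for t, s in zip(ts, states)]
print([round(d, 3) for d in drift])
```

In this toy setting the gap starts at zero and grows with every step, so later steps run on inputs that never appeared in training. That growing gap is the kind of mismatch the summary calls exposure bias.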

Principles

Method

SOAR constructs off-trajectory states via a single stop-gradient ODE step, re-noises them to auxiliary levels, and supervises the model to steer back to the original clean target using an analytically derived correction objective.
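
The three steps described above (one stop-gradient ODE step, re-noising, correction toward the clean target) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation. The straight-line path x_t = (1 - t) * x0 + t * eps, the re-noising rule, the target (x_s - x0) / s, the squared-error loss, and the names `soar_style_correction_loss` and `velocity_fn` are all chosen here for illustration. The source does not give the exact form of the analytically derived objective.

```python
import numpy as np

def soar_style_correction_loss(velocity_fn, x0, rng, t=0.8, dt=0.2, s=0.7):
    """Toy loss for one SOAR-style correction step (illustrative assumptions only).

    velocity_fn(x, t) plays the role of the model's velocity prediction.
    """
    t_next = t - dt
    if not (0.0 < t_next <= s < 1.0 and t <= 1.0):
        raise ValueError("need 0 < t - dt <= s < 1 and t <= 1")

    # Ground-truth state at level t on the assumed straight-line path.
    eps = rng.normal(size=x0.shape)
    x_t = (1.0 - t) * x0 + t * eps

    # 1) Single Euler (ODE) step using the model's own prediction. In an
    #    autodiff framework this rollout would be excluded from the gradient
    #    (stop-gradient); NumPy has no gradients, so here it is just a value.
    x_hat = x_t - dt * velocity_fn(x_t, t)

    # 2) Re-noise the off-trajectory state from level t_next up to level s.
    #    Assumed rule: rescale, then add fresh noise so that the total noise
    #    scale would equal s if x_hat sat exactly on the path.
    scale = (1.0 - s) / (1.0 - t_next)
    sigma = np.sqrt(max(s**2 - (scale * t_next) ** 2, 0.0))
    x_s = scale * x_hat + sigma * rng.normal(size=x0.shape)

    # 3) Correction target: the constant velocity that carries x_s to the
    #    original clean x0 by time 0, since x_s - s * v = x0 gives v = (x_s - x0) / s.
    target = (x_s - x0) / s
    pred = velocity_fn(x_s, s)
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(2, 8))  # toy batch of "clean" samples

# A model that always predicts zero velocity is penalized at the re-noised state.
zero_loss = soar_style_correction_loss(lambda x, t: np.zeros_like(x), x0, rng)

# A hypothetical oracle that always points straight back at x0 incurs no loss.
oracle_loss = soar_style_correction_loss(lambda x, t: (x - x0) / t, x0, rng)
print(zero_loss, oracle_loss)
```

The loss is computed at a state the model produced itself, and the target is available at any noise level without a reward model, which is the sense in which the supervision is dense, on-policy, and reward-free. In a real training loop the rollout in step 1 would sit inside something like `torch.no_grad()` or `jax.lax.stop_gradient`, so that only the prediction at the re-noised state receives gradients.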

Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.