Linear Image Generation by Synthesizing Exposure Brackets

2026-04-24 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

Researchers from S-Lab, Nanyang Technological University, Adobe NextCam, and Adobe Research have developed a generative framework for text-to-linear-image synthesis, addressing a limitation of current generative models, which primarily produce display-referred images. The method uses a DiT-based flow-matching architecture and represents a linear image as a sequence of four exposure brackets (EVs of -4, -2, 0, and 2), sidestepping the high dynamic range and bit depth that VAEs struggle to encode. The framework adds exposure modulation self-attention and a radiance-scale token denoising mechanism, so that the radiance scale and the image content are predicted jointly. Trained on 25,000 RAW images from RAISE and Adobe FiveK, the model is reported to achieve better visual quality and dynamic range than state-of-the-art text-to-image and text-to-video models adapted to the task. The approach also enables downstream applications such as linear image editing, inpainting, and ControlNet-guided conditional generation.

Key takeaway

For research scientists developing image generation models, this work demonstrates a workable approach to synthesizing high-dynamic-range linear images. Consider adopting multi-exposure bracket generation and radiance-scale token denoising to work around VAE limitations on scene-referred data; the resulting linear outputs allow richer post-processing and are more physically accurate. The framework is a strong foundation for future work in professional photography workflows and computational imaging.

Key insights

Generating linear images as exposure brackets overcomes VAE limitations for high dynamic range content.

Principles

Linear images offer superior post-processing flexibility.
Decomposing HDR into exposure brackets aids generative models.
Jointly predicting radiance scale improves scene reconstruction.
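
To make the bracket decomposition concrete, here is a minimal sketch of how a linear image could be split into the four exposure brackets named in the summary. It assumes each bracket is the radiance scaled by 2^EV and clipped to [0, 1]; the function name `to_brackets` and the toy scene values are hypothetical, and the paper's actual preprocessing (for example any tone curve applied per bracket) may differ.

```python
import numpy as np

EVS = (-4, -2, 0, 2)  # exposure values listed in the summary

def to_brackets(linear, evs=EVS):
    """Split a linear image into clipped exposure brackets.

    Assumption: a bracket at exposure value `ev` is radiance * 2**ev,
    clipped to [0, 1], so each bracket fits a low-dynamic-range range.
    """
    linear = np.asarray(linear, dtype=np.float64)
    return np.stack([np.clip(linear * 2.0 ** ev, 0.0, 1.0) for ev in evs])

# Toy scene (hypothetical radiance values spanning a wide range).
scene = np.array([0.01, 0.2, 1.0, 8.0])
brackets = to_brackets(scene)  # shape: (4 brackets, 4 pixels)
```

In this toy case the brightest pixel (8.0) is clipped in every bracket except EV -4, while the darkest pixel is best resolved at EV 2, which is why no single low-dynamic-range bracket is enough on its own.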

Method

The method uses a flow-matching framework with a DiT backbone to synthesize multiple exposure brackets, fusing them into a linear image. It integrates exposure modulation self-attention and a radiance-scale token denoising mechanism.
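
The fusion step can be illustrated with a standard weighted exposure merge. This is a sketch under stated assumptions, not the paper's fusion procedure: it assumes linear brackets clipped to [0, 1], uses a simple hat weight that distrusts near-black and near-clipped pixels, and the name `fuse_brackets` is hypothetical.

```python
import numpy as np

def fuse_brackets(brackets, evs=(-4, -2, 0, 2), eps=1e-6):
    """Merge clipped linear brackets into one linear radiance estimate.

    Assumption: bracket i equals clip(radiance * 2**evs[i], 0, 1).
    """
    brackets = np.asarray(brackets, dtype=np.float64)
    # One exposure scale per bracket, broadcast over the pixel axes.
    scales = (2.0 ** np.asarray(evs, dtype=np.float64)).reshape(
        -1, *([1] * (brackets.ndim - 1))
    )
    # Hat weight: highest at mid-gray, near zero at 0 and at the clip point.
    weights = 1.0 - np.abs(2.0 * brackets - 1.0) + eps
    radiance = brackets / scales  # undo each bracket's exposure
    return (weights * radiance).sum(axis=0) / weights.sum(axis=0)

# Round trip on a toy scene (hypothetical values).
scene = np.array([0.01, 0.2, 1.0, 8.0])
evs = (-4, -2, 0, 2)
brackets = np.stack([np.clip(scene * 2.0 ** ev, 0.0, 1.0) for ev in evs])
fused = fuse_brackets(brackets, evs)
```

With these values the fused result recovers the original radiance to within a small tolerance, because every pixel is well exposed in at least one bracket and clipped samples receive almost no weight.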

In practice

Use 3D-RoPE for multi-bracket positional encoding.
Employ LoRA for efficient fine-tuning of large models.
Apply exposure modulation only to Single-DiT components.
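
As an illustration of the 3D-RoPE item above, the sketch below builds three-component position IDs (bracket index, row, column) for every latent token and derives rotary angles from them. The layout, the per-axis dimensions in `axis_dims`, and both function names are assumptions for illustration; the summary does not specify how the paper assigns positions.

```python
import numpy as np

def bracket_position_ids(num_brackets, height, width):
    """Return an (N, 3) array of (bracket, row, col) IDs, N = B * H * W."""
    b, y, x = np.meshgrid(
        np.arange(num_brackets), np.arange(height), np.arange(width),
        indexing="ij",
    )
    return np.stack([b, y, x], axis=-1).reshape(-1, 3)

def rope_angles(ids, axis_dims=(4, 6, 6), theta=10000.0):
    """Rotary angles per token; each axis gets its own frequency band.

    `axis_dims` (hypothetical) is the number of channels rotated per axis;
    each axis contributes dim // 2 angles.
    """
    parts = []
    for axis, dim in enumerate(axis_dims):
        freqs = 1.0 / theta ** (np.arange(0, dim, 2) / dim)
        parts.append(ids[:, axis:axis + 1] * freqs[None, :])
    return np.concatenate(parts, axis=-1)

ids = bracket_position_ids(num_brackets=4, height=2, width=3)
angles = rope_angles(ids)  # shape: (24 tokens, 8 angles)
```

Giving the bracket index its own axis lets tokens at the same spatial location in different brackets share row and column angles while remaining distinguishable along the exposure axis.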

Topics

Linear Image Generation
Exposure Bracketing Synthesis
Flow-Matching Diffusion Models
Diffusion Transformers
Radiance Scale Prediction

Code references

black-forest-labs/flux

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.