Linear Image Generation by Synthesizing Exposure Brackets
Abstract
The life of a photo begins with photons striking the sensor, whose signals pass through a sophisticated image signal processing (ISP) pipeline to produce a display-referred image. Such images, however, are no longer faithful to the incident light: their dynamic range is compressed and their tones are stylized by subjective preferences. In contrast, RAW images record direct sensor signals before non-linear tone mapping. After camera response curve correction and demosaicing, they can be converted into linear images: scene-referred representations that directly reflect scene irradiance and are invariant to sensor-specific factors. Since image sensors offer higher dynamic range and bit depth than standard display formats, linear images contain richer information than display-referred ones, leaving users more room for editing in post-processing. Despite this advantage, current generative models mainly synthesize display-referred images, which inherently limits downstream editing. Generating linear images, however, is challenging: the pre-trained VAEs in latent diffusion models struggle to reconstruct them because of their higher dynamic range and bit depth, so extreme highlights and shadows cannot be preserved simultaneously. To address this, we represent a linear image as a sequence of exposure brackets: linear sub-images, each capturing a specific portion of the overall dynamic range. Based on this representation, we propose a new DiT-based flow-matching architecture that generates exposure brackets, which can be post-processed to produce a high-quality linear image. We further demonstrate that our approach enables downstream applications such as linear image editing and conditional linear image generation through ControlNet guidance.
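To make the RAW-to-linear conversion mentioned in the abstract concrete, the following is a minimal Python sketch under stated assumptions: the black and white levels, the RGGB Bayer pattern, the use of OpenCV for demosaicing, and the `inverse_crf` callable are all hypothetical choices for illustration, not details of the paper's pipeline.

```python
import numpy as np
import cv2  # assumption: OpenCV is used here only for demosaicing


def raw_to_linear(bayer, inverse_crf, black_level=64, white_level=1023):
    """Sketch: convert a RAW Bayer mosaic into a linear RGB image.

    Follows the two steps named in the abstract (camera response curve
    correction and demosaicing); all constants are assumptions.
    """
    # Normalize raw sensor counts to [0, 1].
    x = (bayer.astype(np.float32) - black_level) / (white_level - black_level)
    x = np.clip(x, 0.0, 1.0)
    # Demosaic the single-channel mosaic to RGB (RGGB pattern assumed).
    mosaic16 = (x * 65535.0).astype(np.uint16)
    rgb = cv2.cvtColor(mosaic16, cv2.COLOR_BayerRG2RGB).astype(np.float32) / 65535.0
    # Invert the camera response curve so pixel values are proportional to
    # scene irradiance (the identity for sensors that are already linear).
    return inverse_crf(rgb)
```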
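The exposure-bracket representation can be illustrated the same way. Below is a minimal NumPy sketch assuming brackets spaced a fixed number of photographic stops apart and a simple confidence-weighted merge; `to_brackets`, `merge_brackets`, and parameters such as `n_brackets`, `stops_apart`, and `eps` are hypothetical names and values, since the paper's actual bracket count, spacing, and merging scheme are not reproduced here.

```python
import numpy as np


def to_brackets(linear, n_brackets=4, stops_apart=2.0):
    """Split a linear (scene-referred) image into exposure brackets.

    Bracket k is the linear image scaled k * stops_apart stops darker and
    clipped to [0, 1], so each bracket captures a different portion of the
    overall dynamic range.
    """
    gains = [2.0 ** (-k * stops_apart) for k in range(n_brackets)]
    return [np.clip(linear * g, 0.0, 1.0) for g in gains], gains


def merge_brackets(brackets, gains, eps=0.02):
    """Recover a linear image by averaging unclipped pixels across brackets."""
    num = np.zeros_like(brackets[0], dtype=np.float64)
    den = np.zeros_like(brackets[0], dtype=np.float64)
    for b, g in zip(brackets, gains):
        w = ((b > eps) & (b < 1.0 - eps)).astype(np.float64)  # trust unclipped pixels
        num += w * (b / g)  # undo the exposure gain
        den += w
    merged = num / np.maximum(den, 1.0)
    # Pixels clipped in every bracket: fall back to the darkest bracket for
    # saturated highlights and the brightest bracket for deep shadows.
    fallback = np.where(brackets[0] >= 1.0 - eps,
                        brackets[-1] / gains[-1],
                        brackets[0] / gains[0])
    return np.where(den > 0, merged, fallback)


if __name__ == "__main__":
    hdr = np.random.rand(32, 32, 3) * 50.0  # synthetic linear image, values in [0, 50)
    brackets, gains = to_brackets(hdr)
    rec = merge_brackets(brackets, gains)
    print("max relative error:", np.max(np.abs(rec - hdr) / (hdr + 1e-6)))
```

The point of the decomposition is visible in the merge: with suitable spacing, every pixel is well exposed (neither clipped nor crushed to black) in at least one bracket, so the brackets jointly tile a dynamic range that no single [0, 1] sub-image could hold.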