MERIT: Multi-domain Efficient RAW Image Translation
Abstract
RAW images captured by different camera sensors exhibit substantial domain shifts due to varying spectral responses, noise characteristics, and tone behaviors, complicating their direct use in downstream vision tasks. Prior methods address this by training one-to-one RAW-to-RAW translators for each source-target domain pair, but such approaches do not scale to real-world scenarios with multiple cameras. We introduce MERIT, the first unified framework for multi-domain RAW image translation, which uses a single model to translate between arbitrary camera domains. To address domain-specific noise discrepancies, we propose a sensor-aware noise modeling loss that explicitly aligns the signal-dependent noise statistics of the generated images with those of the target domain. Additionally, we enhance the generator's context modeling with a conditional multi-scale large kernel attention module, enabling efficient capture of both global illumination and fine-grained sensor cues. To support standardized evaluation, we construct MDRAW, a new dataset of paired and unpaired RAW images from five diverse camera sensors. Extensive experiments on existing and newly proposed benchmarks demonstrate that MERIT significantly outperforms prior one-to-one models in both accuracy and scalability, offering a practical and generalizable solution to cross-domain RAW image harmonization.