Too Vivid to Be Real? Benchmarking and Calibrating Generative Color Fidelity
Zhengyao Fang ⋅ Zexi Jia ⋅ Yijia Zhong ⋅ Pengcheng Luo ⋅ Jinchao Zhang ⋅ Guangming Lu ⋅ Jun Yu ⋅ Wenjie Pei
Abstract
Recent advances in text-to-image (T2I) generation have greatly improved visual quality, yet producing images that look authentic relative to real-world photography remains challenging. This is partly due to biases in existing evaluation paradigms: human ratings and preference-trained metrics tend to favor vivid images with exaggerated saturation and contrast, which makes generations often $\textit{too vivid to be real}$ even when prompted for realistic-style images. To address this issue, we present the $\textbf{Color Fidelity Dataset (CFD)}$ and the $\textbf{Color Fidelity Metric (CFM)}$ for objective evaluation of color fidelity in realistic-style generations. CFD contains over 1.3M real and synthetic images with ordered levels of color realism, while CFM employs a multimodal encoder to learn perceptual color fidelity. In addition, we propose a training-free $\textbf{Color Fidelity Refinement (CFR)}$ method that adaptively modulates the spatial–temporal guidance scale during generation, thereby enhancing color authenticity. Together, CFD supports CFM for assessment, and CFM's learned attention in turn guides CFR to refine T2I fidelity, forming a progressive framework for assessing and improving color fidelity in realistic-style T2I generation. All datasets and code will be publicly released.
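To make the idea of a spatially and temporally varying guidance scale concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes standard classifier-free guidance, where the guided prediction is the unconditional prediction plus a scale times the conditional difference, and it assumes a hypothetical schedule in which the scale is lowered toward `min_scale` in regions that an attention map flags, more strongly at later denoising steps. The function names, the linear schedule, and the default values are all assumptions introduced for illustration.

```python
def guidance_scale(base_scale, attention, step, num_steps, min_scale=1.0):
    """Hypothetical per-location, per-step guidance scale (an assumption,
    not the schedule from the paper).

    attention: value in [0, 1]; higher means the location is flagged.
    step: current denoising step index, 0 .. num_steps - 1.
    """
    # Temporal factor: 0 at the first step, 1 at the last step.
    progress = step / max(num_steps - 1, 1)
    # Flagged locations are pulled toward min_scale as denoising progresses.
    return min_scale + (base_scale - min_scale) * (1.0 - attention * progress)


def apply_guidance(uncond, cond, attention, step, num_steps, base_scale=7.5):
    """Classifier-free guidance with an elementwise scale.

    uncond, cond, attention: 2D lists of floats with the same shape.
    Returns uncond + s * (cond - uncond), with s computed per element.
    """
    guided = []
    for u_row, c_row, a_row in zip(uncond, cond, attention):
        row = []
        for u, c, a in zip(u_row, c_row, a_row):
            s = guidance_scale(base_scale, a, step, num_steps)
            row.append(u + s * (c - u))
        guided.append(row)
    return guided
```

Under this assumed schedule, every location receives the full base scale at the first step, and at the final step a fully flagged location (attention 1.0) is reduced to `min_scale` while an unflagged location keeps the base scale.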