StyleDoctor: Towards a Specialist Reward Model for Style-centric Generation Tasks
Abstract
Style generation has made significant progress through diffusion models. Recent efforts have explored reinforcement learning with human-preference reward models to enhance diffusion models for general downstream applications. However, we identify a critical limitation: existing human-preference reward models struggle to perceive image style effectively, resulting in suboptimal performance after reinforcement fine-tuning. To address this, we first introduce a large-scale style reward modeling dataset comprising 400K paired samples spanning 1,000 diverse style categories, augmented with textual instructions and style reward annotations. We then propose StyleDoctor, a novel style-perception reward model that jointly evaluates style consistency between paired images and style-text alignment. StyleDoctor outperforms existing style perception models in both style retrieval and generation tasks. Extensive quantitative and qualitative experiments demonstrate the superiority of StyleDoctor over competing approaches, showcasing its efficiency and versatility in style-conditioned generation. Our dataset and code will be made public upon acceptance.