Poster
TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance
Mushui Liu · Dong She · Qihan Huang · Jiacheng Ying · Wanggui He · Jingxuan Pang · Yuanlei Hou · Siming Fu
Subject-driven image personalization has seen notable advancements, especially with the advent of the ReferenceNet paradigm. ReferenceNet excels in integrating image reference features, making it highly applicable in creative and commercial settings. However, current implementations of ReferenceNet primarily operate as latent-level feature extractors, which limit their potential. This constraint hinders the provision of appropriate features to the denoising backbone across different timesteps, leading to suboptimal image consistency. In this paper, we revisit the extraction of reference features and propose TFCustom, a model framework designed to focus on reference image features at different temporal steps and frequency levels. Specifically, we firstly propose synchronized ReferenceNet to extract reference image features while simultaneously optimizing noise injection and denoising for the reference image. We also propose a time-aware frequency feature refinement module that leverages high- and low-frequency filters, combined with time embeddings, to adaptively select the degree of reference feature injection. Additionally, to enhance the similarity between reference objects and the generated image, we introduce a novel reward-based loss that encourages greater alignment between the reference and generated images. Experimental results demonstrate state-of-the-art performance in both multi-object and single-object reference generation, with significant improvements in texture and textual detail generation over existing methods.
Live content is unavailable. Log in and register to view live content