Poster
ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models
Fernando Julio Cendra · Kai Han
ExHall D Poster #260
The inherent ambiguity in the definition of visual concepts poses significant challenges for modern generative models, such as diffusion-based Text-to-Image (T2I) models, in accurately learning concepts from input images. Existing methods lack a systematic framework and interpretative mechanisms, hindering reliable extraction of the underlying intrinsic concepts. To address this challenge, we present ICE, short for Intrinsic Concept Extraction, a novel framework that automatically and systematically extracts intrinsic concepts from a single image by leveraging a T2I model. ICE consists of two pivotal stages. In the first stage, ICE devises an automatic concept localization module that pinpoints relevant text-based concepts and their corresponding masks within a given image. This critical phase not only streamlines concept initialization but also provides precise guidance for the subsequent analysis. The second stage delves deeper into each identified mask, decomposing each concept into intrinsic components that capture specific visual characteristics and general components that represent broader categories. This decomposition facilitates a more granular understanding by further dissecting concepts into detailed intrinsic attributes such as colour and material. Extensive experiments validate that ICE achieves superior performance on intrinsic concept extraction, enabling reliable and flexible application to downstream tasks such as personalized image generation and image editing. Code and datasets will be made publicly available for research purposes.
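To make the two-stage structure concrete, here is a minimal Python sketch of the pipeline the abstract describes. Every name below (`LocalizedConcept`, `localize_concepts`, `decompose`, `ice_pipeline`) is hypothetical and stands in for components of ICE that the abstract does not specify; this is an illustration of the data flow, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class LocalizedConcept:
    """Stage 1 output: a text-based concept and its mask in the image."""
    text: str          # e.g. "dog"
    mask: list         # placeholder for an H x W binary mask

@dataclass
class DecomposedConcept:
    """Stage 2 output: a general component plus intrinsic attributes."""
    general: str                                    # broader category, e.g. "dog"
    intrinsic: dict = field(default_factory=dict)   # e.g. {"colour": ..., "material": ...}

def localize_concepts(image) -> list[LocalizedConcept]:
    """Stage 1 (hypothetical): pinpoint text-based concepts and their masks."""
    raise NotImplementedError("stands in for ICE's concept localization module")

def decompose(concept: LocalizedConcept) -> DecomposedConcept:
    """Stage 2 (hypothetical): split a masked concept into a general
    component and intrinsic attributes such as colour and material."""
    raise NotImplementedError("stands in for ICE's intrinsic decomposition stage")

def ice_pipeline(image) -> list[DecomposedConcept]:
    # Stage 1 guides Stage 2: each localized mask is decomposed in turn.
    return [decompose(c) for c in localize_concepts(image)]
```

The key design point conveyed by the abstract is the dependency between the stages: localization yields per-concept masks, and decomposition operates within each mask to separate category-level from attribute-level information.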