

Poster

Unbiased General Annotated Dataset Generation

Dengyang Jiang · Haoyu Wang · Lei Zhang · Wei Wei · Guang Dai · Mengmeng Wang · Jingdong Wang · Yanning Zhang


Abstract:

Pre-training backbone networks on a general annotated dataset (e.g., ImageNet) that comprises numerous manually collected images with category annotations has proven to be indispensable for enhancing the generalization capacity of downstream visual tasks. However, those manually collected images often exhibit non-trivial bias, which is not only non-transferable across categories or domains but is also inevitably memorized by the backbone, thus degrading its generalization capacity. To mitigate this problem, we present an unbiased general annotated dataset generation framework (ubGen). Instead of relying on expensive manual collection, we aim to directly generate synthetic unbiased images with category annotations. To achieve this goal, we propose to leverage the advantage of a multimodal foundation model (e.g., CLIP) in aligning images with language in an unbiased semantic space. Specifically, we develop a bi-level semantic alignment loss, which not only forces all generated images to be consistent with the semantic distribution of all categories belonging to the target dataset in an adversarial learning manner, but also requires each generated image to match the semantic description of its category name. In addition, we cast an existing image quality scoring model into a quality assurance loss to preserve the quality of the generated images. By leveraging these two loss functions, we can obtain an unbiased image generation model by simply fine-tuning a pre-trained diffusion model using only the category names in the target dataset as input. Experimental results confirm that, compared with the manually labeled dataset or other synthetic datasets, using our generated unbiased datasets leads to stable generalization capacity enhancement of different backbone networks across various tasks, especially in tasks where manually labeled samples are scarce.
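The abstract does not give the exact form of the losses, so the following is only a minimal sketch of how the three terms it names could be combined. It assumes image and text embeddings have already been computed by a CLIP-like encoder and are passed in as plain lists of floats. The `discriminator` callable, the non-saturating adversarial form, the `1 - cosine` image-level term, and the weights `w_img`, `w_set`, `w_q` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def image_level_loss(image_embs, text_embs, labels):
    """Per-image term: each generated image should match its category name.
    Assumed form: mean of (1 - cosine similarity) to the category's text embedding."""
    terms = [1.0 - cosine(e, text_embs[y]) for e, y in zip(image_embs, labels)]
    return sum(terms) / len(terms)

def dataset_level_loss(image_embs, discriminator):
    """Dataset-level term: generated image embeddings should be indistinguishable
    from the distribution of category embeddings. Assumed form: non-saturating
    generator loss, where discriminator(e) returns a probability in (0, 1)."""
    eps = 1e-12  # avoids log(0)
    terms = [-math.log(discriminator(e) + eps) for e in image_embs]
    return sum(terms) / len(terms)

def quality_loss(quality_scores):
    """Quality term: negative mean score from an external quality scoring model
    (higher score means better quality, so minimizing this raises quality)."""
    return -sum(quality_scores) / len(quality_scores)

def total_loss(image_embs, text_embs, labels, discriminator, quality_scores,
               w_img=1.0, w_set=1.0, w_q=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_img * image_level_loss(image_embs, text_embs, labels)
            + w_set * dataset_level_loss(image_embs, discriminator)
            + w_q * quality_loss(quality_scores))

# Toy usage with 2-D embeddings and a constant, uninformative discriminator.
text_embs = [[1.0, 0.0], [0.0, 1.0]]    # one embedding per category name
image_embs = [[1.0, 0.0], [0.0, 1.0]]   # embeddings of two generated images
labels = [0, 1]                          # category index of each image
loss = total_loss(image_embs, text_embs, labels,
                  discriminator=lambda e: 0.5,
                  quality_scores=[0.8, 0.6])
```

In an actual fine-tuning loop these terms would be computed on differentiable tensors so that gradients flow back into the diffusion model; the plain-Python version above only illustrates how the quantities relate.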
