OntoAug: Rethinking Generative Data Augmentation via Ontology Guidance
Abstract
Generative data augmentation opens new avenues for improving image recognition models. Accurate recognition hinges on capturing the ontological features of the subject, yet existing methods typically augment the image as a whole, ignoring the uneven semantic distribution between foreground and background. This can introduce semantic shifts into the generated samples and weaken the model's representation of the subject's ontology. Human category recognition, by contrast, relies on the stable essence of the subject while tolerating variation in background and environment. Inspired by this perceptual principle of "stable subjects, diverse backgrounds, and overall coherence," we propose OntoAug, an ontology-guided data augmentation framework that redraws the boundary of augmentation around the distinction between ontology and environment. OntoAug explicitly separates the foreground subject from the background context and steers diffusion models through structured layout control to generate samples with consistent subjects and diverse backgrounds. Experiments show that OntoAug significantly improves performance in image classification, few-shot learning, weakly supervised object localization (WSOL), and large vision-language model (LVLM) reasoning, demonstrating its advantages in semantic fidelity and sample diversity. It offers a new direction for building visual systems more closely aligned with human perception. Code will be made available.
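To make the "stable subject, diverse background" idea concrete, the following is a minimal sketch, not the paper's method: it assumes an off-the-shelf inpainting diffusion model (here, the Stable Diffusion inpainting checkpoint via the diffusers library) in place of OntoAug's structured layout control, and a hypothetical get_background_mask function standing in for any foreground segmenter.

```python
# Minimal sketch: keep the subject pixels fixed and let a diffusion model
# regenerate only the background, approximating "stable subjects, diverse
# backgrounds." The segmenter below is a hypothetical placeholder.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def get_background_mask(image: Image.Image) -> Image.Image:
    """Hypothetical foreground segmenter: returns a mask where background
    pixels are white (repainted) and subject pixels are black (kept fixed)."""
    raise NotImplementedError  # plug in any salient-object / subject segmenter

def augment(image: Image.Image, class_name: str, n: int = 4) -> list[Image.Image]:
    # White regions of the mask are re-synthesized by the inpainting model,
    # so the ontological subject (black region) is left untouched.
    background_mask = get_background_mask(image)
    prompts = [f"a photo of a {class_name} in a new, varied scene"] * n
    out = pipe(prompt=prompts, image=[image] * n, mask_image=[background_mask] * n)
    return out.images
```

This sketch fixes the subject by masking; the paper's framework additionally conditions generation on structured layout control, which this simplified example does not model.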