Learning from Synthetic Data via Provenance-Based Input Gradient Guidance
Abstract
Training with synthetic data has become a standard strategy for improving robustness to distribution shifts. However, most existing approaches exploit synthetic samples only indirectly---for example, by enriching backgrounds, contexts, or negative examples---while providing no explicit signal about where the true target content resides. As a result, models can continue to rely on spurious correlations, a reliance that ultimately limits their robustness. In this work, we convert a basic but under-utilized form of provenance in synthetic data into explicit supervision: during synthesis, we know which pixels or elements originate from which source instances. We formalize this provenance as synthetic knowledge and propose a Synthetic Knowledge-Guided (SKG) training framework that uses it, via a Gradient Guide Loss, to steer input gradients toward target regions and away from irrelevant ones. The framework is generic and integrates seamlessly into diverse synthesis pipelines, including mixing-based synthesis and generative editing-based synthesis, without additional human annotations. Experiments on image classification, weakly supervised object localization, and weakly supervised spatio-temporal action localization show consistent gains over strong baselines. These results demonstrate that making the provenance of synthetic data explicit is an effective and widely applicable mechanism for mitigating shortcut learning and enhancing robustness.
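The core idea of steering input gradients with a provenance mask can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: we assume a simple linear classifier (so the input gradient has a closed form), and the function name `gradient_guide_loss`, the penalty weight `lam`, and the squared-gradient penalty outside the mask are all hypothetical choices for illustration.

```python
# Minimal sketch: a task loss augmented with a penalty on input-gradient
# mass outside the provenance mask. All names and design choices here are
# illustrative assumptions, not the paper's actual Gradient Guide Loss.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_guide_loss(x, w, b, y, mask, lam=1.0):
    """Binary cross-entropy plus a guide penalty.

    x    : (d,) input features (e.g., flattened pixels)
    w, b : parameters of a linear classifier
    y    : binary label (0 or 1)
    mask : (d,) provenance mask, 1 where synthesis placed target content
    lam  : weight of the guide penalty
    """
    z = float(w @ x + b)
    p = sigmoid(z)
    task = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    # For a linear model, the input gradient of the task loss is
    # d(task)/dx = (p - y) * w, available in closed form.
    input_grad = (p - y) * w
    # Penalize gradient energy on regions the provenance marks as irrelevant,
    # discouraging the model from relying on off-target (spurious) content.
    guide = np.sum(((1.0 - mask) * input_grad) ** 2)
    return task + lam * guide
```

When the provenance mask covers the whole input, the penalty vanishes and the loss reduces to the plain task loss; with a partial mask, minimizing the penalty pushes the classifier's sensitivity off the irrelevant regions. In a real pipeline the input gradient would come from autograd through a deep network rather than a closed form.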