

Small Scale Data-Free Knowledge Distillation

He Liu · Yikai Wang · Huaping Liu · Fuchun Sun · Anbang Yao

Arch 4A-E Poster #116
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


Data-free knowledge distillation uses the knowledge learned by a large teacher network to augment the training of a smaller student network without accessing the original training data, avoiding privacy, security, and proprietary risks in real applications. Existing methods typically follow an inversion-and-distillation paradigm: a generative adversarial network is trained by leveraging the pre-trained teacher network and is then used to synthesize a large-scale sample set for knowledge distillation. In this paper, we reexamine this common data-free knowledge distillation paradigm and show that there is considerable room to improve overall training efficiency through the lens of small-scale data inversion for distillation. Motivated by several empirical observations indicating the importance of balancing class distributions in terms of synthetic sample diversity and difficulty during both data synthesis and distillation, we propose Small Scale Data-free Knowledge Distillation (SSD-KD). In its formulation, SSD-KD introduces a modulating function to balance synthetic samples and a priority sampling function to select proper samples, facilitated by a dynamic replay buffer and reinforcement learning. As a result, SSD-KD can perform distillation training on an extremely small set of synthetic samples, making overall training an order of magnitude faster than current mainstream methods while retaining competitive model performance. Experiments on image classification and semantic segmentation benchmarks demonstrate the efficacy of our method. Code is provided to reproduce the results.
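To make the idea of class-balanced priority sampling from a replay buffer more concrete, here is a minimal sketch under stated assumptions. It is not the SSD-KD method from the paper. The class `PriorityReplayBuffer`, the parameter `alpha`, and the choice of teacher-student KL divergence as a "difficulty" score are all hypothetical illustrations. The reinforcement-learning component mentioned above is replaced here by a fixed heuristic: samples are drawn with probability proportional to difficulty raised to `alpha`, divided by how often the sample's class appears in the buffer.

```python
import math
import random
from collections import Counter


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a distillation temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions (assumed difficulty score)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


class PriorityReplayBuffer:
    """Hypothetical buffer of synthetic samples with class-balanced priority sampling.

    This heuristic stands in for the learned (RL-based) selection described in
    the abstract; it is an illustrative assumption, not the authors' design.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly difficulty skews sampling (0 = ignore difficulty)
        self.items = []     # list of (sample_id, label, difficulty)

    def add(self, sample_id, label, difficulty):
        if len(self.items) >= self.capacity:
            self.items.pop(0)  # drop the oldest sample (FIFO replacement)
        self.items.append((sample_id, label, difficulty))

    def probabilities(self):
        if not self.items:
            raise ValueError("buffer is empty")
        counts = Counter(label for _, label, _ in self.items)
        # Difficulty^alpha divided by class frequency, so that classes that are
        # over-represented in the buffer do not dominate the sampled batch.
        weights = [
            (difficulty + 1e-6) ** self.alpha / counts[label]
            for _, label, difficulty in self.items
        ]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self, k, rng=random):
        probs = self.probabilities()
        picked = rng.choices(range(len(self.items)), weights=probs, k=k)
        return [self.items[i][0] for i in picked]


if __name__ == "__main__":
    # Score one synthetic sample by teacher-student disagreement at temperature 4.
    t = softmax([2.0, 0.5, -1.0], temperature=4.0)
    s = softmax([0.1, 0.2, 0.3], temperature=4.0)
    buf = PriorityReplayBuffer(capacity=4)
    buf.add("img_0", label=0, difficulty=kl_divergence(t, s))
    buf.add("img_1", label=0, difficulty=0.05)
    buf.add("img_2", label=1, difficulty=0.20)
    print(buf.sample(3, rng=random.Random(0)))
```

Dividing by the per-class count is one simple way to keep the sampled batch from collapsing onto whichever classes the generator happens to over-produce, while the `alpha` exponent trades off between uniform replay (`alpha = 0`) and focusing on samples the student currently finds hard.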
