

Poster

In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification

Jinseong Park · Yujin Choi · Jaewook Lee


Abstract: To alleviate the utility degradation of deep learning image classification with differential privacy (DP), employing extra public data or pre-trained models has been widely explored. Recently, the use of in-distribution public data has been investigated, where a tiny subset of data owners share their data publicly. In this paper, we investigate a framework that leverages recent diffusion models to amplify the information of public data. We then identify data diversity and the generalization gap between public and private data as critical factors in addressing the limited size of public data. Assuming that 4% of the training data is public, our method achieves 85.48% accuracy on CIFAR-10 without using pre-trained models, under a privacy budget of $(\varepsilon, \delta) = (2, 10^{-5})$.
