In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification

Jinseong Park · Yujin Choi · Jaewook Lee

Arch 4A-E Poster #253
Thu 20 Jun 10:30 a.m. PDT — noon PDT

Abstract: To alleviate the utility degradation of deep learning image classification under differential privacy (DP), the use of extra public data or pre-trained models has been widely explored. Recently, in-distribution public data has been investigated, in which a tiny subset of data owners share their data publicly. In this paper, we investigate a framework that leverages recent diffusion models to amplify the information of public data. We then identify data diversity and the generalization gap between public and private data as critical factors in addressing the limited size of public data. Assuming 4% of the training data is public, our method achieves 85.48% accuracy on CIFAR-10 without using pre-trained models, under a privacy budget of $(\varepsilon, \delta) = (2, 10^{-5})$.
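
To make the 4% public-data assumption concrete, the following is a minimal sketch of how a training set could be partitioned into a small public subset and a private remainder. It is not the authors' code: the function name `split_public_private`, the uniform random sampling, and the fixed seed are all assumptions made for illustration. The only external fact used is that the CIFAR-10 training set contains 50,000 images.

```python
import random


def split_public_private(n_samples, public_fraction=0.04, seed=0):
    """Split range(n_samples) into disjoint public and private index lists.

    Hypothetical helper: uniform sampling without replacement is an
    assumption, not a procedure stated in the abstract.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_public = round(n_samples * public_fraction)
    public_idx = sorted(indices[:n_public])
    private_idx = sorted(indices[n_public:])
    return public_idx, private_idx


# CIFAR-10 has 50,000 training images, so 4% corresponds to 2,000 public samples.
public_idx, private_idx = split_public_private(50_000)
print(len(public_idx), len(private_idx))  # 2000 48000
```

Under this split, only the 2,000 public images would be available without privacy cost, for example as training data for a generative model, while the remaining 48,000 images would have to be used under the DP budget.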
