PrivSynth: Alternating and Control-Based Optimization for Privacy and Utility in Synthetic Data
Abstract
As publicly available data dwindles, synthetic data generation (SDG) has become a practical solution for privacy-preserving data sharing. By training generative models on private data, SDG creates samples that retain task-relevant features while obfuscating sensitive content. However, recent work shows that synthetic data can still leak private information via membership inference and reconstruction attacks, and existing defenses often degrade downstream utility. To address this privacy-utility trade-off, we formulate SDG as a bi-objective optimization problem. Solving it directly is difficult because the gradients are intractable and evaluating candidate subsets is expensive. We therefore alternate optimization between the generative model and the data selection parameter, and recast the selection step as a discrete-time optimal control problem solved using Pontryagin’s Maximum Principle. We propose PrivSynth, a framework that quantifies multiple privacy risks and integrates them into the control objective. Theoretical analysis guarantees convergence, and experiments on benchmark and medical datasets show that PrivSynth achieves better utility and stronger privacy protection than state-of-the-art methods.
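To make the alternating scheme concrete, the following is a minimal toy sketch in Python, not the PrivSynth algorithm itself. It assumes a one-dimensional "model" (a single parameter `theta`), hypothetical per-sample privacy risk scores `risks`, and hypothetical trade-off constants `lam` and `tau`. The selection step here is a simple closed-form threshold rule; it does not use Pontryagin's Maximum Principle, which the paper applies to the selection step.

```python
def objective(theta, w, xs, risks, lam, tau):
    """Toy joint objective: each selected sample pays its fit error plus a
    weighted privacy risk, and earns a fixed reward tau for being kept."""
    return sum(
        wi * ((x - theta) ** 2 + lam * r - tau)
        for wi, x, r in zip(w, xs, risks)
    )


def alternate(xs, risks, lam=1.0, tau=1.0, iters=20):
    """Alternate between a model step (update theta with the selection fixed)
    and a selection step (update the 0/1 mask w with theta fixed)."""
    w = [1] * len(xs)  # start with every sample selected
    theta = 0.0
    history = []
    for _ in range(iters):
        # Model step: the weighted mean minimizes the objective in theta.
        k = sum(w)
        if k > 0:
            theta = sum(wi * x for wi, x in zip(w, xs)) / k
        # Selection step: keep a sample only if its per-sample cost is negative.
        new_w = [
            1 if (x - theta) ** 2 + lam * r < tau else 0
            for x, r in zip(xs, risks)
        ]
        history.append(objective(theta, new_w, xs, risks, lam, tau))
        if new_w == w:  # selection unchanged, so the iteration has converged
            break
        w = new_w
    return theta, w, history


if __name__ == "__main__":
    # Hypothetical data: one outlier (5.0) and one high-risk sample (risk 2.0).
    xs = [0.0, 0.1, -0.1, 0.2, 5.0, 0.05]
    risks = [0.1, 0.1, 0.1, 0.1, 0.1, 2.0]
    theta, w, history = alternate(xs, risks)
    print(theta, w, history)
```

Because each step exactly minimizes the toy objective over one block of variables while the other is held fixed, the recorded objective values never increase from one iteration to the next. This monotonicity is specific to the toy problem and is not a substitute for the paper's convergence analysis.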