Poster

Variance-Based Membership Inference Attacks Against Large-Scale Image Captioning Models

Daniel Samira · Edan Habler · Yuval Elovici · Asaf Shabtai


Abstract:

The proliferation of multi-modal generative models has introduced new privacy and security challenges, especially due to the risks of memorization and unintentional disclosure of sensitive information. This paper focuses on the vulnerability of multi-modal image captioning models to membership inference attacks (MIAs). These models, which synthesize textual descriptions from visual content, could inadvertently reveal personal or proprietary data embedded in their training datasets. We explore the feasibility of MIAs in this context. Specifically, our approach relies on a variance-based strategy tailored to image captioning models, using only image data, without access to the corresponding captions. We introduce the means-of-variance threshold attack (MVTA) and the confidence-based weakly supervised attack (C-WSA), both built on a means-of-variance (MV) metric that quantifies variability among vector embeddings. Our experiments demonstrate that these models are susceptible to MIAs, indicating substantial privacy risks. The effectiveness of our methods is validated through rigorous evaluation on real-world models, confirming the practical implications of our findings.
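The abstract does not spell out how MV is computed, but a plausible minimal sketch is: query the target captioning model several times per image with stochastic decoding, embed the sampled captions, and take the mean of the per-dimension variances as the MV score, with a simple threshold for the MVTA decision. Everything below is an illustrative assumption, not the paper's method: `sample_captions` and `embed` are hypothetical stand-ins for the target model and a sentence encoder, and the direction of the decision (members yield lower variance, consistent with memorization producing more stable outputs) and the threshold value are guesses.

```python
import numpy as np

def means_of_variance(embeddings: np.ndarray) -> float:
    """Mean of per-dimension variances across a set of embedding vectors.

    embeddings: array of shape (n_samples, dim), one row per caption embedding.
    """
    per_dim_var = np.var(embeddings, axis=0)  # variance of each embedding dimension
    return float(np.mean(per_dim_var))        # collapse to a single scalar MV score

def mvta_predict(mv_score: float, threshold: float) -> bool:
    """Means-of-variance threshold attack (MVTA) decision.

    Assumption: training members produce LOWER output variance, so a score
    below the threshold flags the image as a training-set member.
    """
    return mv_score < threshold

# Hypothetical usage (sample_captions / embed are placeholders for the
# target captioning model with stochastic decoding and a sentence encoder):
# captions = sample_captions(model, image, n=10)   # e.g. nucleus sampling
# embs = np.stack([embed(c) for c in captions])    # shape (10, dim)
# is_member = mvta_predict(means_of_variance(embs), threshold=0.05)
```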
