Poster
A Unified, Resilient, and Explainable Adversarial Patch Detector
Vishesh Kumar · Akshay Agarwal
ExHall D Poster #406
Deep Neural Networks (DNNs), the backbone of almost every computer vision task, are vulnerable to adversarial attacks, particularly physical out-of-distribution (OOD) adversarial patches. Existing models often struggle to interpret these attacks in ways that align with human visual perception. Our proposed AdvPatchXAI is a generalized, robust, and explainable defense algorithm designed to protect DNNs against physical adversarial threats. AdvPatchXAI employs a novel patch decorrelation loss that reduces feature redundancy and enhances the distinctiveness of patch representations, enabling better generalization to unseen adversarial scenarios. It learns prototypical parts in a self-supervised fashion, improving interpretability and correlation with human vision. The model uses a sparse linear layer for classification, making the decision-making process globally interpretable through a set of learned prototypes and locally explainable by pinpointing the relevant prototypes within an image. Our comprehensive evaluation shows that AdvPatchXAI not only closes the "semantic" gap between latent space and pixel space but also effectively handles unseen adversarial patches, even those perturbed with unseen corruptions, significantly advancing DNN robustness in practical settings.
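The abstract does not give the form of the patch decorrelation loss, but one common way to reduce feature redundancy is to penalize the off-diagonal entries of the feature correlation matrix, so that different dimensions of the patch representation carry distinct information. The sketch below illustrates that general idea; the function name, signature, and weighting are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a decorrelation-style loss: standardize each
# feature dimension across the batch, form the correlation matrix, and
# penalize its off-diagonal entries. This is NOT the paper's exact loss,
# only a minimal example of the redundancy-reduction principle.
import torch


def patch_decorrelation_loss(features: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """features: (N, D) batch of patch feature vectors; returns a scalar loss."""
    # Standardize each feature dimension over the batch.
    z = (features - features.mean(dim=0)) / (features.std(dim=0) + eps)
    n, d = features.shape
    corr = (z.T @ z) / n  # (D, D) empirical correlation matrix
    # Zero the diagonal so only cross-dimension correlations are penalized.
    off_diag = corr - torch.diag(torch.diag(corr))
    return (off_diag ** 2).sum() / d


# Usage: add to the training objective alongside the classification loss,
# e.g. total_loss = cls_loss + lambda_decorr * patch_decorrelation_loss(feats).
feats = torch.randn(256, 64)
loss = patch_decorrelation_loss(feats)
```

Highly correlated feature dimensions drive this term up, while independent dimensions keep it near zero, which matches the stated goal of making patch representations more distinctive.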