Logit-Margin Repulsion for Backdoor Defense
Zhiguo Yang ⋅ Dongsheng Xu ⋅ Ruizhi Zhong ⋅ Jiacheng Pi ⋅ Xingxing Huang ⋅ Wenjie Ruan
Abstract
Backdoor attacks are an increasingly significant threat to deep neural networks. Recent studies have revealed that model compression techniques, such as quantization and pruning, can be exploited to implant conditional backdoors. These backdoors remain dormant in full-precision models but are activated during compression, making them highly stealthy and difficult to detect. Traditional defense methods are generally ineffective against such attacks, while defenses designed for conditional backdoors struggle to handle traditional ones. Moreover, most existing approaches fail to generalize to Transformer architectures. To address these challenges, we propose \textit{\textbf{L}ogit \textbf{M}argin \textbf{R}epulsion} (LMR), a universal and architecture-agnostic defense method. LMR uses a small set of clean samples, combining selective cross-entropy with a logit-margin constraint to enlarge the gap between the backdoor class and benign classes. It then applies selective pruning to remove channels associated with backdoor behavior, achieving a strong defense without changing the model architecture. Extensive experiments on a wide range of CNNs and Vision Transformers demonstrate that LMR, even with a minimal amount of clean data (0.1\%), can effectively mitigate both traditional and conditional backdoor attacks across diverse model architectures.
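The core objective described above can be sketched as a loss function. The snippet below is an illustrative PyTorch sketch, not the authors' implementation: the function name `lmr_loss`, the hinge form of the margin penalty, and the weighting parameter `alpha` are assumptions chosen to match the abstract's description of a cross-entropy term plus a logit-margin constraint that pushes the suspected backdoor class away from benign classes.

```python
import torch
import torch.nn.functional as F


def lmr_loss(logits, targets, backdoor_class, margin=3.0, alpha=1.0):
    """Illustrative sketch of a logit-margin repulsion objective.

    On clean samples, combine standard cross-entropy with a hinge
    penalty that is active whenever the suspected backdoor class's
    logit comes within `margin` of the true class's logit.
    All hyperparameters here are hypothetical.
    """
    # Standard cross-entropy on the small clean set
    ce = F.cross_entropy(logits, targets)
    # Logit of the ground-truth (benign) class for each sample
    true_logit = logits.gather(1, targets.unsqueeze(1)).squeeze(1)
    # Logit of the suspected backdoor target class
    backdoor_logit = logits[:, backdoor_class]
    # Hinge-style repulsion: penalize when the margin is too small
    repulsion = F.relu(margin - (true_logit - backdoor_logit)).mean()
    return ce + alpha * repulsion


if __name__ == "__main__":
    # Backdoor logit far below the true logit: repulsion term is zero
    safe = lmr_loss(torch.tensor([[5.0, 0.0, 0.0]]),
                    torch.tensor([0]), backdoor_class=2)
    # Backdoor logit close to the true logit: repulsion term fires
    risky = lmr_loss(torch.tensor([[5.0, 0.0, 4.5]]),
                     torch.tensor([0]), backdoor_class=2)
    print(safe.item(), risky.item())
```

Enlarging this margin on clean data makes the backdoor class harder to reach, after which channels that still favor it can be identified and pruned, as the abstract's second stage describes.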