Eliminate Distance Differences Induced by Backdoor Attacks: Layer-Selective Training and Clipping to Mask Backdoor Models
Abstract
Federated learning (FL) enables a central server to collaboratively train a global model with multiple clients while preserving data privacy. However, the distributed nature of FL makes the paradigm vulnerable to backdoor attacks, as demonstrated by numerous recent studies. Although existing studies improve the effectiveness of backdoor attacks through optimized triggers, they have two limitations: (1) they ignore the heterogeneous contributions of individual model layers to the success of a backdoor; (2) they induce conspicuous differences between backdoor and clean models in the early stages of poisoning. These limitations cause backdoor models to exhibit significant discrepancies from clean models, making them easily detectable. To fill these gaps, we propose LaySelFL, a novel layer-selective method that eliminates the distance differences induced by a backdoor in order to conceal attacks in FL. Our central insight is that different layers contribute unequally to backdoor attacks: by localizing poisoning to the layers most sensitive to the backdoor objective, an attacker can substantially reduce the differences between backdoor and clean models. Concretely, LaySelFL identifies sensitive layers via both dynamic and static evaluations of the parameter differences between backdoor and benign models, then applies a targeted training protocol with a regularized loss that constrains each round's deviation from the global model. Finally, LaySelFL clips the non-poisoned layers to further mask residual differences introduced by the attack. This strategy yields a more covert and resilient backdoor attack. Extensive experiments show that LaySelFL improves attack effectiveness by 25\% and reduces the effectiveness of defense methods to 4\%.
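To make the pipeline described above concrete, the following is a minimal, self-contained PyTorch sketch of two of the steps the abstract names: static layer-sensitivity scoring via parameter distances between the backdoor and global models, and clipping of non-poisoned layers back toward the global model. This is not the authors' implementation; the function names (`layer_sensitivity`, `select_sensitive_layers`, `clip_to_global`), the choice of L2 distance as the sensitivity score, and the clipping radius `tau` are all illustrative assumptions.

```python
# Hypothetical sketch of layer-selective poisoning plus clipping; not LaySelFL's code.
import torch
import torch.nn as nn


def layer_sensitivity(backdoor_model: nn.Module, global_model: nn.Module) -> dict:
    """Score each parameter tensor by the L2 distance between the backdoor and
    global models (a static proxy for how much the backdoor objective moved it)."""
    scores = {}
    g_state = global_model.state_dict()
    for name, p in backdoor_model.state_dict().items():
        scores[name] = torch.norm(p.float() - g_state[name].float()).item()
    return scores


def select_sensitive_layers(scores: dict, k: int) -> set:
    """Keep the k parameter tensors with the largest drift as poisoning targets."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def clip_to_global(backdoor_model: nn.Module, global_model: nn.Module,
                   poison_layers: set, tau: float = 0.0) -> None:
    """Mask residual differences: pull every non-poisoned tensor back inside an
    L2 ball of radius tau around the corresponding global parameters."""
    g_state = global_model.state_dict()
    with torch.no_grad():
        for name, p in backdoor_model.state_dict().items():
            if name in poison_layers:
                continue  # leave the poisoned layers untouched
            delta = p - g_state[name]
            norm = torch.norm(delta.float())
            if norm > tau:
                p.copy_(g_state[name] + delta * (tau / (norm + 1e-12)))


# Toy usage: two identically shaped models standing in for the global model
# and a locally poisoned client model.
global_net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
local_net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
local_net.load_state_dict(global_net.state_dict())
with torch.no_grad():  # simulate drift introduced by backdoor training
    for p in local_net.parameters():
        p.add_(0.05 * torch.randn_like(p))

scores = layer_sensitivity(local_net, global_net)
poison_layers = select_sensitive_layers(scores, k=2)
clip_to_global(local_net, global_net, poison_layers, tau=0.01)
```

Under these assumptions, the clipping step plays the role the abstract assigns to masking: only the selected sensitive layers retain the backdoor perturbation, while all other layers are pulled back to within `tau` of the global model, shrinking the distance signals that anomaly-based defenses inspect.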