Defending Unauthorized Model Merging via Dual-Stage Weight Protection
Abstract
Traditional multi-task learning often relies on a separately fine-tuned model for each task, leading to high training costs and inefficiency. Recent advances in model merging alleviate this issue by linearly combining the parameters of multiple task-specific models into a new multi-task model. Such approaches can match or even surpass fine-tuning performance while greatly reducing computational overhead. However, the growing openness of model-sharing platforms also introduces intellectual-property risks: malicious users can merge publicly available models into new commercial systems without authorization, undermining the rights of the original developers. To address this emerging threat, we propose MergeGuard, a two-stage preprocessing mechanism that protects models against unauthorized merging. MergeGuard subtly adjusts a model’s internal parameter structure to preserve its original task performance while degrading the performance of any merged derivative. The key challenge lies in defending against unpredictable merging behavior, since the attacker’s chosen models, merging strategies, and target tasks are unknown in advance. MergeGuard achieves this balance, ensuring normal functionality before merging and significant performance degradation afterward, thereby safeguarding model ownership in open AI ecosystems.
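The linear-combination merging the abstract refers to can be illustrated with a minimal sketch. The parameter names, values, and mixing weights below are purely illustrative assumptions, not drawn from the paper; real merging operates on full model checkpoints (e.g., framework state dictionaries) rather than toy arrays.

```python
import numpy as np

# Hypothetical task-specific models represented as parameter dictionaries
# (names and values are illustrative placeholders).
model_a = {"layer.weight": np.array([1.0, 2.0, 3.0])}
model_b = {"layer.weight": np.array([3.0, 0.0, 1.0])}

def merge(models, weights):
    """Linearly combine the parameters of several task-specific models."""
    merged = {}
    for name in models[0]:
        # Weighted sum of the corresponding parameter tensor across models.
        merged[name] = sum(w * m[name] for w, m in zip(weights, models))
    return merged

# Equal-weight merge: element-wise average of the two parameter vectors.
merged = merge([model_a, model_b], weights=[0.5, 0.5])
print(merged["layer.weight"])  # → [2. 1. 2.]
```

A defense such as the one described would aim to keep each protected model performing normally on its own task while making the output of a merge like the one above unusable.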