Unlocking Pre-trained Weights: Parameter Inheritance for Zero-Shot Initialization
Abstract
Appropriate parameter initialization is crucial for reducing the training cost of deep neural networks. Graph HyperNetworks (GHNs) have emerged as a promising approach for initializing diverse architectures, and recent methods such as Task-Aware Learngene (TAL) further attempt to leverage pre-trained model knowledge via soft label supervision. However, such indirect supervision fails to fully exploit the rich information encoded in pre-trained weights. We propose the Parameter InheriTance HyperNetwork (PITH), which introduces a novel parameter projection mechanism to directly inherit parameters from pre-trained models when initializing target networks of varying configurations. Our method enables initialized networks to achieve competitive performance on downstream tasks without any further training, which we term zero-shot initialization. Extensive experiments demonstrate the superiority of PITH: a ViT-Base initialized by PITH achieves 53.35\% zero-shot accuracy on ImageNet-1K, surpassing the previous state-of-the-art by 6.54\%, with consistent improvements across multiple downstream tasks.
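To make the projection idea concrete, the following is a minimal sketch of mapping a pre-trained weight matrix onto a target layer of a different width via projection matrices. The matrices `P_out` and `P_in` are hypothetical placeholders (random here); in PITH they would be produced by the hypernetwork, and the actual mechanism is more elaborate than this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-trained layer weight, e.g. a hidden layer of width 768.
d_src = 768
W_src = rng.standard_normal((d_src, d_src))

# The target network uses a different width, so the pre-trained
# weight cannot be copied directly.
d_tgt = 384

# Hypothetical projection matrices (random stand-ins; a real
# hypernetwork would generate or learn these).
P_out = rng.standard_normal((d_tgt, d_src)) / np.sqrt(d_src)
P_in = rng.standard_normal((d_src, d_tgt)) / np.sqrt(d_src)

# Project the pre-trained weight to the target layer's shape,
# inheriting its structure rather than initializing from scratch.
W_tgt = P_out @ W_src @ P_in

print(W_tgt.shape)  # (384, 384)
```

The appeal of such a scheme is that every target configuration reuses the same pre-trained weights; only the (comparatively small) projections vary with the target architecture.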