

Poster

WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models

Fu Feng · Yucheng Xie · Jing Wang · Xin Geng


Abstract: The growing complexity of model parameters underscores the significance of pre-trained models; however, the constraints encountered during deployment necessitate models of variable sizes. Consequently, the traditional pre-training and fine-tuning paradigm fails to address initialization when target models are incompatible in size with the pre-trained ones. We tackle this issue from a multitasking perspective and introduce \textbf{WAVE}, which incorporates a set of shared \textbf{W}eight templates for \textbf{A}daptive initialization of \textbf{V}ariable-siz\textbf{E}d Models. During initialization, each target model instantiates weight scalers tailored to its size, which suffice to learn the connection rules of the weight templates, based on the Kronecker product, from a limited amount of data. To construct the weight templates, WAVE utilizes the \textit{Learngene} framework, which condenses size-agnostic knowledge from ancestry models into the weight templates as learngenes through knowledge distillation. This process structures the pre-trained models' knowledge according to the rules of the weight templates. We provide a comprehensive benchmark for learngenes, with extensive experiments demonstrating WAVE's efficiency. Results show that WAVE achieves state-of-the-art performance when initializing models of various depths and widths. With only 10 epochs of training, n variable-sized models initialized by WAVE outperform directly pre-trained models trained for over 150 epochs, particularly smaller models, yielding approximately 15n× savings in computational resources. WAVE simultaneously achieves the most efficient knowledge transfer, with average improvements of 1.8\% and 1.2\% across 7 downstream datasets.
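To make the template-plus-scaler idea concrete, below is a minimal PyTorch sketch (not the authors' released code). It assumes a target layer's weight is assembled as a sum of Kronecker products of small, trainable weight scalers with shared, frozen weight templates; the class name, shapes, and random template values are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: building a target-layer weight from shared templates
# via Kronecker products with per-model weight scalers (illustrative only).
import torch
import torch.nn as nn


class KroneckerInitLayer(nn.Module):
    """Assemble an (out_dim x in_dim) weight from K shared templates.

    Each template T_k has shape (t_out, t_in); each learned scaler S_k has
    shape (out_dim // t_out, in_dim // t_in), so kron(S_k, T_k) matches the
    target weight shape. Templates hold the shared, size-agnostic knowledge
    and stay frozen; only the small scalers are trained per target model.
    """

    def __init__(self, templates, out_dim, in_dim):
        super().__init__()
        t_out, t_in = templates[0].shape
        assert out_dim % t_out == 0 and in_dim % t_in == 0
        # Shared templates: frozen, reused across all target models.
        self.templates = nn.ParameterList(
            [nn.Parameter(t.clone(), requires_grad=False) for t in templates]
        )
        # Per-model scalers, sized to the target layer's dimensions.
        self.scalers = nn.ParameterList(
            [nn.Parameter(torch.randn(out_dim // t_out, in_dim // t_in) * 0.02)
             for _ in templates]
        )
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def weight(self):
        # Sum of Kronecker products: each scaler "tiles" its template
        # up to the target weight size.
        return sum(torch.kron(s, t) for s, t in zip(self.scalers, self.templates))

    def forward(self, x):
        return nn.functional.linear(x, self.weight(), self.bias)


# Usage: the same two 64x64 templates initialize layers of different widths.
templates = [torch.randn(64, 64) for _ in range(2)]
small = KroneckerInitLayer(templates, out_dim=256, in_dim=256)
large = KroneckerInitLayer(templates, out_dim=512, in_dim=384)
print(small.weight().shape, large.weight().shape)  # (256, 256), (512, 384)
```

In this sketch only the scalers depend on the target size, and they are far smaller than the assembled weight, which illustrates why the abstract notes that a limited amount of data can suffice to learn the connection rules for each variable-sized model.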
