Poster

Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning

Sicong Shen · Yang Zhou · Bingzheng Wei · Eric Chang · Yan Xu

2024 Poster


Abstract

Existing fine-tuning methods for computer vision tasks primarily focus on re-weighting the knowledge learned from the source domain during pre-training. They aim to retain knowledge that benefits the target domain while suppressing knowledge that harms it. However, there is a notable disparity in data scale between the pre-training and fine-tuning stages. Consequently, it is theoretically necessary to employ a model with reduced complexity to mitigate the potential structural risk. Our empirical investigation in this paper reveals that models fine-tuned using existing methods still exhibit a high level of model complexity inherited from the pre-training stage, leading to suboptimal stability and generalization ability. This phenomenon points to an issue that has been overlooked in fine-tuning: Structural Risk Minimization. To address this issue caused by the data scale disparity, we propose a simple yet effective approach called Tuning Stable Rank Shrinkage (TSRS). TSRS mitigates the structural risk during the fine-tuning stage by constraining the noise sensitivity of the target model based on stable rank theories. Through extensive experiments, we demonstrate that incorporating TSRS into fine-tuning methods improves generalization ability on various tasks, for both convolution-based and transformer-based neural networks. Additionally, empirical analysis reveals that TSRS enhances the robustness, convexity, and smoothness of the loss landscapes of fine-tuned models. Code is available at https://github.com/WitGotFlg/TSRS.
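
For readers unfamiliar with the quantity the method is named after, the sketch below computes the stable rank of a single weight matrix using the standard definition, the squared Frobenius norm divided by the squared spectral norm. It is an illustration only and not the TSRS procedure itself, which the abstract does not specify. The function names, the plain-list matrix representation, the all-ones starting vector, and the iteration count are assumptions made for this example.

```python
import math


def frobenius_norm_sq(W):
    """Sum of squared entries of W (a list of equal-length rows)."""
    return sum(x * x for row in W for x in row)


def spectral_norm_sq(W, iters=200):
    """Estimate the largest eigenvalue of W^T W by power iteration.

    Assumption: the all-ones starting vector is not orthogonal to the
    top right singular vector of W. A random start would be more robust.
    """
    n_rows, n_cols = len(W), len(W[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # unit-length start vector
    lam = 0.0
    for _ in range(iters):
        # u = W v
        u = [sum(w * x for w, x in zip(row, v)) for row in W]
        # z = W^T u = (W^T W) v
        z = [sum(W[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in z))
        if norm == 0.0:
            return 0.0
        lam = norm  # v has unit length, so ||(W^T W) v|| approaches lambda_max
        v = [x / norm for x in z]
    return lam


def stable_rank(W):
    """Stable rank: ||W||_F^2 / ||W||_2^2, which never exceeds rank(W)."""
    s = spectral_norm_sq(W)
    return 0.0 if s == 0.0 else frobenius_norm_sq(W) / s


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    rank_one = [[1.0, 2.0], [2.0, 4.0]]
    print(stable_rank(identity))  # all singular values equal
    print(stable_rank(rank_one))  # a single nonzero singular value
```

A matrix whose singular values are all equal has stable rank equal to its rank, while a matrix dominated by one singular value has stable rank close to 1. Shrinking the stable rank therefore concentrates a layer's energy in fewer directions, which is the sense in which it can serve as a proxy for reduced complexity and noise sensitivity.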
