Stabilizing Feature Geometry in Noisy Pretrained Models for Robust Downstream Tasks
Abstract
Pretraining on large-scale data followed by fine-tuning has become the standard paradigm for visual models. However, noise in the pretraining data can be absorbed by the model and carried into downstream tasks, a phenomenon known as catastrophic inheritance, in which inherited pretraining noise degrades downstream generalization. Prior studies mainly attribute this issue to changes in the feature spectrum, arguing that noise weakens key feature components, and accordingly seek to improve transferability by amplifying those components. However, these approaches focus solely on spectral energy and implicitly assume that the feature directions remain fixed, an assumption that does not hold in practice. In this work, we revisit this view and reveal an overlooked effect: even mild pretraining noise can induce a clear rotation of the dominant feature subspace, despite negligible degradation of spectral energy. To characterize this phenomenon quantitatively, we propose the Principal Directional Angle (PDA), which measures the directional shift between the feature subspaces of the clean and noisy models. Building on this observation, we introduce the Feature Geometry Stabilization (FGS) framework, which counteracts the subspace rotation revealed by PDA by enhancing the geometric stability of the feature space through three complementary components: perturbation consistency, variance-activation regularization, and feature consistency distillation. Experiments across multiple visual benchmarks demonstrate the effectiveness of FGS and confirm the importance of stabilizing feature geometry for mitigating catastrophic inheritance.