Reframing Long-Tailed Learning via Loss Landscape Geometry
Abstract
Balancing the performance trade-off between head and tail classes on long-tailed data distributions remains a long-standing challenge. In this paper, we posit that this dilemma stems from a phenomenon akin to "catastrophic forgetting" in continual learning (the model severely overfits on head classes while quickly forgetting tail classes) and propose a solution from a loss landscape perspective. We observe that different classes possess divergent convergence points in the loss landscape. Moreover, this divergence is aggravated when the model settles into sharp, non-robust minima rather than a shared, flat solution that benefits all classes. In light of this, we propose a continual-learning-inspired framework to prevent such "catastrophic forgetting". To avoid inefficient per-class parameter preservation, a Grouped Knowledge Preservation module is proposed to memorize group-specific convergence parameters, promoting convergence towards a shared solution. Concurrently, our framework integrates a Grouped Sharpness Aware module that seeks flatter minima by explicitly addressing the geometry of the loss landscape. Notably, our framework requires neither external training samples nor pre-trained models, facilitating broad applicability. Extensive experiments on four benchmarks demonstrate significant performance gains over state-of-the-art methods.
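The paper does not specify the sharpness-aware update here, but the idea of seeking flatter minima is commonly instantiated as sharpness-aware minimization (SAM): first ascend to the worst-case weights within a small L2 ball of radius rho, then descend using the gradient taken at that perturbed point. A minimal sketch on a toy quadratic loss, with the step size `lr` and radius `rho` chosen only for illustration:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware step: perturb weights toward the local
    worst case, then apply the gradient computed there."""
    g = grad_fn(w)
    # Ascent direction: scaled unit gradient (epsilon avoids division by zero).
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_sharp = grad_fn(w + eps)  # gradient at the perturbed weights
    return w - lr * g_sharp

# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
grad = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad)
```

The grouped variant described in the abstract would presumably apply such a perturbation per class group rather than globally; that detail is not given here.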