Debiased Sample Selection for Learning with Noisy Labels
Abstract
Existing methods for learning with noisy labels (LNL) predominantly rely on the small-loss trick, which assumes that low-loss samples are more likely to be correctly labeled. While effective, this strategy suffers from two overlooked confirmation biases: (1) Class-level confirmation bias: samples from easy-to-learn classes tend to have lower losses, leading to over-selection of easy samples while ignoring hard ones; (2) Instance-level confirmation bias: mislabeled samples with spuriously low losses are mistakenly treated as clean, forcing the model to memorize wrong labels. Both biases accumulate over training and degrade performance. To mitigate these issues, we propose Marginal Distribution Adjustment (MDA) and Candidate Class Selection (CCS). MDA dynamically reshapes the model’s predicted class distribution toward uniformity, ensuring fairer sample selection across classes. CCS leverages training dynamics to identify likely true labels and removes them from the classification task, preventing memorization of incorrect annotations while converting weakly related labels into useful supervision. Both MDA and CCS are plug-and-play modules. Extensive experiments show that integrating MDA and CCS into either existing sample selectors or advanced LNL pipelines consistently enhances performance on both CIFAR-10/100 with synthetic noise and real-world datasets (CIFAR-N, Clothing1M, WebVision), demonstrating their broad applicability to LNL methods. Our code will be publicly available.