Poster
Beyond Clean Training Data: A Versatile and Model-Agnostic Framework for Out-of-Distribution Detection with Contaminated Training Data
Yuchuan Li · Jae-Mo Kang · Il-Min Kim
In real-world AI applications, training datasets are often contaminated, containing a mix of in-distribution (ID) and out-of-distribution (OOD) samples without labels. This contamination poses a significant challenge for developing and training OOD detection models, as nearly all existing methods assume access to a clean training dataset of only ID samples—a condition rarely met in real-world scenarios. Customizing each existing OOD detection method to handle such contamination is impractical, given the vast number of diverse methods designed for clean data. To address this issue, we propose a universal, model-agnostic framework that integrates with nearly all existing OOD detection methods, enabling training on contaminated datasets while achieving high OOD detection accuracy on test datasets. Additionally, our framework provides an accurate estimation of the unknown proportion of OOD samples within the training dataset—an important challenge in its own right. Our approach introduces a novel dynamic weighting function and transition mechanism within an iterative training structure, enabling both reliable estimation of the proportion of OOD samples in the training data and precise OOD detection on test data. Extensive evaluations across diverse datasets, including ImageNet-1k, demonstrate that our framework accurately estimates the proportion of OOD samples in the training data and substantially enhances OOD detection accuracy on test data. Code is provided as supplementary material.
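To make the general idea concrete, below is a minimal, hypothetical Python sketch of an iterative re-weighting loop for a contaminated (unlabeled ID + OOD) training set. It is not the authors' method: the stand-in detector (a weighted Gaussian density), the mixture-based soft weighting, and names such as fit_id_model and iterative_framework are illustrative assumptions only; the actual framework's dynamic weighting function and transition mechanism are described in the paper and supplementary code.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def fit_id_model(x, weights):
    # Stand-in "detector": a weighted Gaussian density model of the presumed-ID data.
    mu = np.average(x, axis=0, weights=weights)
    var = np.average((x - mu) ** 2, axis=0, weights=weights) + 1e-6
    return mu, var

def ood_score(x, params):
    # Higher score = more OOD-like (negative log-likelihood under the ID model).
    mu, var = params
    return 0.5 * np.sum((x - mu) ** 2 / var + np.log(var), axis=1)

def iterative_framework(x_train, n_iters=5):
    n = len(x_train)
    weights = np.ones(n)                              # initially trust every sample as ID
    for _ in range(n_iters):
        params = fit_id_model(x_train, weights)       # "train" the detector on weighted data
        scores = ood_score(x_train, params)
        # Illustrative weighting step (an assumption, not the paper's scheme): a
        # 2-component mixture over the scores yields a soft ID-membership per sample,
        # which becomes that sample's weight in the next iteration.
        gmm = GaussianMixture(n_components=2, random_state=0).fit(scores.reshape(-1, 1))
        id_comp = int(np.argmin(gmm.means_.ravel()))  # lower-score component taken as ID
        weights = gmm.predict_proba(scores.reshape(-1, 1))[:, id_comp]
    ood_ratio_est = 1.0 - weights.mean()              # estimated OOD proportion of the training data
    return params, weights, ood_ratio_est

# Toy contaminated training set: 80% ID samples from N(0, I), 20% OOD samples from N(4, I).
x_train = np.vstack([rng.normal(0.0, 1.0, size=(800, 2)),
                     rng.normal(4.0, 1.0, size=(200, 2))])
params, weights, ratio = iterative_framework(x_train)
print(f"estimated OOD proportion in training data: {ratio:.2f}")  # roughly 0.20 on this toy data

The sketch only illustrates the two outputs the abstract emphasizes: an estimate of the contamination level of the training data and per-sample weights that let a detector be trained as if the data were clean; any real OOD detection method could play the role of the stand-in density model, which is what makes such a loop model-agnostic.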