Poster
Enhancing Dataset Distillation via Non-Critical Region Refinement
Minh-Tuan Tran · Trung Le · Xuan-May Le · Thanh-Toan Do · Dinh Phung
Dataset distillation has gained popularity as a technique for compressing large datasets into smaller, more efficient representations while retaining essential information for model training. Data features can be broadly divided into two types: instance-specific features, which capture unique, fine-grained details of individual examples, and class-general features, which represent shared, broad patterns across a class. However, previous approaches often struggle to balance these two: some focus solely on class-general features, missing finer instance details, while others concentrate on instance-specific features, overlooking the shared characteristics essential for class-level understanding. In this paper, we propose the Non-Critical Region Refinement Dataset Distillation (NRR-DD) method, which preserves the instance-specific and fine-grained regions in synthetic data while enriching non-critical regions with more class-general information. This approach enables our models to leverage all pixel information to capture both types of features, thereby improving overall performance. Furthermore, we introduce Distance-Based Representative (DBR) knowledge transfer, which eliminates the need for soft labels in training by relying solely on the distance between synthetic data predictions and one-hot encoded labels. Experimental results demonstrate that NRR-DD achieves state-of-the-art performance on both small-scale and large-scale datasets. Additionally, by storing only two distances per instance in place of full soft labels, our method achieves comparable results across various settings. Code will be available at https://anonymous.4open.science/r/NRR-DD.
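The abstract does not spell out DBR's exact formulation, so the following is only a minimal sketch of the general idea it describes: instead of storing a teacher's full soft-label vector per synthetic example, store a scalar distance between the teacher's prediction and the one-hot label, then train the student so that its own prediction-to-label distance matches the stored value. The helper names (`dbr_distance`, `DBRLoss`), the choice of L2 as the distance metric, and the mean-squared matching term are all assumptions for illustration, not the authors' definitions.

```python
# Hypothetical sketch of distance-based knowledge transfer, assuming an
# L2 distance between softmax predictions and one-hot labels. Not the
# paper's actual DBR formulation.
import torch
import torch.nn.functional as F


def dbr_distance(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Per-example L2 distance between predicted probabilities and one-hot labels."""
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(labels, num_classes=logits.size(1)).float()
    return (probs - one_hot).norm(dim=1)  # shape: (batch,)


class DBRLoss(torch.nn.Module):
    """Train a student so its prediction-to-label distance matches a stored teacher distance."""

    def forward(self, student_logits, labels, stored_teacher_dist):
        student_dist = dbr_distance(student_logits, labels)
        # Cross-entropy anchors predictions to the hard labels; the squared
        # term transfers the teacher's confidence level via the stored scalar,
        # so no full soft-label vector needs to be kept per instance.
        return F.cross_entropy(student_logits, labels) + \
            F.mse_loss(student_dist, stored_teacher_dist)
```

Under these assumptions, the stored quantity is precomputed once per synthetic example, e.g. `stored = dbr_distance(teacher(x), y)`, so only a scalar (or, per the abstract, two scalars) is retained per instance instead of a full probability vector.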