Poster
OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation
Xiao Cui · Yulei Qin · Wengang Zhou · Hongsheng Li · Houqiang Li
The demands for increasingly large-scale datasets pose substantial storage and computation challenges to building deep learning models.Dataset distillation methods,especially those via sample generation techniques,rise in response to condensing large original datasets into small synthetic ones while preserving critical information.Existing subset synthesis methods simply minimize the homogeneous distance where uniform contributions from all real instances are allocated to shaping each synthetic sample.We demonstrate that such equal allocation fails to consider the instance-level relationship between each real-synthetic pair and gives rise to insufficient modeling of geometric structural nuances between the distilled and original sets.In this paper,we propose a novel framework named OPTICAL to reformulate the homogeneous distance minimization into a bi-level optimization problem via matching-and-approximating.In the matching step,we leverage optimal transport matrix to dynamically allocate contributions from real instances.Subsequently,we polish the generated samples in accordance with the established allocation scheme for approximating the real ones.Such a strategy better measures intricate geometric characteristics and handles intra-class variations for high fidelity of data distillation.Extensive experiments across seven datasets and three model architectures demonstrate our method's versatility and effectiveness.Its plug-and-play characteristic makes it compatible with a wide range of distillation frameworks.Codes are available at https://anonymous.4open.science/r/CVPR2025_696.
Live content is unavailable. Log in and register to view live content