

Poster

FedCS: Coreset Selection for Federated Learning

Chenhe Hao · Weiying Xie · Daixun Li · Haonan Qin · Hangyu Ye · Leyuan Fang · Yunsong Li


Abstract: Federated Learning (FL) is an emerging direction in distributed machine learning that enables jointly training a model without sharing data. However, as dataset sizes grow exponentially, the computational cost of FL increases accordingly. In this paper, we propose the first Coreset Selection criterion for Federated Learning (FedCS) by exploring the Distance Contrast (DC) in feature space. Our FedCS is inspired by the discovery that DC can indicate intrinsic properties of samples regardless of the network. Based on this observation, we develop a mathematically formulated method that prunes samples with high DC. The principle behind our pruning is that high-DC samples either contain little information or represent rare extreme cases, so removing them can enhance aggregation performance. Moreover, we experimentally show that samples with low DC usually contain substantial information and reflect the common features of their classes, making them suitable for constructing the coreset. With only two linear-logarithmic (O(n log n)) operations, FedCS achieves substantial computational savings over methods that use the whole dataset, at similar accuracy. For example, on the CIFAR-10 dataset with Dirichlet coefficient α=0.1, FedCS achieves 58.88% accuracy using only 44% of the entire dataset, whereas other methods require twice as much data as FedCS to reach the same performance.
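The abstract does not define Distance Contrast precisely; a minimal sketch of the pruning idea, under the assumption that DC is proxied by a sample's feature-space distance to its class centroid (the function name, the centroid proxy, and the `keep_ratio` parameter are illustrative, not the paper's exact formulation), could look like:

```python
import numpy as np

def coreset_by_distance_contrast(features, labels, keep_ratio=0.44):
    """Per-class coreset selection sketch.

    DC is approximated here as the Euclidean distance of a sample's
    feature vector to its class centroid (a hypothetical proxy; the
    paper's actual DC criterion may differ). High-DC samples -- far
    from the class centroid, hence outliers or rare extremes -- are
    pruned; low-DC samples form the coreset.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        centroid = features[idx].mean(axis=0)
        dc = np.linalg.norm(features[idx] - centroid, axis=1)
        # One O(n log n) sort per pass, matching the linear-logarithmic
        # complexity mentioned in the abstract.
        order = idx[np.argsort(dc)]
        n_keep = max(1, int(round(keep_ratio * len(idx))))
        keep.extend(order[:n_keep].tolist())
    return sorted(keep)
```

In a federated setting, each client would apply such a selection to its local data before training, so the per-round computation scales with the coreset size rather than the full local dataset.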
