Revisiting F-measure Optimization in Multi-Label Classification: A Sampling-based Approach
Zixun Wang
Abstract
The F-measure is a widely used metric in multi-label classification, where multiple labels are predicted simultaneously for a single instance. The optimal prediction rule for the F-measure requires estimating $q^2+1$ probabilities, where $q$ is the number of labels. Existing approaches train $q$ multinomial estimators (multi-class classifiers) to directly estimate these probabilities, followed by a matrix multiplication to make predictions. However, this method has two major drawbacks. First, the matrix multiplication incurs a time complexity of $\mathcal{O}(q^3)$, which becomes computationally expensive for large $q$. Second, training multinomial estimators is challenging due to the sparsity of the underlying distributions, which results from the inherent imbalance in multi-label datasets and is further exacerbated by the label transformation required by the method itself. In this paper, we first demonstrate that the matrix multiplication can be reformulated as a series of convolutions by exploiting a special structure in the matrix. These convolutions can then be efficiently computed using the Fast Fourier Transform (FFT), reducing the time complexity to $\mathcal{O}(q^2\log q)$. For example, on the \textit{COCO} dataset, the matrix multiplication requires 27 seconds, while the FFT takes only 1 second, a 27x speedup. To avoid the multinomial label transformation, we propose an indirect sampling-then-estimation approach to estimate the required probabilities. This method trains only $q$ binary estimators instead of multinomial ones, thereby alleviating the sparsity issue, simplifying the training process, and improving performance. We provide theoretical guarantees for the consistency of the proposed sampling-based method and demonstrate its effectiveness through extensive experiments on diverse datasets.
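The speedup claimed above rests on a standard identity: a matrix-vector product with a shift-structured (Toeplitz-like) matrix is a linear convolution, which the FFT computes in $\mathcal{O}(q\log q)$ per row instead of $\mathcal{O}(q^2)$. The sketch below illustrates only this generic FFT-convolution trick, not the paper's specific matrix; the function name `conv_via_fft` is our own.

```python
import numpy as np

def conv_via_fft(a, b):
    """Linear convolution of a and b via zero-padded FFTs.

    Direct convolution costs O(len(a) * len(b)); the FFT route costs
    O(n log n) with n = len(a) + len(b) - 1, which is the source of the
    O(q^3) -> O(q^2 log q) reduction when applied row by row.
    """
    n = len(a) + len(b) - 1          # full (non-circular) output length
    fa = np.fft.rfft(a, n)           # zero-pad to n, transform
    fb = np.fft.rfft(b, n)
    return np.fft.irfft(fa * fb, n)  # pointwise product <=> convolution

# Sanity check against NumPy's direct convolution.
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, 0.25, -1.0])
assert np.allclose(conv_via_fft(a, b), np.convolve(a, b))
```

Zero-padding to the full output length is essential: without it, the FFT computes a *circular* convolution, whose wrap-around terms would corrupt the result.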