FBTA: Enabling Single-GPU End-to-End Gigapixel WSI Classification with Feature Bridging and Translation Alignment
Abstract
Whole-slide images (WSIs) in computational pathology contain billions of pixels, making end-to-end training of the feature extractor and multi-instance learning (MIL) network infeasible on a single commodity GPU. Existing methods therefore freeze the feature extractor and train the MIL network on the resulting frozen features, which introduces a semantic gap that limits downstream performance. To address this issue, we propose FBTA, a Feature Bridging and Translation Alignment framework for WSI classification. FBTA is the first end-to-end MIL framework trainable on a single 24\,GB GPU, leveraging three complementary feature-bag views: end-to-end features enable joint optimization, frozen features stabilize training, and translated features support practical inference. Experiments on diverse datasets, including TCGA-NSCLC (Shot20/50/100) and TCGA-STAD, demonstrate the effectiveness and generality of FBTA, which consistently improves performance across three MIL architectures and two feature extractors. For example, with ResNet-50 as the extractor, FBTA improves the accuracy of the classic ABMIL by 13.1\% and 15.8\% on NSCLC-Shot50 and TCGA-STAD, respectively, and further improves the state-of-the-art MambaMIL by 4.1\% and 9.2\% on the same datasets. Moreover, FBTA yields additional gains for MIL models that incorporate self-supervised pretraining strategies and data augmentation techniques. These results suggest that FBTA is a feasible and scalable framework for end-to-end MIL on gigapixel WSIs. Code will be made available.
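To make the three feature-bag views concrete, the following is a minimal numerical sketch of the translation-alignment idea, not the authors' implementation: cached frozen features are mapped by a small learned translator into the space of the (jointly trained) end-to-end features, so that inference can reuse precomputed frozen features. All names, the linear translator, and the MSE alignment objective are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (assumed, not the paper's code): one WSI "bag" of
# patch features in three views:
#   frozen      - features from a fixed pretrained extractor (cached)
#   end_to_end  - features from a jointly trained extractor (simulated here
#                 by an unknown linear shift of the frozen ones)
#   translated  - frozen features mapped by a learned translator, aligned
#                 to the end-to-end space for practical inference
rng = np.random.default_rng(0)
d = 8    # feature dimension
n = 64   # number of patch instances in the bag

frozen = rng.normal(size=(n, d))                   # cached frozen features
W_true = rng.normal(size=(d, d)) * 0.1 + np.eye(d)
end_to_end = frozen @ W_true                       # stand-in for trained features

# Translator: a single linear map fit by gradient descent on an MSE
# alignment loss between translated and end-to-end features.
W = np.eye(d)
lr = 0.05
for _ in range(500):
    translated = frozen @ W
    grad = 2.0 * frozen.T @ (translated - end_to_end) / n  # d(MSE)/dW
    W -= lr * grad

translated = frozen @ W
alignment_error = float(np.mean((translated - end_to_end) ** 2))
print(f"alignment MSE: {alignment_error:.2e}")
```

After training, the translator reproduces the end-to-end feature space from frozen inputs up to a small residual, which is the property that would let inference run on cached frozen features alone.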