Poster
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark
Yanfeng Zhou · Lingrui Li · Le Lu · Minfeng Xu
Semantic segmentation is a crucial prerequisite for clinical applications and computer-aided diagnosis. With the development of deep neural networks, biomedical image segmentation has achieved remarkable success. Encoder-decoder architectures that integrate convolutions and transformers are gaining attention for their potential to capture both global and local features. However, current designs face a contradiction: these two kinds of features cannot be transmitted continuously through the network. In addition, many models lack a unified, standardized evaluation benchmark, leading to significant discrepancies in experimental setups. In this study, we review and summarize these architectures and analyze the contradictions in their designs. We modify UNet and propose WNet, which combines transformers and convolutions and effectively addresses the transmission issue. WNet captures long-range dependencies and local details simultaneously while ensuring their continuous transmission and multi-scale fusion. We integrate WNet into the nnUNet framework for unified benchmarking. Our model achieves state-of-the-art performance in biomedical image segmentation. Extensive experiments demonstrate its effectiveness on four 2D datasets (DRIVE, ISIC-2017, Kvasir-SEG, and CREMI) and four 3D datasets (Parse2022, AMOS22, BTCV, and ImageCAS). The code is available at https://github.com/XXX.
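The abstract does not include implementation details, so the following is only a minimal illustrative sketch of the general idea of carrying convolutional (local) and transformer (global) features in parallel within one block and fusing them before the next stage. The block name, branch layout, and fusion scheme are assumptions for illustration, not the authors' WNet code.

```python
import torch
import torch.nn as nn


class HybridBlock(nn.Module):
    """Hypothetical block combining a convolutional branch (local detail)
    with a self-attention branch (long-range dependencies), then fusing
    both so that each feature type continues to the next stage.
    Illustrative only; not the authors' WNet implementation."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: a standard 3x3 convolution captures fine detail.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Global branch: multi-head self-attention over flattened tokens.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fusion: concatenate both streams and mix with a 1x1 convolution.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)        # (B, H*W, C)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))


# Quick shape check on a toy feature map.
if __name__ == "__main__":
    block = HybridBlock(channels=64)
    x = torch.randn(2, 64, 32, 32)
    print(block(x).shape)  # torch.Size([2, 64, 32, 32])
```

In a sketch like this, the fused output retains both feature types at every stage, which is one plausible way to realize the "continuous transmission" the abstract describes; the actual WNet design should be taken from the released code.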