Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding

Zhiheng Cheng · Qingyue Wei · Hongru Zhu · Yan Wang · Liangqiong Qu · Wei Shao · Yuyin Zhou

Arch 4A-E Poster #324
Wed 19 Jun 10:30 a.m. PDT — noon PDT


The Segment Anything Model (SAM) has garnered significant attention for its versatile segmentation abilities and intuitive prompt-based interface. However, its application to medical imaging presents challenges, requiring either substantial training costs and extensive medical datasets for full model fine-tuning, or high-quality prompts for optimal performance. This paper introduces H-SAM: a prompt-free adaptation of SAM designed for efficient fine-tuning on medical images via a two-stage hierarchical decoding procedure. In the first stage, H-SAM employs SAM's original decoder to create a prior probabilistic mask, which then guides more intricate decoding in the second stage. Specifically, we propose two key designs: 1) a class-balanced, mask-guided self-attention mechanism that addresses the unbalanced label distribution and thus enhances the image embedding; 2) a learnable mask cross-attention mechanism that spatially modulates the interplay among different image regions based on the prior mask. Moreover, the inclusion of a hierarchical pixel decoder in H-SAM enhances its proficiency in capturing fine-grained and localized details. This approach enables SAM to effectively integrate learned medical priors, facilitating enhanced adaptation for medical image segmentation with limited samples. Our H-SAM achieves a 4.78% improvement in average Dice over existing prompt-free SAM variants for multi-organ segmentation using only 10% of the 2D slices. Without using any unlabeled data at all, H-SAM even outperforms state-of-the-art semi-supervised models that rely on extensive unlabeled training data across various medical datasets.
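To give a rough sense of the second design, the sketch below shows one way a prior mask can spatially modulate cross-attention: the mask probabilities are added as a bias to the attention logits, so queries attend more to regions the first-stage decoder deemed likely foreground. This is a minimal illustration under assumed shapes and names (`mask_guided_cross_attention`, the additive log-mask bias, and the scaling factor `alpha` are hypothetical simplifications, not the paper's exact formulation).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mask_guided_cross_attention(queries, keys, values, prior_mask, alpha=1.0):
    """Cross-attention whose logits are biased by a prior probability mask.

    queries: (Nq, d); keys, values: (Nk, d); prior_mask: (Nk,) in [0, 1].
    Keys lying in high-probability regions of the prior mask receive larger
    logits, so the prior spatially modulates which image tokens each query
    attends to. Returns the attended output and the attention weights.
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)                # (Nq, Nk)
    logits = logits + alpha * np.log(prior_mask + 1e-6)   # bias toward prior
    attn = softmax(logits, axis=-1)                       # rows sum to 1
    return attn @ values, attn                            # (Nq, d), (Nq, Nk)

# Usage: keys covered by the prior mask dominate the attention.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 4))
mask = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # prior marks first two tokens
out, attn = mask_guided_cross_attention(q, k, v, mask)
```

In this toy setting, almost all attention weight lands on the two keys the prior mask marks as foreground; in a real decoder the bias (or a learnable variant of it) would operate per spatial location of the image embedding.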
