

MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation

Xiaolong Deng · Huisi Wu · Runhao Zeng · Jing Qin

Arch 4A-E Poster #168
Thu 20 Jun 10:30 a.m. PDT — noon PDT
Oral presentation: Orals 3C Medical and Physics-based vision
Thu 20 Jun 9 a.m. PDT — 10:30 a.m. PDT


We propose a novel echocardiography video segmentation model that adapts SAM to medical videos to address several long-standing challenges in ultrasound video segmentation: (1) massive speckle noise and artifacts, (2) extremely ambiguous boundaries, and (3) large variations of target objects across frames. The core technique of our model is a temporal-aware and noise-resilient prompting scheme. Specifically, we employ a space-time memory that contains both spatial and temporal information to prompt the segmentation of the current frame; we therefore call the proposed model MemSAM. During prompting, the memory carrying temporal cues sequentially prompts the video segmentation frame by frame. Meanwhile, because the memory propagates high-level features rather than masks, it avoids the misidentification caused by mask propagation and improves representation consistency. To address the challenge of speckle noise, we further propose a memory reinforcement mechanism, which leverages predicted masks to improve the quality of the memory before storing it. We extensively evaluate our method on two public datasets and demonstrate state-of-the-art performance compared to existing models. In particular, our model achieves performance comparable to fully supervised approaches while using limited annotations. Code is available at
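The frame-by-frame memory prompting and memory reinforcement described above can be sketched in miniature. This is a hypothetical toy illustration, not the authors' implementation: `encode`, `segment`, and `reinforce` are stand-ins for the paper's image encoder, memory-prompted decoder, and mask-guided memory refinement, and all tensor shapes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(frame):
    # Stand-in for an image encoder: lift each frame to a feature map.
    return frame.mean(axis=-1, keepdims=True) * np.ones((1, 1, 8))

def segment(features, memory):
    # Stand-in for a memory-prompted decoder: blend the current features
    # with the stored memory features, then threshold into a binary mask.
    attended = features if memory is None else 0.5 * (features + memory)
    return (attended.mean(axis=-1) > 0.5).astype(np.float32)

def reinforce(features, mask):
    # Memory reinforcement (toy version): suppress feature positions
    # outside the predicted mask so noisy background does not pollute
    # the memory before it is stored.
    return features * mask[..., None]

def segment_video(frames):
    memory = None
    masks = []
    for frame in frames:
        feats = encode(frame)
        mask = segment(feats, memory)   # memory prompts the current frame
        memory = reinforce(feats, mask) # refine the memory, then store it
        masks.append(mask)
    return masks

video = rng.random((4, 16, 16, 3))  # four toy "ultrasound" frames
masks = segment_video(video)
```

The key point the sketch preserves is that what propagates between frames is the (reinforced) feature memory, not the predicted mask itself, which is what the paper credits for avoiding mask-propagation misidentification.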
