Unsupervised Multi-Scale Segmentation of 3D Subcellular World with Stable Diffusion Foundation Model
Abstract
We introduce an unsupervised approach for segmenting multi-scale subcellular objects in 3D volumetric cryo-electron tomography (cryo-ET) images. To this end, we address key challenges such as the lack of annotated data, large data volumes, high heterogeneity of subcellular shapes and sizes, and high inter-domain variability of cellular cryo-ET images across different experiments and contexts. Our method requires users only to select a small number of slabs from a few representative tomograms in the dataset. At its core, the method extracts features from the selected slabs using a Stable Diffusion foundation model pretrained mostly on natural images. Feature extraction is followed by a novel heuristic-based feature aggregation strategy and adaptive thresholding to segment the aggregated features. The resulting masks are refined with a pretrained CellPose model to split composite regions and then used as pseudo-ground truth for training supervised deep learning models. We validated our unsupervised, foundation-model-based pipeline on publicly available cryo-ET benchmark datasets, demonstrating performance that closely approximates expert human annotations. This fully automated, data-driven framework enables the mining of multi-scale subcellular patterns, paving the way for accelerated biological discoveries from large-scale cellular cryo-ET datasets.
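To make the aggregation-and-thresholding stage of the pipeline concrete, the sketch below shows one possible reading of those two steps. It is illustrative only: the abstract does not specify the heuristic, so the channel-wise L2-norm saliency, the uniform weighting across feature maps, and the mean-plus-k-standard-deviations threshold are all assumptions, and random arrays stand in for the actual Stable Diffusion features.

```python
import numpy as np

def aggregate_features(feature_maps, weights=None):
    """Aggregate per-slab feature maps of shape (H, W, C) into one saliency map.

    Assumed heuristic: collapse channels with an L2 norm, then take a
    weighted mean across the maps (uniform weights by default).
    """
    saliency = np.stack([np.linalg.norm(f, axis=-1) for f in feature_maps])  # (N, H, W)
    if weights is None:
        weights = np.full(len(feature_maps), 1.0 / len(feature_maps))
    return np.tensordot(weights, saliency, axes=1)  # (H, W)

def adaptive_threshold(saliency, k=1.0):
    """Binarize with a per-image threshold of mean + k * std (assumed form)."""
    t = saliency.mean() + k * saliency.std()
    return saliency > t

# Stand-in for Stable Diffusion features from three selected slabs.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(64, 64, 8)) for _ in range(3)]
mask = adaptive_threshold(aggregate_features(feats))
```

In the full method the resulting binary mask would then be passed to a pretrained CellPose model to split composite regions before serving as pseudo-ground truth.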