Multi-View Hierarchical Alignment Learning for Spatial Transcriptomics
Abstract
Spatial transcriptomics provides both spatial coordinates and gene expression profiles, enabling the study of tissue organization and cellular heterogeneity. Despite recent progress, current spatial clustering methods still face two major limitations. First, representations learned from the spatial and expression views often differ because of view-specific noise and incomplete structural information. Without sample-level cross-view consistency, the embeddings of a spot in the two views may not correspond to the same biological identity, which reduces their discriminative capability. Second, existing approaches lack effective semantic-level supervision. Although node embeddings capture local neighborhood patterns, they do not explicitly reflect high-level semantic structures. Prototype-based modeling can provide such semantic abstraction, yet current methods seldom align prototypes with node representations, leading to weak semantic consistency. To overcome these issues, we propose Multi-View Hierarchical Alignment Learning for Spatial Transcriptomics (MHAL). At the sample level, MHAL introduces positive sample alignment to enforce consistency between spatial and expression embeddings. At the semantic level, MHAL designs prototype-level contrastive learning, in which prototypes act as semantic anchors and guide the formation of coherent cluster structures. Together, these two alignment mechanisms progressively ensure both local consistency and global semantic discrimination. Extensive experiments demonstrate that MHAL achieves competitive performance in spatial domain identification compared with state-of-the-art methods.
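
The two alignment levels described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not give the loss formulas, so the sketch assumes an InfoNCE-style cross-view loss for sample-level alignment and a softmax over spot-to-prototype similarities for prototype-level contrast. The function names, the temperature `tau`, the use of cosine similarity, and the choice of cluster means as prototypes are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def neg_log_softmax(logits, idx):
    """Return -log softmax(logits)[idx], computed in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[idx]


def sample_alignment_loss(z_spatial, z_expr, tau=0.5):
    """Assumed sample-level loss: spot i in the spatial view is the positive
    of spot i in the expression view; all other spots act as negatives."""
    n = len(z_spatial)
    total = 0.0
    for i in range(n):
        logits = [cosine(z_spatial[i], z_expr[j]) / tau for j in range(n)]
        total += neg_log_softmax(logits, i)
    return total / n


def cluster_prototypes(z, assignments, k):
    """Assumed prototype definition: the mean embedding of each cluster."""
    dim = len(z[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for vec, c in zip(z, assignments):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += vec[d]
    return [[s / max(cnt, 1) for s in row] for row, cnt in zip(sums, counts)]


def prototype_contrast_loss(z, prototypes, assignments, tau=0.5):
    """Assumed semantic-level loss: pull each spot toward its own prototype
    and push it away from the prototypes of the other clusters."""
    total = 0.0
    for vec, c in zip(z, assignments):
        logits = [cosine(vec, p) / tau for p in prototypes]
        total += neg_log_softmax(logits, c)
    return total / len(z)


# Toy usage: four spots, two clusters, two-dimensional embeddings.
z_spatial = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
z_expr = [[0.8, 0.2], [1.0, 0.1], [0.2, 0.8], [0.0, 1.0]]
labels = [0, 0, 1, 1]
protos = cluster_prototypes(z_expr, labels, k=2)
total_loss = (sample_alignment_loss(z_spatial, z_expr)
              + prototype_contrast_loss(z_spatial, protos, labels))
```

In this sketch the two terms are simply summed. How the levels are weighted, and how the prototypes are obtained and updated during training, is not stated in the abstract and is left open here.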