H2-Surv: Hierarchical Hyperbolic Multimodal Representation Learning for Survival Prediction
Abstract
Cancer survival prediction through multimodal learning that combines histopathology images with genomic data is a promising research direction. However, current approaches still suffer from two key limitations. First, most methods operate in a Euclidean feature space, which makes it difficult to capture the intrinsic hierarchies in histopathology, where information is organized from patches to whole-slide images (WSIs) to patients, and in genomics, where it progresses from genes to pathways to patients. Second, they typically discretize survival times into coarse risk intervals, neglecting fine-grained ordinal relationships among samples within the same interval and thus failing to capture the continuous ranking characteristics of survival outcomes. To address these issues, we propose H2-SurvNet, a hyperbolic hierarchical multimodal learning framework for survival prediction. H2-SurvNet first employs a Hyperbolic Hierarchical Information Modeling (H2IM) module that maps multimodal features into a shared hyperbolic space and explicitly encodes intra-modal and inter-modal hierarchies across patches, WSIs, patients, genes, and pathways. On top of this representation, we design a Temporal Ordinal Contrastive Learning (TOCL) module that models the temporal progression of survival outcomes by enforcing ordinal risk ordering through contrastive objectives, thereby promoting continuity in the learned risk scores. Extensive experiments on heterogeneous cohorts from TCGA, CPTAC, and NLST demonstrate that H2-SurvNet consistently outperforms state-of-the-art multimodal survival prediction methods and exhibits strong robustness and generalization across diverse data distributions. Source code will be released upon acceptance.
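To make the hyperbolic-projection idea behind a module like H2IM concrete, the sketch below shows the standard exponential map at the origin of a Poincaré ball with curvature c, which lifts Euclidean features into hyperbolic space. This is a minimal illustration of the general technique, not the paper's implementation; the function names and the choice of the Poincaré ball model are assumptions.

```python
import numpy as np

def expmap0(v: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Exponential map at the origin of a Poincaré ball with curvature c.

    Projects a Euclidean (tangent) vector v into the open ball of
    radius 1/sqrt(c); points near the boundary behave like leaves of
    a hierarchy, points near the origin like roots.
    """
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-9)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def dist0(x: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Hyperbolic distance from the origin to a point x in the ball."""
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(x, axis=-1)
    return (2.0 / sqrt_c) * np.arctanh(np.clip(sqrt_c * norm, 0, 1 - 1e-9))

# Example: a Euclidean feature of norm 0.5 maps inside the unit ball,
# and its hyperbolic distance from the origin equals 2 * ||v||.
v = np.array([0.3, 0.4])
x = expmap0(v)
```

A useful sanity check of the map is the identity d(0, exp_0(v)) = 2||v|| for c = 1, which follows directly from composing tanh with arctanh.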