Factorized Context Aggregation for Robust Cancer Risk Estimation via Soft Re-Ranked Retrieval and Hierarchical Anchors
Abstract
Accurate cancer risk assessment is critical for personalized treatment planning. Multimodal models that integrate histopathology with complementary data modalities (e.g., genomics or clinical reports) exhibit superior prognostic capability. However, they typically assume full data availability, which is unrealistic in real-world clinical settings. In contrast, histopathology slides are routinely collected, universally accessible, and information-rich, making them a practical anchor for robust survival prediction. In this study, we propose a novel framework that uses histopathology as the basis for outcome prediction while leveraging other data modalities during training. Extensive experiments across eight cancer types and multiple scenarios involving different data modalities show that our model outperforms all baselines. It achieves gains of up to 8\% over methods that use only histopathology during training, and a gap of only 1.4\% relative to models that use all data modalities. Our model also stratifies patients into meaningful risk groups in 67\% of risk stratification scenarios (vs. 50\% for the best state-of-the-art method) and generalizes well under varying degrees of modality missingness. It matches the best state-of-the-art method even with a 40\% higher rate of missing data during training. It also preserves semantic alignment in zero-shot settings. These results highlight the practical utility and robustness of our approach for real-world cancer risk prediction in resource-limited or modality-incomplete settings.