Predicting Spatial Transcriptomics from Histology Images via High-Order Multi-Cell Interaction Modeling
Abstract
Spatial transcriptomics (ST) links gene expression to tissue architecture, and paired ST data make it possible to predict spatial expression from H&E-stained whole-slide images (WSIs). However, existing spot- or slide-level predictors focus on single-spot features or pairwise relations and fail to capture high-order, many-to-many cross-cell interactions. As a result, they miss synergistic and antagonistic effects among multiple neighboring cells. Here, we introduce MCToGene, a scalable and accurate framework that predicts spatial gene expression by explicitly modeling multi-cell interactions via many-body attention with hierarchical coupling. Its many-body attention module encodes high-order, many-to-many cross-cell dependencies, enabling context-aware modeling of the microenvironment. To mitigate the combinatorial burden of many-body modeling, we design a hierarchical interaction module that couples pairwise and many-body representations for feature aggregation and prediction, preserving many-body expressiveness while controlling computation and memory. On HEST-1k and STImage-1K4M, MCToGene surpasses state-of-the-art baselines with a 7.85% relative improvement. Ablations confirm that explicit high-order, many-to-many modeling drives these gains, and visualizations demonstrate that multi-cell interactions are essential for biologically coherent spatial predictions.
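To make the distinction between pairwise and many-body attention concrete, the following is a minimal illustrative sketch, not MCToGene's actual formulation, which the abstract does not specify. It assumes a third-order (three-body) variant in which a query spot attends to unordered pairs of neighbors through a trilinear score, and it couples the result with ordinary pairwise attention through a fixed mixing weight. The function names, the trilinear score, the element-wise product used as the pair value, and the parameter `alpha` are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_attention(query, neighbors):
    """Dot-product attention: the query attends to one neighbor at a time."""
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, n)) / math.sqrt(d)
              for n in neighbors]
    weights = softmax(scores)
    return [sum(w * n[i] for w, n in zip(weights, neighbors))
            for i in range(d)]


def three_body_attention(query, neighbors):
    """Hypothetical third-order attention over unordered neighbor pairs (j, k).

    Assumption: the score is a trilinear form sum_i q[i] * x_j[i] * x_k[i],
    and the value of a pair is the element-wise product of its two members.
    Requires at least two neighbors.
    """
    d = len(query)
    pairs = list(combinations(range(len(neighbors)), 2))
    scores = [
        sum(query[i] * neighbors[j][i] * neighbors[k][i] for i in range(d))
        / math.sqrt(d)
        for j, k in pairs
    ]
    weights = softmax(scores)
    return [
        sum(w * neighbors[j][i] * neighbors[k][i]
            for w, (j, k) in zip(weights, pairs))
        for i in range(d)
    ]


def coupled_representation(query, neighbors, alpha=0.5):
    """Hypothetical coupling: a fixed convex mix of the two representations."""
    pair = pairwise_attention(query, neighbors)
    triple = three_body_attention(query, neighbors)
    return [alpha * p + (1 - alpha) * t for p, t in zip(pair, triple)]


# Toy example: one query spot with three neighboring spots (2-D features).
query = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(coupled_representation(query, neighbors))
```

With n neighbors, the three-body term scores n(n-1)/2 pairs, whereas the pairwise term scores only n, and higher orders grow faster still. This growth is the combinatorial burden that motivates coupling a cheap pairwise representation with a many-body one rather than relying on the latter alone.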