HySeg: Learning Generative Priors for Structure-Aware Remote Sensing Segmentation
Abstract
High-resolution remote sensing imagery exhibits complex spatial regularities in which topology, continuity, and region adjacency govern semantic organization. However, existing remote sensing image semantic segmentation (RSISS) networks, being predominantly discriminative, estimate strong posteriors from data while lacking generative priors that encode such structural dependencies. This imbalance leads to fragmented boundaries, texture overfitting, and poor cross-domain generalization. We address this challenge by reformulating RSISS as posterior inference grounded in generative structural priors, introducing {\bf HySeg}, a hybrid generative–discriminative segmentation paradigm that learns structure-consistent priors through generative modeling and uses them to guide posterior inference. At its core, the MeanStruct module, a MeanFlow-based generative prior learner, models semantic topology as a continuous stochastic field, while the Prior-to-Affinity Projection (P2A) dynamically transforms this field into topology-aware, class-conditional affinities that guide posterior inference in the Dynamic Affinity-driven Segmentation (DAS) head. Our approach is model-agnostic and integrates seamlessly with diverse backbones, consistently improving structural coherence and generalization. Across four challenging RSISS benchmarks, HySeg achieves state-of-the-art performance, advancing remote sensing segmentation from appearance-based perception to structural reasoning. All code and models will be released upon publication.