RMAE-ProGRess: Advancing Semantic Segmentation in Unstructured Environments
Abstract
Semantic segmentation in unstructured environments presents unique challenges due to irregular terrain, occlusions, and complex spatial layouts. While structured settings (e.g., urban scenes) have been widely studied, segmentation in unstructured settings remains relatively underexplored, both in standardized benchmarking and in architectural design. In this work, we propose an encoder-decoder semantic segmentation architecture that integrates a Reduced Masked Autoencoder (RMAE) as the encoder, a Feature-to-Pyramid (F2P) neck, and a novel decoder called ProGRess. The ProGRess decoder introduces Progressive Leapwise Fusion (PLF) for top-down multi-scale fusion of non-contiguous feature maps, a Lightweight Channel Attention gate with Residuals (LCAR) module, and a Bottleneck Feature Fusion (BFF) block for compact refinement. We establish comprehensive baselines by benchmarking state-of-the-art CNN- and transformer-based models on challenging unstructured-environment datasets, viz. RELLIS-3D, its coarse-grained variant RELLIS-3DC, and RUGD. Our architecture achieves state-of-the-art performance, with 57.41% mIoU on RELLIS-3D, 78.95% mIoU on RELLIS-3DC, and 45.63% mIoU on RUGD, while maintaining a competitive parameter count and VRAM usage.
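All results above are reported as mean Intersection-over-Union (mIoU). As a rough illustration of how this metric is commonly computed, the sketch below accumulates a confusion matrix over labelled pixels and averages per-class IoU = TP / (TP + FP + FN). It is a minimal pure-Python sketch, not the evaluation code used for the reported numbers. The helper names, the `ignore_index=255` convention for unlabelled pixels, and the choice to skip classes absent from both prediction and ground truth are all assumptions; benchmarks differ on these details.

```python
from typing import List, Sequence


def confusion_matrix(preds: Sequence[int], targets: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Rows index the ground-truth class, columns the predicted class.

    Hypothetical helper: pixels whose target equals ignore_index
    (an assumed convention for 'void' labels) are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, targets):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU = TP / (TP + FP + FN) over classes that occur.

    Assumption: a class with TP + FP + FN == 0 (absent from both
    prediction and ground truth) is excluded from the mean.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # GT class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp     # predicted c, GT otherwise
        denom = tp + fp + fn
        if denom == 0:
            continue
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy flattened label maps with 3 classes; the last pixel is 'void'.
    targets = [0, 0, 1, 1, 2, 2, 255]
    preds = [0, 1, 1, 1, 2, 0, 1]
    cm = confusion_matrix(preds, targets, num_classes=3)
    print(round(mean_iou(cm), 4))  # per-class IoU 1/3, 2/3, 1/2 -> 0.5
```

In practice the confusion matrix is usually accumulated over the entire evaluation split before computing IoU, rather than averaging per-image scores, so that small or rare terrain classes are weighted consistently.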