TM-BSN: Triangular-Masked Blind-Spot Network for Real-World Self-Supervised Image Denoising
Abstract
Blind-spot networks (BSNs) enable self-supervised image denoising by preventing access to the target pixel during training, allowing the network to estimate clean signals without ground-truth supervision. However, this approach assumes pixel-wise noise independence, which is violated in real-world sRGB images due to spatially correlated noise introduced by the camera's image signal processing (ISP) pipeline. While several methods employ downsampling strategies to decorrelate noise, these approaches alter noise statistics and limit the network's ability to utilize full contextual information. In this paper, we propose the Triangular-Masked Blind-Spot Network (TM-BSN), a novel blind-spot architecture that accurately models the spatial correlation of real sRGB noise. This correlation originates from the demosaicing process, where each pixel is reconstructed from neighboring samples with weights that decay spatially, resulting in a characteristic diamond-shaped pattern. To align the receptive field with this geometry, we introduce a triangular-masked convolution that restricts the kernel to its upper-triangular region, creating a diamond-shaped blind spot at the original image resolution. This design effectively excludes correlated pixels while fully leveraging uncorrelated contextual information, eliminating the need for downsampling or post-processing. Furthermore, we use knowledge distillation to transfer complementary knowledge from multiple blind-spot predictions into a lightweight U-Net, improving both accuracy and efficiency. Extensive experiments on real-world denoising benchmarks demonstrate that our method achieves state-of-the-art performance, significantly outperforming existing self-supervised approaches.
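The geometric claim above, that upper-triangular kernel masks can produce a diamond-shaped blind spot, can be checked with a small toy computation. The sketch below is not the TM-BSN implementation; it rests on several assumptions that the abstract does not state: the mask is strictly upper-triangular (diagonal excluded), kernels are 3x3, the same mask is reused in every stacked layer, and four branches use 90-degree rotations of the mask. The helper names (`triangular_mask`, `stack_support`, `diamond`) are hypothetical. Under these assumptions, it tracks which input offsets each branch can see and takes the complement of their union.

```python
import numpy as np


def triangular_mask(k, offset=1):
    """Binary k x k mask keeping the upper-triangular part of a kernel.

    offset=1 drops the main diagonal (an assumption of this sketch).
    """
    return np.triu(np.ones((k, k), dtype=int), k=offset)


def stack_support(mask, n_layers):
    """Receptive-field support of n_layers stacked convolutions that all
    share the same binary kernel mask (a Minkowski sum of the supports)."""
    support = np.ones((1, 1), dtype=int)
    kh, kw = mask.shape
    for _ in range(n_layers):
        h, w = support.shape
        out = np.zeros((h + kh - 1, w + kw - 1), dtype=int)
        # Each active kernel tap contributes a shifted copy of the support.
        for i, j in zip(*np.nonzero(mask)):
            out[i:i + h, j:j + w] |= support
        support = out
    return support


def diamond(radius, half_width):
    """Binary map of offsets with |dy| + |dx| <= radius on a square grid."""
    ax = np.arange(-half_width, half_width + 1)
    dy, dx = np.meshgrid(ax, ax, indexing="ij")
    return (np.abs(dy) + np.abs(dx) <= radius).astype(int)


k, n_layers = 3, 3
mask = triangular_mask(k)

# Union of the receptive fields of four rotated branches (assumed design).
size = (k - 1) * n_layers + 1
seen = np.zeros((size, size), dtype=int)
for r in range(4):
    seen |= stack_support(np.rot90(mask, r), n_layers)

# Offsets that no branch can see form the blind spot around the center.
blind = 1 - seen
print(blind)
```

In this toy setting the unseen region equals `diamond(n_layers - 1, n_layers)`, so the blind spot grows with depth while keeping its diamond shape, and no downsampling is involved. How the actual network sets the blind-spot size and combines branches is not specified in the abstract.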