MU-GeNeRF: Multi-view Uncertainty-guided Generalizable Neural Radiance Fields for Distractor-aware Scene Reconstruction
Abstract
Generalizable Neural Radiance Fields (GeNeRF) enable high-quality scene reconstruction from a limited number of views and can generalize to unseen scenes. However, in real-world environments, transient distractors disrupt structural consistency across views, corrupting the supervision signal and degrading reconstruction quality. Existing distractor-free NeRF methods rely on per-scene optimization and estimate uncertainty from per-view reconstruction errors to remove distractors, but this strategy is unreliable for GeNeRF because it may misjudge static structures that appear inconsistent across source views as distractors. To address this issue, we propose MU-GeNeRF, a multi-view uncertainty-guided distractor-aware GeNeRF method that alleviates the challenge of robust GeNeRF modeling in dynamic scenes with transient distractors. We explicitly decompose distractor awareness into two complementary uncertainty modeling tasks: source-view uncertainty, serving as a transferable prior during the feed-forward process, captures structural inconsistencies across source views caused by viewpoint changes or dynamic factors; target-view uncertainty focuses on observation anomalies caused by transient changes to infer the spatial distribution of distractors. The two uncertainties are integrated into a heteroscedastic reconstruction loss that adaptively weights supervision, strengthening the model's ability to detect and suppress distractors and enabling more robust geometric modeling. To our knowledge, this is the first attempt to explore GeNeRF modeling in scenes with transient distractors. Extensive experiments demonstrate that our method not only outperforms existing GeNeRF approaches but also rivals scene-specific distractor-free NeRFs.
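The abstract refers to a heteroscedastic reconstruction loss that turns the two uncertainties into adaptive supervision weights. As a minimal sketch only, the standard heteroscedastic negative log-likelihood form of such a loss is shown below; the per-ray variance symbols $\sigma_s^2$ and $\sigma_t^2$ and their additive combination are illustrative assumptions, not the paper's exact formulation:

\[
\mathcal{L}_{\mathrm{rec}} \;=\; \sum_{\mathbf{r}\in\mathcal{R}} \frac{\bigl\lVert \hat{C}(\mathbf{r}) - C(\mathbf{r}) \bigr\rVert_2^2}{2\,\sigma^2(\mathbf{r})} \;+\; \frac{1}{2}\log \sigma^2(\mathbf{r}),
\qquad
\sigma^2(\mathbf{r}) \;=\; \sigma_s^2(\mathbf{r}) + \sigma_t^2(\mathbf{r}),
\]

where $\hat{C}(\mathbf{r})$ and $C(\mathbf{r})$ are the rendered and observed colors for ray $\mathbf{r}$. Under this form, rays with high combined uncertainty contribute less to the photometric term while the $\log \sigma^2$ regularizer discourages trivially inflating the variance, which is one way the adaptive down-weighting of distractor-affected supervision described above can be realized.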