Bootstrapping Multi-view Learning for Test-time Noisy Correspondence
Abstract
Multi-view learning fuses complementary views to improve perception, but real-world deployments often suffer from Test-time Noisy Correspondence (TNC): cross-view misalignment caused by asynchronous sampling, transient network congestion, or other disturbances. Such misalignment introduces semantic inconsistency and significantly degrades performance. Existing remedies typically estimate view-specific reliability from clean, well-aligned training data and then extrapolate to noisy fusion at inference, which creates a train-test task gap and reduces robustness to TNC. To bridge this gap, we propose \underline{\textbf{\textcolor{red}{B}}}ootstrapping \underline{\textbf{\textcolor{red}{M}}}ulti-view \underline{\textbf{\textcolor{red}{L}}}earning (BML), a plug-and-play framework that explicitly learns to fuse under TNC. Specifically, BML performs in-place TNC bootstrapping to construct a controllable noise-augmented training set that simulates realistic correspondence distortion, thereby eliminating the task gap without requiring external data. Unlike prior uncertainty-based approaches that model reliability in an unsupervised manner, BML adopts a reveal-supervised paradigm: a lightweight estimator jointly models intra-view predictive uncertainty (view quality) and inter-view prediction discrepancy (correspondence consistency) to produce calibrated reliability weights, guided by both task objectives and bootstrapped supervision. At deployment, these reliability weights directly modulate fusion, suppressing corrupted views while preserving informative ones. Across 11 benchmarks spanning diverse noise ratios, BML consistently outperforms state-of-the-art baselines and remains robust to TNC. Code will be released upon acceptance.
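To make the reliability-weighted fusion step concrete, the following is a minimal PyTorch-style sketch, not the paper's implementation: the function name, the entropy-based intra-view uncertainty, the KL-based inter-view discrepancy, and the softmax weighting are all illustrative assumptions, whereas BML's actual estimator is a learned module trained with reveal supervision rather than this hand-crafted heuristic.

\begin{verbatim}
import torch
import torch.nn.functional as F

def reliability_weighted_fusion(view_logits):
    """Fuse per-view class predictions with reliability weights derived
    from (i) intra-view predictive uncertainty (softmax entropy) and
    (ii) inter-view prediction discrepancy (mean KL divergence from
    the other views). Assumes at least two views.

    view_logits: list of [batch, num_classes] tensors, one per view.
    Illustrative sketch only; BML learns these weights instead.
    """
    probs = [F.softmax(z, dim=-1) for z in view_logits]
    num_views = len(probs)

    scores = []
    for i in range(num_views):
        # Intra-view uncertainty: high entropy -> low reliability.
        entropy = -(probs[i] * probs[i].clamp_min(1e-8).log()).sum(-1)
        # Inter-view discrepancy: disagreement with the other views
        # signals a corrupted correspondence for this view.
        discrepancy = torch.stack([
            F.kl_div(probs[j].clamp_min(1e-8).log(), probs[i],
                     reduction="none").sum(-1)
            for j in range(num_views) if j != i
        ]).mean(0)
        scores.append(-(entropy + discrepancy))  # [batch]

    # Per-sample, per-view weights; corrupted views are down-weighted.
    weights = torch.softmax(torch.stack(scores, dim=-1), dim=-1)

    fused = sum(weights[:, i:i + 1] * probs[i] for i in range(num_views))
    return fused, weights
\end{verbatim}

The design choice this sketch illustrates is the abstract's two-signal decomposition: a view is suppressed either because it is intrinsically unreliable (high intra-view uncertainty) or because it has fallen out of correspondence with its peers (high inter-view discrepancy), and the resulting weights gate the fusion directly at test time.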