Dual-Level Confidence-Based Implicit Self-Refinement for Medical Visual Question Answering
Abstract
Medical Visual Question Answering (Med-VQA) models often face train-test distribution shifts that hinder generalization to unseen imaging and linguistic patterns. To address this challenge, we propose a dual-level confidence-based framework (DuCoR) that achieves implicit self-refinement through iterative pseudo-supervised optimization. Instead of relying on fixed pseudo-answers, the model progressively refines its predictions by estimating their reliability from two complementary perspectives. A loss-level confidence captures the reliability of supervision by modeling clean and noisy loss distributions, while a feature-level confidence measures the semantic coherence between sample representations and their pseudo-answer-conditioned prototypes. Because these two confidences originate from distinct information sources, the supervision signal and the input semantics, they provide mutually corrective cues. They are adaptively fused into per-sample reliability weights that guide pseudo-supervised optimization toward better alignment with the target distribution. Extensive experiments on multiple Med-VQA benchmarks show that our method achieves superior performance and improved cross-domain generalization over fully supervised baselines.
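The dual-confidence weighting can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the loss-level confidence is the posterior of a "clean" component in a simple two-Gaussian model of per-sample losses (a common stand-in for clean/noisy loss-distribution modeling), the feature-level confidence is cosine similarity to the pseudo-answer's prototype rescaled to [0, 1], and the fusion is a plain convex combination. All function names and the `alpha` parameter are hypothetical.

```python
import numpy as np

def loss_confidence(losses):
    # Split losses at the median into a low-loss ("clean") and a
    # high-loss ("noisy") group, fit a 1-D Gaussian to each, and
    # return each sample's posterior of belonging to the clean one.
    # (A simplified stand-in for the paper's loss-distribution model.)
    thr = np.median(losses)
    clean, noisy = losses[losses <= thr], losses[losses > thr]
    def pdf(x, mu, sd):
        sd = max(sd, 1e-6)  # guard against zero variance
        return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    p_clean = pdf(losses, clean.mean(), clean.std())
    p_noisy = pdf(losses, noisy.mean(), noisy.std())
    return p_clean / (p_clean + p_noisy + 1e-12)

def feature_confidence(features, prototypes, pseudo_labels):
    # Cosine similarity between each sample's feature and the prototype
    # of its pseudo-answer, mapped from [-1, 1] to [0, 1].
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cos = np.sum(f * p[pseudo_labels], axis=1)
    return (cos + 1.0) / 2.0

def fused_weights(w_loss, w_feat, alpha=0.5):
    # Convex combination of the two confidences; the paper's adaptive
    # fusion may learn this trade-off rather than fix it.
    return alpha * w_loss + (1.0 - alpha) * w_feat
```

In a pseudo-supervised training loop, the fused weights would multiply each sample's loss, so samples whose supervision and semantics disagree with the model's confidence estimates contribute less to the update.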