CMR-RD: Long-Tailed Adaptive VLM for Explainable CMR Diagnosis
Abstract
Cardiac magnetic resonance (CMR) is the clinical gold standard for assessing cardiovascular diseases, but its interpretation depends on expert experience and remains challenging, particularly for rare diseases. Existing automated methods lack interpretable reasoning processes, which limits clinical adoption. Although vision-language models (VLMs) offer basic visual understanding and text generation, they still lack verifiable reasoning chains for medical diagnosis and underperform on minority classes in long-tailed distributions. To address these challenges, we propose CMR-RD, to our knowledge the first VLM for interpretable CMR diagnosis, capable of generating explicit diagnostic chains aligned with imaging evidence. We construct a CMR dataset that reflects real-world clinical distributions, comprising five disease categories (including two rare conditions) plus normal controls. Building on this dataset, we align a general-purpose VLM to medical and CMR semantics using large-scale medical vision–text data, and apply cold-start training to strengthen its understanding of medical concepts and basic reasoning. To further improve reasoning and performance on rare samples, we propose Group Phase Policy Optimization (GPPO), which combines online multi-stage reinforcement learning (RL) with adaptive sampling. GPPO enables the model to proactively explore rare and underperforming classes, thereby mitigating long-tail bias. Experiments demonstrate that CMR-RD achieves state-of-the-art accuracy and reasoning-chain correctness compared with medical and general-purpose VLM baselines, shows stronger recognition of rare categories, and exhibits higher data efficiency. These results provide an interpretable pathway for automated CMR diagnosis.
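The abstract does not specify how adaptive sampling in GPPO is carried out. As a purely illustrative sketch of the general idea (sampling underperforming classes more often during online training), one possible scheme weights each class by its current error rate. The function names, the `floor` and `temperature` parameters, and the class labels below are all hypothetical and are not taken from the paper.

```python
import random

def adaptive_class_weights(class_accuracy, temperature=1.0, floor=0.05):
    """Turn per-class running accuracy into sampling probabilities.

    Hypothetical scheme: weight is proportional to the error rate
    (1 - accuracy), clipped below by `floor` so that well-learned
    classes are still sampled occasionally.
    """
    raw = {
        c: max(1.0 - acc, floor) ** (1.0 / temperature)
        for c, acc in class_accuracy.items()
    }
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}

def sample_batch(examples_by_class, class_accuracy, batch_size, rng=None):
    """Draw a batch of (class, example) pairs, favoring weak classes."""
    rng = rng or random.Random()
    weights = adaptive_class_weights(class_accuracy)
    classes = list(weights)
    probs = [weights[c] for c in classes]
    picked = rng.choices(classes, weights=probs, k=batch_size)
    return [(c, rng.choice(examples_by_class[c])) for c in picked]

# Placeholder labels and accuracies, for illustration only.
acc = {"normal": 0.95, "common_a": 0.80, "rare_a": 0.40}
weights = adaptive_class_weights(acc)
```

Under this assumed scheme, a class with lower running accuracy (here the placeholder `rare_a`) receives a larger share of each training batch, and the shares shift as per-class accuracy is re-estimated between training phases.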