Temporal Inversion for Learning Interval Change in Chest X-Rays
Abstract
Recent advances in vision-language pretraining have enabled strong medical foundation models, yet most analyze radiographs in isolation, overlooking the key clinical task of comparing prior and current images to assess interval change. For chest radiographs (CXRs), capturing interval change is essential, because radiologists must evaluate not only the static appearance of findings but also how they evolve over time. We introduce TILA (Temporal Inversion-aware Learning and Alignment), a simple yet effective framework that uses temporal inversion, that is, reversing the order of image pairs, as a supervisory signal for temporal reasoning. TILA integrates inversion-aware objectives across pretraining, fine-tuning, and inference, complementing conventional appearance modeling with explicit learning of directional change. We also propose a unified evaluation protocol that assesses order sensitivity and consistency under temporal inversion, and we introduce MS-CXR-T_retrieval, a benchmark for progression-aware retrieval. Experiments on public datasets and real-world hospital cohorts demonstrate that TILA consistently improves progression classification and temporal embedding alignment across multiple architectures. Overall, temporal inversion offers a simple and general principle for building order-aware medical vision-language models and supports temporally robust reasoning.
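To make the idea of temporal inversion concrete, the following is a minimal Python sketch, not the TILA implementation. It assumes a three-way progression label set (improving, stable, worsening), as used in MS-CXR-T, and the names `INVERT`, `invert_pair`, and `inversion_consistency` are hypothetical. It shows how swapping a prior/current pair flips the directional label while leaving "stable" unchanged, and one simple way to score whether a model's predictions stay consistent when the pair order is reversed.

```python
from typing import Any, Dict, Sequence, Tuple

# Assumed three-way progression labels (as in MS-CXR-T); hypothetical mapping.
# Reversing the image order flips the direction of change; "stable" is invariant.
INVERT: Dict[str, str] = {
    "improving": "worsening",
    "stable": "stable",
    "worsening": "improving",
}


def invert_pair(prior: Any, current: Any, label: str) -> Tuple[Any, Any, str]:
    """Hypothetical helper: swap the image order and flip the progression label."""
    return current, prior, INVERT[label]


def inversion_consistency(forward: Sequence[str], reversed_: Sequence[str]) -> float:
    """Fraction of pairs whose prediction on (current, prior) equals the
    inverted prediction on (prior, current). Illustrative metric only."""
    if len(forward) != len(reversed_):
        raise ValueError("prediction lists must have equal length")
    if not forward:
        return 0.0
    agree = sum(INVERT[f] == r for f, r in zip(forward, reversed_))
    return agree / len(forward)
```

For example, `inversion_consistency(["improving", "stable", "worsening"], ["worsening", "stable", "worsening"])` returns 2/3: the third pair is predicted as "worsening" in both orders, which violates the inversion constraint. A model that ignores image order would tend to score poorly on directional labels under such a check.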