SAT-RRG: LLM-Guided Self-Adaptive Training for Radiology Report Generation with Token-Level Push–Pull Optimization
Yunyi Liu ⋅ Yingshu Li ⋅ Tong Chen ⋅ Lingqiao Liu ⋅ Lei Wang ⋅ Luping Zhou
Abstract
Radiology report generators often produce fluent text yet miss crucial details, leading to local semantic conflicts or flipped findings that call for stronger, targeted penalties. **Cross-entropy (CE) merely increases the probability of the ground-truth token $y^*$ without directly suppressing the model's current wrong choice $\hat{y}$**, and it treats all positions uniformly, so corrections are not prioritized. We introduce a **self-adaptive optimization framework** that dynamically adjusts token-level gradients based on semantic discrepancy cues derived from a frozen LLM referee. The LLM itself is not the contribution; it merely provides weak supervision to trigger the adaptive learning process. Within this framework, (i) semantic conflicts between the predicted and reference reports are **automatically localized** and tagged with explicit error-span markers (used only during training), and (ii) **adaptive, stronger penalties** are applied within these sparse but critical spans. Updates follow a *push–pull* scheme: error spans are pushed down, while non-error tokens are reinforced. The update strength is governed by two complementary signals: *normalized entropy* (for uncertainty calibration) and *focal-style confidence* (for handling over- and under-confident predictions). On MIMIC-CXR and IU-Xray, our framework consistently improves both language metrics (BLEU-4, ROUGE-L, CIDEr) and clinical metrics (RadGraph F1, CheXbert), and remains robust to noisy or imperfect error tags.
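The abstract does not give the loss in closed form, but the described mechanism can be sketched as follows. The snippet below is a minimal PyTorch illustration, not the paper's implementation: the function name `push_pull_loss`, the hyperparameters `alpha` and `gamma`, and the choice of an unlikelihood-style term for the push direction are our assumptions; only the push–pull split, the LLM-derived error-span mask, and the entropy/focal weighting follow the abstract.

```python
import math
import torch
import torch.nn.functional as F

def push_pull_loss(logits, targets, error_mask, alpha=1.0, gamma=2.0):
    """Token-level push-pull objective (illustrative sketch, not the paper's code).

    logits:     (B, T, V) decoder scores per token
    targets:    (B, T)    ground-truth token ids y*
    error_mask: (B, T)    1.0 inside LLM-tagged error spans, else 0.0
    """
    B, T, V = logits.shape
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()

    # Pull: standard CE raises p(y*) at every position.
    ce = F.nll_loss(
        log_probs.view(-1, V), targets.view(-1), reduction="none"
    ).view(B, T)

    # Push: inside error spans, suppress the model's current wrong
    # choice y_hat; here realized as an unlikelihood-style penalty
    # -log(1 - p(y_hat)) (an assumption; the abstract does not fix the form).
    p_hat, y_hat = probs.max(dim=-1)
    wrong = (y_hat != targets).float()
    push = -torch.log(1.0 - p_hat + 1e-6)

    # Update strength from the two complementary signals:
    # (1) normalized entropy in [0, 1], for uncertainty calibration;
    # (2) focal-style confidence (1 - p(y*))^gamma, which emphasizes
    #     over-confident mistakes and under-confident predictions.
    entropy = -(probs * log_probs).sum(dim=-1) / math.log(V)
    p_star = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    focal = (1.0 - p_star) ** gamma

    weight = entropy * focal
    return (ce + alpha * error_mask * wrong * weight * push).mean()
```

One design note on this sketch: the unlikelihood form keeps the push term bounded below, so suppressing $\hat{y}$ cannot drive the objective to negative infinity the way a raw $+\log p(\hat{y})$ penalty would.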