GUI-SAGE: Enhancing GUI Automation with Self-Explanatory Learning
Abstract
Reinforcement learning with verifiable rewards (RLVR) has shown promise for GUI automation, enabling agents to learn from binary task-completion signals. However, when task difficulty exceeds model capacity, on-policy exploration fails to discover correct actions, creating zero-advantage traps that eliminate the learning signal. Incorporating off-policy expert demonstrations is an intuitive remedy, but the resulting distribution mismatch induces persistent high-entropy states that disrupt learning. We propose GUI-SAGE, a self-explanation framework for GUI automation: by conditioning the model on ground-truth actions, it generates in-distribution reasoning trajectories that provide guidance without the confusion introduced by out-of-distribution expert demonstrations. We further introduce Entropy-Modulated Credit Assignment, which recalibrates per-sample learning weights from prediction confidence and reward jointly, amplifying updates for confident correct actions and attenuating updates for uncertain explorations. Extensive experiments on AndroidControl and GUI-Odyssey show that GUI-SAGE-3B reaches an 81.1\% success rate, substantially outperforming existing methods. Our analysis validates that self-explanations maintain stable learning dynamics where expert demonstrations destabilize entropy, and that entropy modulation yields its largest improvements on in-distribution samples.
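To make the entropy-modulated credit assignment concrete, one plausible instantiation (a minimal sketch in our own notation; the exponent $\alpha$, the normalization by $\log|\mathcal{A}|$, and the exact functional form are assumptions for illustration, not the paper's stated formulation) rescales the advantage $A_t$ by a confidence factor derived from the normalized policy entropy:

\[
\tilde{A}_t = \big(1 - \hat{H}_t\big)^{\alpha}\, A_t,
\qquad
\hat{H}_t = \frac{H\big(\pi_\theta(\cdot \mid s_t)\big)}{\log |\mathcal{A}|} \in [0, 1],
\]

so that confident predictions ($\hat{H}_t \to 0$) retain the full reward-driven update while uncertain explorations ($\hat{H}_t \to 1$) are attenuated, matching the amplify/attenuate behavior the abstract describes.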