CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning
Abstract
Graphical User Interface (GUI) agents, benefiting from recent advances in multimodal large language models (MLLMs), have made significant progress. However, because GUI applications are updated frequently, adapting to new tasks without forgetting previously learned ones remains an open problem in GUI continual learning. Existing methods are generally trained on a fixed set of tasks and adapt to new ones through either supervised fine-tuning (SFT) or reinforcement learning (RL), and thus suffer from catastrophic forgetting and slow adaptation. In this work, we propose a \textbf{C}ontinual \textbf{G}UI \textbf{L}earning (CGL) framework that dynamically balances adaptation efficiency and skill retention by enhancing the synergy between SFT and RL. Specifically, we introduce an SFT proportion adjustment mechanism guided by policy entropy to dynamically control the weight allocation between the SFT and RL training phases. Additionally, we propose gradient surgery and entropy-regulated tuning strategies that enable GUI agents to evolve continuously while maintaining competence across previously learned domains. Furthermore, we construct the AndroidControl-CL benchmark, which partitions GUI applications into distinct task groups to simulate and evaluate GUI continual learning. Experimental results demonstrate the effectiveness of the proposed CGL framework under the continual learning setting. The benchmark, code, and model will be made publicly available.
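To make the entropy-guided weight allocation concrete, the following is a minimal sketch of one plausible instantiation: the SFT loss weight $\alpha$ is set from the policy's normalized action entropy, so that an uncertain (high-entropy) policy leans on SFT supervision while a confident policy leaves more room for RL. The function name, the linear schedule, and the direction of the mapping are illustrative assumptions, not the paper's exact formulation.

\begin{verbatim}
import torch

def sft_weight_from_entropy(policy_logits: torch.Tensor,
                            alpha_min: float = 0.1,
                            alpha_max: float = 0.9) -> float:
    """Map normalized policy entropy to an SFT loss weight alpha.

    Illustrative sketch: high entropy (uncertain policy) -> larger
    SFT weight; low entropy (confident policy) -> larger RL weight.
    The mapping direction and linear schedule are assumptions.
    """
    probs = torch.softmax(policy_logits, dim=-1)
    # Mean per-step policy entropy over the batch.
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
    # Normalize by the maximum entropy of a uniform policy.
    max_entropy = torch.log(torch.tensor(float(policy_logits.size(-1))))
    ratio = (entropy / max_entropy).clamp(0.0, 1.0).item()
    return alpha_min + (alpha_max - alpha_min) * ratio

# Combined objective: L = alpha * L_SFT + (1 - alpha) * L_RL
logits = torch.randn(32, 1000)        # dummy batch of action logits
alpha = sft_weight_from_entropy(logits)
loss_sft = torch.tensor(1.2)          # placeholder SFT loss
loss_rl = torch.tensor(0.8)           # placeholder RL loss
loss = alpha * loss_sft + (1.0 - alpha) * loss_rl
\end{verbatim}

Under this reading, the weight $\alpha$ is recomputed each training step, so the SFT proportion tracks the policy's current uncertainty rather than following a fixed schedule.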