OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models
Abstract
The deployment of autonomous agents in Graphical User Interface (GUI) environments faces significant challenges, notably error accumulation in long-horizon tasks and the severe consequences of irreversible operations. While critic models that provide real-time action assessment offer a promising solution, their effectiveness is hindered by the scarcity of diverse, high-quality GUI feedback data and the absence of public critic benchmarks for computer use. To bridge these gaps, we introduce OS-Oracle, which makes three core contributions: (1) a scalable data pipeline for synthesizing cross-platform GUI critic data; (2) a two-stage training paradigm combining supervised fine-tuning (SFT) with consistency-preserving group relative policy optimization (CP-GRPO); and (3) OS-Critic Bench, a holistic benchmark for evaluating critic models across Mobile, Web, and Desktop platforms. Leveraging this framework, we curate a high-quality dataset of 310k critic samples. The resulting critic model, OS-Oracle-7B, achieves strong performance and further reduces error rates, improving the capability of GUI agents in dynamic environments. All code, data, and checkpoints will be made public.