IPR-1: Interactive Physical Reasoner
Abstract
Humans learn by observing, interacting with environments, and internalizing physics and causality. Here we ask whether an agent can similarly acquire human-like reasoning from interaction and keep improving with more experience. We study this in a Game-to-Unseen (G2U) setting, curating 1{,}000+ heterogeneous games with diverse physical and causal mechanisms, and evaluate at three human-like levels: Survival, Curiosity, and Utility, spanning primitive intuition to goal-driven reasoning. Our analysis reveals complementary failures: VLM/VLA agents reason but lack look-ahead in interactive settings, while world models imagine but imitate visual patterns rather than analyzing physics and causality. We therefore propose \textbf{IPR} (\textbf{Interactive Physical Reasoning}), which uses world-model rollouts to score and reinforce a VLM's policy, and introduce \textbf{PhysCode}, a physics-centric action code that aligns semantic intent with dynamics and provides a shared action space for prediction and reasoning. Pretrained on 1{,}000+ games, IPR performs robustly across all three levels, matches GPT-5 overall, and surpasses it on Curiosity. We find that performance improves with more training games and interaction steps, and that the model transfers zero-shot to unseen games. These results support physics-centric interaction as a path to steadily improving physical reasoning. \textbf{Our code will be publicly available.}
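As a rough illustration of the reinforcement loop mentioned in the abstract (world-model rollouts scoring and reinforcing a VLM's proposed actions), the Python sketch below shows one plausible shape of such a step. All class names, method signatures, and the scoring heuristic are hypothetical stand-ins for illustration, not the paper's actual implementation or interfaces.

\begin{verbatim}
# Hypothetical sketch of one IPR-style step: the VLM policy proposes
# physics-centric action codes, the world model imagines each one,
# and the policy is reinforced toward actions with higher imagined return.
import random
from dataclasses import dataclass


@dataclass
class Rollout:
    """Result of imagining one action with the world model."""
    action: str
    predicted_return: float


class ToyWorldModel:
    """Stand-in world model: maps (state, action) to a predicted return."""
    def rollout(self, state: str, action: str, horizon: int = 5) -> Rollout:
        # A real world model would simulate `horizon` steps of dynamics;
        # here we return a noisy score purely for illustration.
        score = sum(random.gauss(0.0, 1.0) for _ in range(horizon))
        return Rollout(action=action, predicted_return=score)


class ToyVLMPolicy:
    """Stand-in VLM policy: proposes candidate physics-centric action codes."""
    def propose(self, state: str, k: int = 4) -> list[str]:
        return [f"phys_code_{i}" for i in range(k)]

    def reinforce(self, state: str, action: str, advantage: float) -> None:
        # A real implementation would apply a policy-gradient update here.
        print(f"update: state={state!r} action={action} advantage={advantage:+.2f}")


def ipr_step(policy: ToyVLMPolicy, world_model: ToyWorldModel, state: str) -> str:
    """One interaction step: propose, imagine, score, and reinforce."""
    candidates = policy.propose(state)
    rollouts = [world_model.rollout(state, a) for a in candidates]
    baseline = sum(r.predicted_return for r in rollouts) / len(rollouts)
    best = max(rollouts, key=lambda r: r.predicted_return)
    # Reinforce the policy toward actions whose imagined return beats the baseline.
    policy.reinforce(state, best.action, best.predicted_return - baseline)
    return best.action


if __name__ == "__main__":
    random.seed(0)
    chosen = ipr_step(ToyVLMPolicy(), ToyWorldModel(), state="frame_0")
    print("chosen action:", chosen)
\end{verbatim}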