Progressive Guessing to Fixed Point: Rethinking Human Motion Prediction with Deep Equilibrium Models
Abstract
Many recent human motion prediction methods adopt a multi-stage refinement framework, where each stage produces an initial guess of future poses for the next stage. These guesses are progressively refined towards the target prediction through a sequence of spatial-temporal reasoning stages. However, such a cascaded design incurs large computation and memory overheads that grow at least linearly with network depth, and it lacks an explicit stopping criterion. In this paper, we propose MotionDEQ, a deep equilibrium motion predictor that reformulates the progressive guessing paradigm as a fixed-point problem within an implicit layer. This formulation is conceptually equivalent to performing infinitely many refinement steps, but it requires only O(1) training memory and can be solved efficiently with any black-box solver. We carefully design this implicit refinement process by integrating Euclidean geometric transformations into equilibrium learning, allowing the entire network to be equivariant. We also find that DEQs naturally fit the real-world scenario where motion data arrives as a stream: the converged fixed point can be reused as a warm initial guess, recycling otherwise redundant inference computation when making subsequent predictions. Our experiments demonstrate that MotionDEQ achieves state-of-the-art prediction performance with superior memory efficiency, using fewer than 300K parameters with a prediction error of 55.3 mm at 400 ms on the Human3.6M dataset.
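To make the fixed-point view and the warm-start idea concrete, the following is a minimal sketch, not the MotionDEQ implementation. It assumes a toy, hand-written contractive refinement step (`refine`, hypothetical) in place of the learned spatial-temporal layer, and a plain fixed-point iteration (`solve_fixed_point`, hypothetical) in place of a black-box solver. The inputs `x_t` and `x_next` stand in for two consecutive observations in a stream and are made-up values.

```python
import math

def refine(z, x):
    # Toy refinement step (an assumption, not the paper's layer).
    # It is contractive in z because |d/dz 0.5*tanh(z)| <= 0.5.
    return [0.5 * math.tanh(zi) + xi for zi, xi in zip(z, x)]

def solve_fixed_point(x, z0, tol=1e-8, max_iter=100):
    # Iterate z <- refine(z, x) until the largest update falls below tol.
    # The tolerance acts as an explicit stopping criterion.
    z = list(z0)
    for step in range(1, max_iter + 1):
        z_next = refine(z, x)
        residual = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if residual < tol:
            return z, step
    return z, max_iter

# Two consecutive (made-up) observations from a stream.
x_t = [0.3, -0.7, 1.2]
x_next = [0.31, -0.69, 1.21]

# Solve for the first observation from a cold (all-zero) guess.
z_t, _ = solve_fixed_point(x_t, [0.0] * 3)

# Solve for the next observation twice: cold start vs. warm start from z_t.
z_cold, cold_steps = solve_fixed_point(x_next, [0.0] * 3)
z_warm, warm_steps = solve_fixed_point(x_next, z_t)

print("cold-start iterations:", cold_steps)
print("warm-start iterations:", warm_steps)
```

In this toy setting both runs reach the same equilibrium, and the warm-started run needs fewer iterations because the previous fixed point is already close to the new one. How large the saving is for a real motion predictor depends on the learned layer and the solver used.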