GeoRK2: Geometry-Guided Runge–Kutta Integration for Diffusion Transformer Acceleration
Abstract
Diffusion transformer models deliver state-of-the-art image synthesis quality but suffer from prohibitively slow iterative sampling. Reducing the number of sampling steps accelerates inference, yet it inevitably distorts intermediate features and degrades visual fidelity. To address this trade-off, we present GeoRK2, a training-free framework that bridges numerical analysis and information geometry. GeoRK2 couples second-order Runge–Kutta (RK2) integration with a curvature-aware geometric flow derived from the model's noise predictions, establishing provably stable feature-evolution dynamics under manifold-aware integration. By leveraging a metric induced by empirical feature covariances (estimated from gradient covariances) to capture the intrinsic feature geometry, and by applying parallel transport along the manifold connection, GeoRK2 constrains error propagation under large-step integration, ensuring both numerical stability and structural fidelity. Fully plug-and-play, GeoRK2 requires no retraining and is compatible with mainstream pretrained diffusion transformers. Comprehensive experiments on image generation and super-resolution across representative diffusion backbones (e.g., DiT-XL, HunyuanVideo, and FLUX.1-dev) show that GeoRK2 achieves 4–5× faster inference than baseline frameworks (FORA, TaylorSeer) with only marginal perceptual differences (∆FID ≈ 0.81), confirming its effectiveness and generality. All implementation details and code are provided in the supplementary material.
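To make the core idea concrete, the following is a minimal, illustrative sketch of a metric-preconditioned RK2 (midpoint) update for a probability-flow ODE dx/dt = v(x, t). All names here (`velocity`, `metric_precondition`, `rk2_step`) and the diagonal covariance whitening are hypothetical stand-ins for exposition, not the paper's implementation; the toy `velocity` field substitutes for the model's noise-prediction-derived drift.

```python
import numpy as np

def velocity(x, t):
    # Toy stand-in for the drift obtained from the model's noise predictions.
    return -x / max(t, 1e-3)

def metric_precondition(v, feats, eps=1e-6):
    # Illustrative covariance-induced metric: whiten the drift coordinate-wise
    # by inverse empirical feature variances (a diagonal approximation).
    var = feats.var(axis=0) + eps
    return v / var

def rk2_step(x, t, dt, feats):
    # Second-order Runge-Kutta (midpoint) step under the preconditioned flow.
    k1 = metric_precondition(velocity(x, t), feats)
    x_mid = x + 0.5 * dt * k1
    k2 = metric_precondition(velocity(x_mid, t + 0.5 * dt), feats)
    return x + dt * k2

rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 8))  # stand-in for intermediate features
x = rng.standard_normal(8)
x_next = rk2_step(x, t=1.0, dt=0.1, feats=feats)
print(x_next.shape)  # (8,)
```

The midpoint evaluation is what gives second-order local accuracy, which is why larger steps remain stable than with a first-order Euler update; the preconditioning step is where a feature-geometry-aware metric would enter.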