Poster Fri, Jun 5, 2026 • 3:00 PM – 5:00 PM PDT ExHall A & F 580

FM-Steer: Enhance Generalist Policies with Value-Guided Cascaded Denoising

Haoming Song ⋅ Delin Qu ⋅ Yuanqi Yao ⋅ Qizhi Chen ⋅ Jiarui Li ⋅ Qi Lv ⋅ Yiwen Tang ⋅ Li Kang ⋅ Heng Zhou ⋅ Xianqiang Gao ⋅ Yuhang Tang ⋅ Xiaofan Li ⋅ Modi Shi ⋅ Guangrui Ren ⋅ Maoqing Yao ⋅ Bin Zhao ⋅ Dong Wang ⋅ Xuelong Li


Abstract

Humans naturally spend more time deliberating before they act when handling complex tasks in the physical world. This paradigm has recently driven remarkable advances in the ability of Large Language Models (LLMs) to solve complex tasks in digital domains. However, the potential of test-time computing remains largely unexplored for robotic foundation models that interact with the physical world. In this work, we propose FM-Steer, a test-time computing framework that augments flow-based Vision-Language-Action (VLA) generalist policies with value-guided sampling and cascaded action denoising, enabling higher control performance and real-time action rates for dexterous robot manipulation. FM-Steer first incorporates a flow-based intermediate verifier to estimate state-action values for candidate actions. At test time, the policy iteratively samples multiple noisy action proposals and retains the one with the highest predicted value, yielding value-aligned, high-quality actions without retraining. To satisfy the stringent frequency demands of robot control, FM-Steer further introduces cascaded action denoising, which decouples expensive value-guided sampling from fast action refinement. A lightweight flow denoiser asynchronously takes the selected high-value noisy action and rapidly denoises it to produce the final control signal, enabling fluid, high-rate execution. During deployment, the intermediate verifier operates at a low frequency to provide value-guided sampling, while the lightweight flow denoiser continually processes the selected candidates to maintain real-time control. Extensive experiments demonstrate that FM-Steer scales flow-based VLA models effectively at test time and achieves state-of-the-art performance across diverse simulation benchmarks and real-world dexterous robotic tasks.
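The two-rate idea described in the abstract can be illustrated with a toy sketch. This is not the FM-Steer implementation: every name below (`toy_value`, `sample_candidates`, `select_best`, `lite_denoise`, `run_control_loop`) is hypothetical, the "verifier" is a simple negative squared distance to a known target action, the "denoiser" is a fixed contraction toward that target rather than a learned flow model, and the asynchronous cascade is simulated with a single synchronous loop.

```python
import random


def toy_value(target, action):
    """Hypothetical stand-in for the verifier: higher is better."""
    return -sum((a - t) ** 2 for a, t in zip(action, target))


def sample_candidates(target, n, noise_scale, rng):
    """Draw n noisy action proposals around the (toy) target action."""
    return [[t + rng.gauss(0.0, noise_scale) for t in target] for _ in range(n)]


def select_best(target, candidates):
    """Value-guided selection: keep the proposal with the highest value."""
    return max(candidates, key=lambda a: toy_value(target, a))


def lite_denoise(action, target, steps=2, rate=0.5):
    """Hypothetical lightweight refiner: a few cheap contraction steps.

    Each step removes a fixed fraction of the remaining error, so a better
    starting candidate yields a better final action.
    """
    x = list(action)
    for _ in range(steps):
        x = [xi + rate * (ti - xi) for xi, ti in zip(x, target)]
    return x


def run_control_loop(target, ticks=6, verify_every=3, n=8, seed=0):
    """Run the slow selection every `verify_every` ticks, the refiner every tick."""
    rng = random.Random(seed)
    selected = None
    verifier_calls = 0
    actions = []
    for tick in range(ticks):
        if tick % verify_every == 0:
            # Low-frequency path: sample proposals and pick the best one.
            candidates = sample_candidates(target, n, 1.0, rng)
            selected = select_best(target, candidates)
            verifier_calls += 1
        # High-frequency path: refine the most recently selected proposal.
        actions.append(lite_denoise(selected, target))
    return actions, verifier_calls


if __name__ == "__main__":
    actions, calls = run_control_loop([0.0, 1.0])
    print(f"{len(actions)} control actions from {calls} verifier calls")
```

In this sketch the expensive selection step runs once every three control ticks while the cheap refinement step runs on every tick, which mirrors the decoupling the abstract describes. In a real system the observation would change between ticks and the two stages would run concurrently.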