Unlocking Token Rewards via Training-Free Reward Attribution
WU Sitong ⋅ Haoru Tan ⋅ Bin Xia ⋅ Xichen Zhang ⋅ Jingyao Li ⋅ Shaofeng Zhang ⋅ Xiaojuan Qi ⋅ Bei Yu ⋅ Jiaya Jia
Abstract
In this paper, we propose an extremely efficient, training-free method to extract token-level reward signals directly from an existing deep reward model. Our core idea is to attribute the overall process reward to individual tokens by estimating each token's influence. This influence is defined as the change in the final macroscopic reward (e.g., the process reward) when a token is replaced with a semantically null token. Naively calculating this influence is computationally infeasible, requiring $N$ forward passes through the process reward model (PRM) for an $N$-token sequence. We overcome this bottleneck by proposing a highly efficient gradient-based estimator. Specifically, we use a first-order Taylor approximation, which reduces the influence calculation to the inner product between (i) the difference between the token embedding and the null-token embedding and (ii) the gradient of the reward with respect to the token embedding. This requires only a single forward and backward pass. The resulting token-level rewards enable standard RL algorithms to perform precise credit assignment without requiring additional reward model training. Experiments on challenging reasoning benchmarks demonstrate that our method substantially improves policy optimization efficiency and enhances the generalization of LLM reasoning capabilities. Our P2T outperforms the outcome reward by +4.9\% on MathVista for Qwen2.5-VL-7B-Instruct and by +11.5\% on AIME24 for Qwen2.5-Math-7B, while converging around 4$\times$ faster. Our results underscore the importance of fine-grained reward shaping and provide a simple, plug-and-play solution to unlock token-level supervision from existing PRMs.
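The estimator described in the abstract can be sketched on a toy problem. Below is a minimal, hedged illustration (not the authors' implementation): we use a linear stand-in for the reward model, for which the first-order Taylor estimate is exact, and compare the single-gradient estimate against the $N$ explicit token-replacement passes. All names (`reward`, `e_null`, `w`) are illustrative assumptions.

```python
import numpy as np

# Toy stand-in for a PRM: R(E) = sum_t w . e_t (linear in embeddings),
# so the first-order Taylor attribution is exact here. For a real deep
# PRM the estimate is approximate but needs only one backward pass.
rng = np.random.default_rng(0)
d, n = 8, 5                       # embedding dim, sequence length
E = rng.normal(size=(n, d))       # token embeddings for the sequence
e_null = np.zeros(d)              # "semantically null" token embedding
w = rng.normal(size=d)            # toy reward weights

def reward(E):
    return float(np.sum(E @ w))

# Exact influence: replace each token with the null token and re-score
# (this is the N-forward-pass baseline the paper avoids).
exact = np.array([
    reward(E) - reward(np.vstack([E[:t], e_null[None], E[t + 1:]]))
    for t in range(n)
])

# Taylor estimate: (e_t - e_null) . dR/de_t, one gradient for all tokens.
# For the linear toy reward, dR/de_t = w for every token t.
grad = np.tile(w, (n, 1))
estimate = np.sum((E - e_null) * grad, axis=1)

print(np.allclose(exact, estimate))  # exact match for the linear toy model
```

For a real PRM one would obtain `grad` via a single backward pass through the model (e.g., autograd on the embedding layer) rather than analytically; the per-token cost is then one inner product instead of one forward pass.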