Fuel Gauge: Estimating Chain-of-Thought Length Ahead of Time in Large Multimodal Models
Yuedong Yang ⋅ Xiwen Wei ⋅ Mustafa Munir ⋅ Radu Marculescu
Abstract
Reasoning Large Multimodal Models (LMMs) have become the de facto choice for many applications. However, these models rely on a Chain-of-Thought (CoT) process that is lengthy and unpredictable at runtime, often resulting in inefficient use of computational resources (due to memory fragmentation) and sub-optimal accuracy (due to under- and over-thinking). We observe empirically that the CoT process follows a Bernoulli process whose behavior is independent of the specific samples generated. This suggests that the CoT length can be estimated ahead of time from a hidden parameter representing the amount of "fuel" available to support the reasoning process. Based on this insight, we propose **Fuel Gauge, the first method that extracts this hidden signal and predicts CoT length ahead of time**. We demonstrate the utility of Fuel Gauge on two downstream tasks: predictive KV cache allocation, which addresses memory fragmentation in LMM serving systems, and CoT length modulation, which mitigates under-thinking and over-thinking. Extensive experiments on LMMs across text-only, image-text, and video-text question answering benchmarks demonstrate the effectiveness, generalizability, and practical value of Fuel Gauge. For example, on the GPQA-Diamond benchmark, Fuel Gauge achieves less than half the CoT length prediction error of the baseline, which translates into a 13.37$\times$ reduction in memory allocation frequency.
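To make the Bernoulli-process intuition concrete, here is a minimal sketch, not the authors' code: it assumes the abstract's observation means each reasoning step continues independently with some probability $p$, so the CoT length is geometric with mean $1/(1-p)$, and a "fuel" estimate of $p$ would yield an ahead-of-time length prediction. The function names and the constant-$p$ assumption are illustrative only.

```python
import numpy as np

def expected_cot_length(p_continue: float) -> float:
    """Expected number of reasoning tokens if each step continues with
    probability p_continue and stops with probability 1 - p_continue
    (a Bernoulli stopping model; the CoT length is then geometric)."""
    return 1.0 / (1.0 - p_continue)

def simulate_cot_lengths(p_continue: float, n: int, seed: int = 0) -> np.ndarray:
    """Sample n CoT lengths by drawing geometric variates, i.e. the number
    of Bernoulli trials until the first 'stop' outcome."""
    rng = np.random.default_rng(seed)
    return rng.geometric(1.0 - p_continue, size=n)

# Hypothetical example: a "fuel" readout suggesting p ~= 0.999 would imply
# roughly 1000 expected reasoning tokens, a figure a serving system could
# use to pre-allocate KV cache instead of growing it incrementally.
p_hat = 0.999
print(expected_cot_length(p_hat))      # 1000.0
print(simulate_cot_lengths(p_hat, 5))  # five sampled CoT lengths
```

Under this toy model, an ahead-of-time length estimate lets a serving system reserve KV cache once per request rather than reallocating as the CoT grows, which is the memory-fragmentation use case the abstract describes.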