

Paper in Workshop: Test-time Scaling for Computer Vision

Scaling Test-Time Compute Can Outperform Larger Architectures in Computer Vision

Erfan Darzi · Dylan Nguyen · George Cheng


Abstract:

Deep neural networks face a fundamental trade-off between computational efficiency and accuracy. This paper introduces a method for network depth optimization that enables flexible inference under adjustable computational budgets while potentially improving training dynamics. Our approach partitions each residual stage into core and gated sub-paths and employs depth-aware training to produce networks that can operate at varying depths. We present a theoretical analysis of our method through three key results: (1) an explicit regularization theorem quantifying how our training approach may penalize discrepancies between network configurations, (2) a statistical convergence theorem suggesting tighter generalization bounds based on effective network depth, and (3) a gradient dynamics theorem characterizing the noise properties induced by our training procedure. Empirically, our method shows improvements over conventional approaches on standard benchmarks, achieving favorable accuracy-efficiency trade-offs with a single trained model. The Gated Depth architecture thus provides a framework for deploying deep networks across diverse computational environments.
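The abstract gives no implementation details, so below is a minimal PyTorch sketch of one plausible reading of the core/gated split. The names GatedStage, ResidualBlock, and active_gated, as well as the random depth-budget training idea in the comments, are illustrative assumptions, not the authors' code.

# Hypothetical sketch of a gated residual stage, assuming the paper's
# "core and gated sub-paths" means always-on blocks plus optional blocks
# whose count is chosen at inference time. Names are illustrative.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """A standard pre-activation residual block."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)


class GatedStage(nn.Module):
    """One stage split into always-on core blocks and gated extra blocks.

    At inference, active_gated controls how many gated blocks run,
    trading accuracy for compute with a single set of weights.
    """

    def __init__(self, channels: int, num_core: int, num_gated: int):
        super().__init__()
        self.core = nn.ModuleList(ResidualBlock(channels) for _ in range(num_core))
        self.gated = nn.ModuleList(ResidualBlock(channels) for _ in range(num_gated))

    def forward(self, x: torch.Tensor, active_gated: int) -> torch.Tensor:
        for block in self.core:  # core sub-path: always executed
            x = block(x)
        for block in self.gated[:active_gated]:  # gated sub-path: depth budget
            x = block(x)
        return x


# One possible reading of "depth-aware training": sample a random depth
# budget each step so the same weights learn to work at every depth.
if __name__ == "__main__":
    stage = GatedStage(channels=64, num_core=2, num_gated=2)
    x = torch.randn(1, 64, 32, 32)
    for budget in range(3):  # run with 0, 1, or 2 gated blocks active
        y = stage(x, active_gated=budget)
        print(budget, y.shape)

Under this reading, keeping the core sub-path unconditional preserves a working minimal network at the smallest budget, while each additional gated block deepens the effective network at inference time without retraining.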
