Poster

AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift Factor

Sudong Cai

2024 Poster

Project Page Paper PDF [ Slides] [ Poster] [Paper PDF]

Abstract

Nonlinearities are decisive in neural representation learning. Traditional Activation (Act) functions impose fixed inductive biases on neural networks with oriented biological intuitions. Recent methods leverage self-gated curves to compensate for the rigid traditional Act paradigms in fitting flexibility. However, substantial improvements are still impeded by the norm-induced mismatched feature re-calibrations (see Section 1), i.e., the actual importance of a feature can be inconsistent with its explicit intensity such that violates the basic intention of a direct self-gated feature re-weighting. To address this problem, we propose to learn discriminative neural feature Act with a novel prototype, namely, AdaShift, which enhances typical self-gated Act by incorporating an adaptive shift factor into the re-weighting function of Act. AdaShift casts dynamic translations on the inputs of a re-weighting function by exploiting comprehensive feature-filter context cues of different ranges in a simple yet effective manner. We obtain the new intuitions of AdaShift by rethinking the feature-filter relationships from a common Softmax-based classification and by generalizing the new observations to a common learning layer that encodes features with updatable filters. Our practical AdaShifts, built upon the new Act prototype, demonstrate significant improvements to the popular/SOTA Act functions on different vision benchmarks. By simply replacing ReLU with AdaShifts, ResNets can match advanced Transformer counterparts (e.g., ResNet-50 vs. Swin-T) with lower cost and fewer parameters.

Chat is not available.