

Mean-Shift Feature Transformer

Takumi Kobayashi

Arch 4A-E Poster #120
Wed 19 Jun 5 p.m. PDT — 6:30 p.m. PDT


Transformer models developed in NLP have made a great impact on computer vision, producing promising performance on various tasks. While multi-head attention, a characteristic mechanism of the transformer, attracts keen research interest, such as for reducing its computation cost, we analyze the transformer model from the viewpoint of feature transformation based on the distribution of input feature tokens. The analysis inspires us to derive a novel transformation method from the mean-shift update, an effective gradient-ascent procedure for seeking a local mode of a distinctive representation on the token distribution. We also present an efficient projection approach to reduce the parameter size of the linear projections constituting the proposed multi-head feature transformation. In experiments on the ImageNet-1K dataset, the proposed methods are embedded into various network models in place of the transformer module, exhibiting favorable performance improvement.
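The abstract does not spell out the paper's exact transformation, but the mean-shift update it builds on can be sketched generically: each feature token moves toward a kernel-weighted mean of all tokens, which with a softmax (Gaussian-like) kernel closely resembles self-attention. The function name and the temperature parameter `tau` below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def mean_shift_update(tokens, tau=1.0):
    """One generic mean-shift step over a set of feature tokens.

    Each token is replaced by a weighted mean of all tokens, with
    weights given by a softmax over scaled dot-product similarities
    (an assumed kernel choice, mirroring attention).
    tokens: (n, d) array of n feature tokens; tau: kernel temperature.
    """
    sims = tokens @ tokens.T / tau            # (n, n) pairwise similarities
    sims -= sims.max(axis=1, keepdims=True)   # subtract row max for stability
    w = np.exp(sims)
    w /= w.sum(axis=1, keepdims=True)         # normalize kernel weights per token
    return w @ tokens                         # weighted mean = mean-shift step
```

Iterating this update performs gradient ascent on the kernel density estimate of the token distribution, so tokens drift toward local modes; the paper derives its feature transformation from this update rather than from the standard attention formulation.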
