

Poster

Faster Parameter-Efficient Tuning with Token Redundancy Reduction

Kwonyoung Kim · Jungin Park · Jin Kim · Hyeongjun Kwon · Kwanghoon Sohn


Abstract:

Parameter-efficient tuning (PET) aims to transfer pre-trained foundation models to downstream tasks by learning a small number of parameters. In practice, PET requires far lower storage and transmission costs per task than traditional fine-tuning methods, which update all parameters, even as pre-trained model capacity grows rapidly. However, most existing PET methods inherit the inference latency of their large backbones and often incur additional computation from extra modules (e.g., adapters) at inference time, making them less practical for computation-intensive applications. In this paper, we propose a Faster Parameter-Efficient Tuning (FPET) method that achieves high inference speed and computational efficiency while maintaining high storage efficiency. Specifically, we introduce a plug-and-play token redundancy reduction module carefully engineered for PET. The proposed module refines the tokens from the self-attention layer using an adapter that learns accurate token similarities, and reduces the token count through a token merging strategy. We formulate token merging as a fully differentiable operation using a straight-through estimator, enabling optimal token redundancy reduction. Experimental results show that FPET achieves faster inference and higher memory efficiency than the pre-trained backbone while keeping performance on par with state-of-the-art PET methods.
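The abstract only sketches the mechanism, so the following is a minimal PyTorch sketch of the general idea: a bottleneck adapter refines self-attention tokens to estimate token similarity, the most redundant tokens are selected with a hard top-k and merged into their nearest kept tokens, and a straight-through estimator keeps the selection differentiable. The module name, shapes, bottleneck size, merge rule, and reduction count `r` are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (assumed design): adapter-refined token similarity,
# hard top-k redundancy selection with a straight-through estimator,
# and additive merging of dropped tokens into their nearest kept token.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenMergingAdapter(nn.Module):
    """Hypothetical bottleneck adapter that scores token redundancy and
    merges the r most redundant tokens into their most similar tokens."""

    def __init__(self, dim: int, bottleneck: int = 16, r: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # learned projection used for similarity
        self.up = nn.Linear(bottleneck, dim)
        self.r = r                               # number of tokens removed per layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D) tokens taken from a self-attention layer
        z = self.up(F.gelu(self.down(x))) + x    # adapter-refined tokens
        zn = F.normalize(z, dim=-1)
        sim = zn @ zn.transpose(1, 2)            # cosine similarity between tokens
        sim = sim - torch.eye(x.size(1), device=x.device)  # mask self-similarity
        score, nearest = sim.max(dim=-1)         # redundancy score and merge target

        # Hard selection of the r most redundant tokens; the straight-through
        # estimator lets gradients flow through the soft scores.
        soft = torch.sigmoid(score)              # soft surrogate of the keep decision
        drop_idx = score.topk(self.r, dim=-1).indices
        hard = torch.ones_like(soft)
        hard.scatter_(1, drop_idx, 0.0)          # 0 = merged away, 1 = kept
        keep_mask = hard + (soft - soft.detach())  # straight-through estimator

        # Add each dropped token into its most similar token (simple additive
        # pooling for brevity), then gather the kept tokens.
        merged = x * keep_mask.unsqueeze(-1)
        merged = merged.scatter_add(
            1, nearest.unsqueeze(-1).expand_as(x),
            x * (1.0 - keep_mask).unsqueeze(-1))
        keep_idx = hard.topk(x.size(1) - self.r, dim=-1).indices.sort(dim=-1).values
        return merged.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))


if __name__ == "__main__":
    tokens = torch.randn(2, 197, 768)            # e.g., a ViT-B/16 token sequence
    out = TokenMergingAdapter(dim=768, r=16)(tokens)
    print(out.shape)                             # torch.Size([2, 181, 768])
```

In this sketch the adapter serves double duty: its refined representations define the similarity used for merging, while the straight-through trick keeps the hard keep/drop decision trainable end to end, which is the property the abstract attributes to the differentiable token merging formulation.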
