TaskIT: Memory-Efficient Fine-Tuning of Multi-LoRA LLMs via Cross-Task Importance Transfer
Abstract
On-device AI systems increasingly adopt a single foundation model equipped with task-specific Low-Rank Adaptation (LoRA) modules, forming a multi-LoRA LLM that supports multiple tasks. We study how to adapt such a model to a new task on memory-constrained devices. Although LoRA reduces the number of trainable parameters, fine-tuning a full set of modules remains memory-intensive. To improve efficiency, we apply sparse updating, training only a subset of LoRA modules within the memory budget. However, existing sparse-updating methods assume that all candidate parameters are already instantiated and therefore cannot estimate the importance of modules that do not yet exist, while prior memory models designed for sequential networks fail to capture the blockwise parallel structure of Transformers. We propose TaskIT, a framework for memory-efficient fine-tuning via cross-task importance transfer. TaskIT predicts the pre-insertion importance of modules by transferring importance estimates from previously tuned tasks, and employs a block-based memory predictor that captures both the parallel and sequential dependencies of Transformer blocks. A dynamic programming scheduler then selects module locations, counts, and ranks to maximize accuracy within the memory budget. Experiments on uni-modal and cross-modal benchmarks show that TaskIT achieves superior accuracy-memory tradeoffs compared with Zero-FT, non-LoRA, and LoRA-based fine-tuning baselines.
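
The scheduling step described above is, at its core, a budget-constrained selection problem. The listing below is a minimal illustrative sketch, not the authors' implementation: it assumes each candidate LoRA location comes with hypothetical (rank, importance, memory) options produced elsewhere, and runs a multiple-choice knapsack-style dynamic program that either picks one rank per location or skips it, maximizing total predicted importance under the memory budget.

Listing (illustrative Python sketch):

    from typing import List, Tuple

    def select_modules(
        candidates: List[List[Tuple[int, float, int]]],  # per location: (rank, importance, memory_mb)
        budget_mb: int,
    ) -> Tuple[float, List[int]]:
        """Return best total importance and the chosen rank per location (0 = skip)."""
        n = len(candidates)
        NEG = float("-inf")
        # dp[m] = best importance achievable with exactly m MB consumed so far
        dp = [0.0] + [NEG] * budget_mb
        choice = [[0] * (budget_mb + 1) for _ in range(n)]

        for i, options in enumerate(candidates):
            new_dp = list(dp)  # default decision: skip this location (cost 0)
            for rank, imp, mem in options:
                for m in range(mem, budget_mb + 1):
                    if dp[m - mem] != NEG and dp[m - mem] + imp > new_dp[m]:
                        new_dp[m] = dp[m - mem] + imp
                        choice[i][m] = rank
            dp = new_dp

        # Pick the best reachable memory state, then backtrack the chosen ranks.
        best_m = max(range(budget_mb + 1), key=lambda m: dp[m])
        ranks, m = [], best_m
        for i in range(n - 1, -1, -1):
            r = choice[i][m]
            ranks.append(r)
            if r:
                cost = next(mem for rank, _, mem in candidates[i] if rank == r)
                m -= cost
        return dp[best_m], list(reversed(ranks))

In this sketch the importance scores stand in for whatever pre-insertion estimates the transfer step produces, and the per-option memory costs stand in for the block-based memory predictor's outputs; the DP itself only encodes the budgeted selection over locations, counts, and ranks.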