Poster
How to Merge Your Multimodal Models Over Time?
Sebastian Dziadzio · Vishaal Udandarao · Karsten Roth · Ameya Prabhu · Zeynep Akata · Samuel Albanie · Matthias Bethge
Model merging combines "expert" models, each finetuned from a shared foundation model on diverse tasks and domains, into a single, more capable base model. However, existing model merging approaches assume all experts are available simultaneously. In reality, new tasks and domains emerge continuously, prompting the need for a dynamic process of integrating these experts over time, which we call temporal model merging. The temporal dimension introduces unique challenges not addressed in prior work: At each task, should expert training start from the merged previous experts or from the original base model? Should all models be merged at every time step? Which merging techniques are best suited for temporal merging? Should different strategies be used for the training initialization and deployment phases? To tackle these questions, we propose a unified framework called TIME (Temporal Integration of Model Expertise) that defines temporal model merging across three axes: (1) Initialization Phase, (2) Deployment Phase, and (3) Merging Technique. Using TIME, we study temporal model merging across model sizes, tasks, and compute budgets on the large-scale FoMo-in-Flux benchmark for continual multimodal pretraining. Systematic experiments across TIME and FoMo-in-Flux yield several key insights into the current limits and best practices for successful model merging across time.
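To make the setup concrete, here is a minimal, hypothetical sketch of temporal merging with the simplest possible technique, elementwise parameter averaging. The function names (`average_merge`, `temporal_merge`) and the dict-of-floats "models" are illustrative assumptions, not the paper's actual algorithms or API; real merging operates on full model weight tensors and may weight experts differently.

```python
# Illustrative sketch (not the paper's method): experts arrive one at a
# time, and at each step the running merged model is updated by averaging
# its parameters with the newly arrived expert's parameters.

def average_merge(models):
    """Merge a list of parameter dicts by simple elementwise averaging."""
    keys = models[0].keys()
    return {k: sum(m[k] for m in models) / len(models) for k in keys}

def temporal_merge(base, experts):
    """Sequentially fold each new expert into the running merged model.

    Step t replaces the merged model with the average of the previous
    merge and the new expert; other schedules (e.g. merging all experts
    at once, or only at deployment) are alternative points in the design
    space the abstract describes.
    """
    merged = dict(base)
    for expert in experts:
        merged = average_merge([merged, expert])
    return merged

# Toy example with single-parameter "models"
base = {"w": 0.0}
experts = [{"w": 4.0}, {"w": 2.0}]
print(temporal_merge(base, experts))  # {'w': 2.0}
```

The sequential fold above corresponds to one choice on the framework's deployment axis; starting each expert's training from `merged` rather than `base` would correspond to a choice on the initialization axis.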