Tutorial

Efficient Text-to-Image/Video modeling

Srikumar Ramalingam

2025 Tutorial

Project Page

Abstract

We are witnessing groundbreaking results in text-to-image and text-to-video models. However, generation with these models is iterative and computationally expensive, and there is a growing need to make them faster so that they can serve millions of users efficiently. This course focuses on techniques for accelerating text-to-image and text-to-video models, including progressive parallel decoding, distillation methods, and Markov Random Fields. The course also critiques popular evaluation metrics such as FID and introduces efficient alternatives such as CMMD.
