A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
Duo Li, Zuhao Yang, Xiaoqin Zhang, Ling Shao, Shijian Lu
Keywords: Efficient and Scalable Vision