Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models
Xinpeng Dong, Min Zhang, Kairong Han, Xu Tan, Fei Wu, Kun Kuang
Keywords: Multimodal Learning