VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs
Naishan Zheng, Qingpei Guo, Jie Huang, Feng Zhao
Keywords: Multimodal Learning