MASS: Motion-Aware Spatial–temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
Xiyang Wu, Zongxia Li, Jihui Jin, Gouthaman KV, Vishnu Raj, Nilotpal Sinha, Jingxi Chen, Fan Du, Dinesh Manocha
Keywords:
Vision, Language, and Reasoning
Successful Page Load