Reasoning for Mobile User Experience with Multimodal LLMs: Task, Benchmark, and Approach
Ruichao Mao, Zhou Fang, Teng Guo, Hao Yang, Yaping Li, Shaohua Peng, Maji Huang, Xiaoyu Lin, Shuoyang Liu, Xuepeng Li, Yuyu Zhang, Hai Rao
Keywords: Vision, Language, and Reasoning