Poster

From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training

Donglai Xu ⋅ Hongzheng Yang ⋅ Yuzhi Zhao ⋅ Pingping Zhang ⋅ Jinpeng Chen ⋅ Wenao Ma ⋅ Zhijian Hou ⋅ Mengyang Wu ⋅ Xiaolei Li ⋅ Senkang Hu ⋅ Ziyi Guan ⋅ Jason Chun Lok Li ⋅ Lai-Man Po

Abstract
