From Coarse to Precise: Rethinking and Bridging Localization in Multimodal Large Language Models
Lysa Xiao, Veronica Liesaputra, Lech Szymanski, Stephen Cranefield
Keywords: Multimodal Learning