UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
Zhuangcheng Gu ⋅ Guang Liang ⋅ Bin Wang ⋅ Zhiyuan Zhao ⋅ Qintong Zhang ⋅ Weijia Li ⋅ Chao Xu ⋅ Bo Zhang ⋅ Botian Shi ⋅ Jiang Wu ⋅ Wentao Zhang ⋅ Conghui He
Abstract
This paper introduces UniMERNet, a high-accuracy, computation-efficient algorithm for Mathematical Expression Recognition (MER) across diverse real-world scenarios. To facilitate UniMERNet's training, we constructed UniMER-1M, a million-scale dataset whose unprecedented diversity endows the model with robust generalization ability. Through in-depth analysis, we discover a distinctive raster-scan pattern (left-to-right, top-to-bottom) in the attention distribution of Transformer models on MER tasks, which closely aligns with human reading habits. Based on this key finding, we design an innovative **Raster-Scan Attention** mechanism that employs a "horizontal-first, vertical-second" sequential attention computation strategy. This approach not only reduces computational complexity from $ \mathcal{O}(NH^2W^2D) $ to $ \mathcal{O}(NHWD(H + W)) $, but also enables the model to capture long-range dependencies more efficiently, achieving recognition performance comparable to global attention. Leveraging both UniMER-1M and our innovative attention mechanism, UniMERNet achieves state-of-the-art performance across four real-world scenarios while significantly reducing computational cost compared to global attention: over **1.2$\times$** memory savings during training, approximately **10$\times$** memory reduction during inference, and a **5$\times$** inference speedup, with slightly improved accuracy. All resources will be publicly released to further advance MER research.
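The complexity reduction described above can be illustrated with a minimal sketch: instead of full self-attention over all $H \times W$ positions, attention is computed first within each row (horizontal pass) and then within each column (vertical pass), mirroring the "horizontal-first, vertical-second" strategy. The sketch below is an axial-attention-style approximation of the described mechanism, not the authors' released implementation; the class name and use of `torch.nn.MultiheadAttention` are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RasterScanAttentionSketch(nn.Module):
    """Illustrative sketch (not the official UniMERNet code).

    Full 2D self-attention attends over all H*W positions at once,
    costing O(N H^2 W^2 D). Attending within each row, then within
    each column, costs O(N H W D (H + W)) instead.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, H, W, D) feature map
        n, h, w, d = x.shape
        # Horizontal pass: each of the N*H rows is a sequence of length W.
        rows = x.reshape(n * h, w, d)
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(n, h, w, d)
        # Vertical pass: each of the N*W columns is a sequence of length H.
        cols = x.permute(0, 2, 1, 3).reshape(n * w, h, d)
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(n, w, h, d).permute(0, 2, 1, 3)


# Usage: output shape matches input, as with global attention.
x = torch.randn(2, 8, 16, 32)
y = RasterScanAttentionSketch(dim=32)(x)
```

Because each pass only mixes information along one axis, two sequential passes suffice for any position to influence any other, which is why long-range dependencies can still be captured at far lower cost than global attention.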