DocSLM: A Small Vision-Language Model for Long Multimodal Document Understanding
Tanveer Hannan, Dimitrios Mallios, Parth Pathak, Faegheh Sardari, Thomas Seidl, Gedas Bertasius, Mohsen Fayyaz, Sunando Sengupta
Keywords: Vision, Language, and Reasoning