Human-like Abstract Visual Reasoning via Understanding and Solving Reasoning Loop
Abstract
Abstract visual reasoning benchmarks such as ARC-AGI evaluate the ability to infer generalizable transformation rules from a few graphical demonstrations, a capability on which current deep learning models severely underperform. Mainstream large language models achieve only 15.8% (DeepSeek-R1) and 34.5% (o3-mini-high) test accuracy. The core reason lies in their static processing of task examples: unlike humans, who iteratively refine their understanding of the examples while constructing solutions, these models lack mechanisms for dynamically aligning understanding with solving. We address this gap with the Understanding and Solving Reasoning Loop (USRL) framework. The architecture comprises two explicitly interacting modules: an Understanding Module (UM) that encodes and refines rule representations of the examples, and a Solving Module (SM) that generates a draft solution informed by these evolving representations. Through recurrent interaction, the model continuously aligns its draft solution with its understanding of the task examples. Furthermore, we introduce an adaptive halting mechanism that autonomously terminates the reasoning loop based on the consistency between the generated draft solution and the rule representations. With only 7M parameters, our model achieves 47.2% accuracy on ARC-AGI-1, significantly outperforming both DeepSeek-R1 and o3-mini-high. This result suggests that neurocognitive principles offer an effective pathway to abstract reasoning, with implications extending to compositional generalization and structured problem-solving.
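The loop structure described above can be sketched as follows. This is a toy illustration, not the paper's 7M-parameter model: the module internals (simple scalar averaging over input-output offsets) and all function names are hypothetical placeholders, and only the control flow follows the abstract: an Understanding Module refining a rule representation, a Solving Module producing a draft informed by it, and an adaptive halting check that stops the loop once draft and understanding are consistent.

```python
def um_refine(rule, examples, draft, query):
    """Understanding Module (toy): refine the rule representation using the
    example input-output offsets and, weakly, the current draft's feedback."""
    target = sum(y - x for x, y in examples) / len(examples)
    if draft is None:  # first iteration: no draft yet
        evidence = target
    else:
        # Blend example evidence with the rule implied by the current draft.
        evidence = 0.8 * target + 0.2 * (draft - query)
    return 0.5 * rule + 0.5 * evidence

def sm_draft(rule, query):
    """Solving Module (toy): apply the current rule representation to the
    query to produce a draft solution."""
    return query + rule

def usrl(examples, query, max_steps=20, tol=1e-3):
    """Run the understanding-solving loop until the draft no longer shifts
    the rule representation (a stand-in for the consistency-based halt)."""
    rule, draft = 0.0, None
    for step in range(1, max_steps + 1):
        new_rule = um_refine(rule, examples, draft, query)  # understand
        draft = sm_draft(new_rule, query)                   # solve
        if abs(new_rule - rule) < tol:                      # adaptive halt
            rule = new_rule
            break
        rule = new_rule
    return draft, step
```

For example, given demonstrations that all add 3 to the input, `usrl([(1, 4), (2, 5), (7, 10)], query=10)` converges to a draft near 13 and halts before exhausting `max_steps`.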