

Poster

Perceptual Inductive Bias Is What You Need Before Contrastive Learning

Junru Zhao · Tianqin Li · Dunhan Jiang · Shenghao Wu · Alan Ramirez · Tai Sing Lee


Abstract:

David Marr’s seminal theory of human perception proposes that visual representation learning should follow a multi-stage process, prioritizing the derivation of boundary and surface properties before forming semantic object representations. In contrast, contrastive representation learning frameworks typically bypass this explicit multi-stage approach, defining their objective as the direct learning of a semantic representation space for objects. While effective in general contexts, this approach sacrifices the inductive biases of vision, leading to slower convergence and shortcut learning such as texture bias. In this work, we demonstrate the value of Marr’s multi-stage theory: by first constructing boundary- and surface-level representations using perceptual constructs from early visual processing stages, and subsequently training for object semantics, we obtain 2x faster convergence on ResNet18, improved final representations on semantic segmentation, depth estimation, and object recognition, and enhanced robustness and out-of-distribution generalization. In sum, we propose a pretraining stage, applied before standard contrastive representation pretraining, that enhances final representation quality and reduces overall convergence time via inductive biases from the human visual system.
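The abstract describes the two-stage recipe only at a high level. The sketch below illustrates the idea in PyTorch and is not the authors' implementation: stage 1 supervises a ResNet18 backbone with a crude boundary target (a Sobel edge map, standing in for the paper's unspecified perceptual constructs), and stage 2 continues from that initialization with a SimCLR-style InfoNCE contrastive objective. All names (`TwoStageModel`, `sobel_edges`, `info_nce`), dimensions, and hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


def sobel_edges(img):
    """Crude boundary map via Sobel filtering; a stand-in (assumption) for
    the paper's perceptual constructs, which the abstract does not specify."""
    gray = img.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, kx.transpose(2, 3), padding=1)
    return (gx ** 2 + gy ** 2 + 1e-12).sqrt()


class TwoStageModel(nn.Module):
    """One backbone, two heads: a boundary-prediction head for stage 1 and a
    projection head for stage-2 contrastive learning (both hypothetical)."""

    def __init__(self, feat_dim=128):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep convolutional features only; drop average pool and classifier.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        self.boundary_head = nn.Conv2d(512, 1, kernel_size=1)
        self.proj = nn.Linear(512, feat_dim)

    def boundary(self, x):
        pred = self.boundary_head(self.encoder(x))
        return F.interpolate(pred, size=x.shape[-2:],
                             mode="bilinear", align_corners=False)

    def embed(self, x):
        feats = self.encoder(x).mean(dim=(2, 3))  # global average pool
        return F.normalize(self.proj(feats), dim=1)


def info_nce(z1, z2, tau=0.2):
    """SimCLR-style NT-Xent loss between two batches of paired views."""
    b = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                     # (2B, D)
    sim = z @ z.t() / tau                              # cosine similarities
    mask = torch.eye(2 * b, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))         # exclude self-pairs
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)]).to(z.device)
    return F.cross_entropy(sim, targets)


if __name__ == "__main__":
    model = TwoStageModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    imgs = torch.randn(4, 3, 224, 224)  # dummy batch in place of a dataset

    # Stage 1: learn boundary/surface structure before any semantic objective.
    loss1 = F.mse_loss(model.boundary(imgs), sobel_edges(imgs))
    opt.zero_grad(); loss1.backward(); opt.step()

    # Stage 2: standard contrastive pretraining from the stage-1 weights
    # (noise stands in for real augmentations, purely for illustration).
    v1 = imgs + 0.1 * torch.randn_like(imgs)
    v2 = imgs + 0.1 * torch.randn_like(imgs)
    loss2 = info_nce(model.embed(v1), model.embed(v2))
    opt.zero_grad(); loss2.backward(); opt.step()
```

The key design point the abstract argues for is the ordering: the encoder is shaped by boundary/surface supervision first, so the subsequent contrastive stage starts from perceptually grounded features rather than from scratch.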
