Poster

Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning

Dongyao Jiang · Haodong Jing · Yongqiang Ma · Nanning Zheng


Abstract:

Human reasoning naturally combines concepts to identify unseen compositions, a capability that Compositional Zero-Shot Learning (CZSL) aims to replicate in machine learning models. However, we observe that focusing solely on typical image classification tasks in CZSL may limit models' compositional generalization potential. To address this, we introduce C-EgoExo, a video-based benchmark, along with a compositional action recognition task to enable more comprehensive evaluations. Inspired by human reasoning processes, we propose a Dual-branch Hybrid Discrimination (DHD) framework, featuring two branches that decode visual inputs in distinct observation sequences. Through a cross-attention mechanism and a contextual dependency encoder, DHD effectively mitigates challenges posed by conditional variance. We further design a Copula-based orthogonal decoding loss to counteract contextual interference in primitive decoding. Our approach demonstrates outstanding performance across diverse CZSL tasks, excelling in both image-based and video-based modalities and in attribute-object and action-object compositions, setting a new benchmark for CZSL evaluation.
