Poster

Open Ad-hoc Categorization with Contextualized Feature Learning

Zilin Wang · Sangwoo Mo · Stella X. Yu · Sima Behpour · Liu Ren


Abstract:

Unlike common categories for plants and animals, ad-hoc categories such as "things to sell at a garage sale" are created to help people achieve a particular task. Likewise, AI agents need to adaptively categorize visual scenes in response to changing tasks. We thus study open ad-hoc categorization, where we learn to infer novel concepts and name images according to a varying categorization purpose, a few labeled exemplars, and many unlabeled images. We develop a simple method that combines top-down text guidance (CLIP) with bottom-up image clustering (GCD) to learn contextualized visual features and align visual clusters with CLIP semantics, enabling predictions for both known and novel classes. Benchmarked on the multi-label datasets Stanford and Clevr-4, our method, OAK, significantly outperforms baselines in providing accurate predictions across contexts and in identifying novel concepts; e.g., it achieves 87.4% novel accuracy on Stanford Mood, surpassing CLIP and GCD by over 50%. OAK also offers interpretable saliency maps, focusing on hands, faces, and backgrounds for the Action, Mood, and Location contexts, respectively.
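
The abstract outlines the core mechanism: CLIP text embeddings supply top-down semantic guidance, a GCD-style procedure clusters unlabeled images bottom-up, and each cluster is named by aligning it with CLIP semantics. Below is a minimal sketch of that pipeline under stated assumptions; the context prompt, class names, cluster count, and the use of plain k-means in place of the paper's GCD training are all illustrative, not the authors' implementation.

# Minimal sketch of an OAK-style pipeline: CLIP for top-down text guidance,
# k-means for bottom-up image clustering, and cluster naming by nearest
# CLIP text embedding. Prompts, names, and cluster count are assumptions.
import torch
import clip                      # OpenAI CLIP package
from PIL import Image
from sklearn.cluster import KMeans

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical "Mood" context: a few known exemplar names; novel moods
# are expected to surface as additional, initially unnamed clusters.
known_names = ["happy", "sad"]   # assumed labeled-exemplar classes
num_clusters = 4                 # known classes + expected novel concepts

def embed_images(paths):
    """Encode images into normalized CLIP features (bottom-up signal)."""
    ims = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    with torch.no_grad():
        feats = model.encode_image(ims.to(device)).float()
    return feats / feats.norm(dim=-1, keepdim=True)

def embed_names(names, context="mood"):
    """Encode context-conditioned prompts into CLIP text features (top-down signal)."""
    prompts = clip.tokenize([f"a photo of a {n} {context}" for n in names]).to(device)
    with torch.no_grad():
        feats = model.encode_text(prompts).float()
    return feats / feats.norm(dim=-1, keepdim=True)

def cluster_and_name(image_paths, vocabulary):
    """Cluster image features, then align each cluster with CLIP semantics
    by naming its centroid after the closest word in a candidate vocabulary."""
    img_feats = embed_images(image_paths).cpu().numpy()
    km = KMeans(n_clusters=num_clusters, n_init=10).fit(img_feats)
    centroids = torch.tensor(km.cluster_centers_, dtype=torch.float32).to(device)
    centroids = centroids / centroids.norm(dim=-1, keepdim=True)
    text_feats = embed_names(vocabulary)
    sims = centroids @ text_feats.T              # cosine similarity, clusters x words
    names = [vocabulary[i] for i in sims.argmax(dim=-1).tolist()]
    return km.labels_, names                     # per-image cluster id, per-cluster name

Varying the context string (e.g., "action" or "location" instead of "mood") changes the text-side guidance, which is the sense in which the features and names are contextualized to the categorization purpose.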
