

Referring Expression Counting

Siyang Dai · Jun Liu · Ngai-Man Cheung

Arch 4A-E Poster #231
Highlight
Thu 20 Jun 5 p.m. PDT — 6:30 p.m. PDT


Existing counting tasks are limited to the class level and do not account for fine-grained details within a class. Real applications often require in-context or referring human input to specify the target objects to count. Take urban analysis as an example: fine-grained information, such as traffic flow in different directions or pedestrians and vehicles waiting or moving on different sides of a junction, is more beneficial. Current settings of both class-specific and class-agnostic counting treat all objects of the same class alike, which limits their usefulness in real use cases. To this end, we propose a new task named Referring Expression Counting (REC), which aims to count objects with different attributes within the same class. To evaluate the REC task, we create a novel dataset named REC-8K, which contains 8011 images and 17122 referring expressions. Experiments on REC-8K show that our proposed method achieves state-of-the-art performance compared with several text-based counting methods and an open-set object detection model. We also outperform prior models on the class-agnostic counting (CAC) benchmark [36] in the zero-shot setting and perform on par with the few-shot methods. Code and dataset are available at
