

PairDETR: Joint Detection and Association of Human Bodies and Faces

Ammar Ali · Georgii Gaikov · Denis Rybalchenko · Alexander Chigorin · Ivan Laptev · Sergey Zagoruyko

Arch 4A-E Poster #25
Wed 19 Jun 10:30 a.m. PDT — noon PDT


Image and video analysis requires not only accurate object detection but also an understanding of the relationships among detected objects. Common solutions to relation modeling typically resort to stand-alone object detectors followed by non-differentiable post-processing techniques. Recently introduced detection transformers (DETR) perform end-to-end object detection based on a bipartite matching loss. Yet neither traditional object detection methods nor DETR-based models (Deformable DETR, DINO, etc.) can detect objects and their relationships directly. In this paper, we build on the DETR approach and extend it to the joint detection of objects and their relationships. A naive extension of DETR to object relations, however, leads to an NP-hard problem. To address this, we propose an approximate solution based on bipartite matching. While our method can generalize to an arbitrary number of objects, we focus here on the modeling of object pairs and their relations. In particular, we apply our method, PairDETR, to the problem of detecting human bodies, faces, and associations between bodies and faces of the same person. Our approach not only eliminates the need for hand-designed post-processing but also achieves excellent results for body-face associations. We evaluate PairDETR on the challenging CrowdHuman and CityPersons datasets and demonstrate a significant improvement over the state of the art. Our training code and pre-trained models will become publicly available.
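
To make the idea of bipartite matching over pairs concrete, the following is a minimal sketch and not the authors' implementation. It assumes each prediction is a (body box, face box) pair, that the matching cost is simply the sum of L1 box distances for the two boxes, and that the assignment is solved with SciPy's Hungarian solver. The function names `l1_cost` and `match_pairs` are hypothetical, and the actual PairDETR cost terms are not specified in the abstract.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def l1_cost(pred, gt):
    """Pairwise L1 distance between predicted and ground-truth boxes.

    pred has shape (N, 4) and gt has shape (M, 4); the result is (N, M).
    """
    return np.abs(pred[:, None, :] - gt[None, :, :]).sum(axis=-1)


def match_pairs(pred_body, pred_face, gt_body, gt_face):
    """Match predicted (body, face) pairs to ground-truth pairs one-to-one.

    Assumption for illustration: the cost of assigning prediction i to
    ground-truth pair j is the body L1 distance plus the face L1 distance.
    """
    cost = l1_cost(pred_body, gt_body) + l1_cost(pred_face, gt_face)
    rows, cols = linear_sum_assignment(cost)  # minimum-cost assignment
    matches = list(zip(rows.tolist(), cols.tolist()))
    return matches, float(cost[rows, cols].sum())


# Toy data: two predicted pairs and the same two ground-truth pairs
# listed in swapped order.
pred_body = np.array([[0, 0, 10, 20], [50, 50, 60, 80]], dtype=float)
pred_face = np.array([[2, 1, 6, 5], [52, 51, 57, 56]], dtype=float)
gt_body = pred_body[::-1].copy()
gt_face = pred_face[::-1].copy()

matches, total = match_pairs(pred_body, pred_face, gt_body, gt_face)
print(matches, total)
```

Because each body and its face are scored together as a single unit, the assignment stays a standard two-sided matching between predicted pairs and ground-truth pairs, which is the kind of problem the Hungarian algorithm solves in polynomial time.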
