BPJDet: Extended Object Representation for Generic Body-Part Joint Detection
arXiv 2023 (accepted by TPAMI in January 2024)
The conference version, "Body-Part Joint Detection and Association via Extended Object Representation", received the ICME 2023 Best Student Paper Runner-Up Award.
- Huayi Zhou Shanghai Jiao Tong University (SJTU)
- Fei Jiang East China Normal University (ECNU)
- Jiaxin Si Chongqing Qulian Digital Technology Company
- Yue Ding Shanghai Jiao Tong University (SJTU)
- Hongtao Lu Shanghai Jiao Tong University (SJTU)
Abstract
Detection of the human body and its parts (e.g., head or hands) has been intensively studied. However, most of these CNN-based detectors are trained independently, making it difficult to associate detected parts with the body. In this paper, we focus on the joint detection of the human body and its corresponding parts. Specifically, we propose a novel extended object representation that integrates the center offsets of body parts, and construct a dense one-stage generic Body-Part Joint Detector (BPJDet). In this way, body-part associations are neatly embedded in a unified object representation containing both semantic and geometric content. We can therefore perform multi-loss optimization to tackle multiple tasks synergistically. BPJDet does not suffer from error-prone post-matching, and keeps a better trade-off between speed and accuracy. Furthermore, BPJDet can be generalized to detect any one or more body parts. To verify the superiority of BPJDet, we conduct experiments on three body-part datasets (CityPersons, CrowdHuman and BodyHands) and one body-parts dataset, COCOHumanParts. While keeping high detection accuracy, BPJDet achieves state-of-the-art association performance on all datasets compared with its counterparts. In addition, we show the benefits of advanced body-part association capability by improving the performance of two representative downstream applications: accurate crowd head detection and hand contact estimation.
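To make the idea of embedding associations as part center offsets more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each detected body already carries a decoded part center in image coordinates, and it pairs that body with the detected part box whose center is nearest, within a distance limit. The names `BodyDet`, `associate` and `max_dist` are hypothetical, and nearest-center matching is a simplification chosen for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class BodyDet:
    """Hypothetical extended object: a body box plus a predicted part center."""
    box: Box
    part_center: Tuple[float, float]  # assumed already decoded from the offsets


def box_center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def associate(bodies: List[BodyDet], parts: List[Box],
              max_dist: float) -> List[Optional[int]]:
    """For each body, return the index of the nearest part box, or None.

    A part is only accepted if its center lies within max_dist of the
    center predicted by the body (an illustrative rule, not the paper's).
    """
    matches: List[Optional[int]] = []
    for body in bodies:
        best_idx, best_dist = None, max_dist
        for idx, part in enumerate(parts):
            cx, cy = box_center(part)
            dist = hypot(cx - body.part_center[0], cy - body.part_center[1])
            if dist <= best_dist:
                best_idx, best_dist = idx, dist
        matches.append(best_idx)
    return matches


bodies = [
    BodyDet(box=(0, 0, 100, 200), part_center=(50, 30)),
    BodyDet(box=(200, 0, 300, 200), part_center=(250, 30)),
    BodyDet(box=(400, 0, 500, 200), part_center=(450, 30)),  # no part nearby
]
parts = [(240, 20, 260, 40), (40, 20, 60, 40)]
matches = associate(bodies, parts, max_dist=20.0)
```

Because the association is read directly from the body's own prediction, no separate matching network or heuristic over all body-part pairs is needed in this sketch.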
BPJDet vs. BPJDetPlus
BPJDet (body-part) | BPJDetPlus (body-parts) |
BPJDetPlus is newly added in the journal version of BPJDet. It has several new functions: (1) Multiple Body-Parts Joint Detection; (2) Two downstream applications, namely Body-Head for Accurate Crowd Counting and Body-Hand for Hand Contact Estimation.
More Qualitative Results of BPJDet and BPJDetPlus
We show more visualization examples of the trained BPJDet-L on four joint detection tasks: body-face, body-hand, body-head and body-parts. The body-face and body-head models are trained on the CrowdHuman train set. They detect and associate well on images collected in the wild, showing strong adaptability. The body-hand model is trained on the BodyHands train set. Although its body detection is not as good as that of the models trained on CrowdHuman, owing to the sparse and incomplete labels of BodyHands, it still accurately matches detected hands to their body boxes thanks to the association ability of BPJDet. The body-parts model is trained on COCOHumanParts. The body-face, body-hand and body-head subtasks are all unified in this one model, which produces impressive results. This benefits from the annotation richness and data diversity of the benchmark, and also from the strength of BPJDet.
Application 1: Body-Head for Accurate Crowd Counting
Through the joint detection of human body and head, false detections made by single-object detectors are eliminated, and crowded persons are accurately detected and counted without retraining. Detection is robust and adapts automatically to unfamiliar scenes and various enclosed environments, including classrooms, conference rooms and shopping malls. All images used are from the SCUT-HEAD PartB dataset.
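One way to picture how joint detection can suppress false positives when counting is sketched below. This is an illustrative assumption, not the procedure used for the results shown: a person is counted only when a body detection has an associated head and both scores pass a threshold. The function name `count_people`, the pair layout and the threshold values are all hypothetical.

```python
from typing import List, Optional, Tuple

# Each entry: (body_score, head_score), where head_score is None when
# no head was associated with that body. This layout is an assumption.
Pair = Tuple[float, Optional[float]]


def count_people(pairs: List[Pair],
                 body_thr: float = 0.5,
                 head_thr: float = 0.5) -> int:
    """Count bodies that have a confidently detected, associated head."""
    count = 0
    for body_score, head_score in pairs:
        if head_score is None:
            continue  # a body without an associated head is not counted
        if body_score >= body_thr and head_score >= head_thr:
            count += 1
    return count


detections = [
    (0.9, 0.8),   # confident body with confident head: counted
    (0.9, None),  # body with no associated head: skipped
    (0.3, 0.9),   # weak body score: skipped
    (0.7, 0.6),   # counted
]
num_people = count_people(detections)
```

Requiring agreement between two detections is what lets a rule like this discard an isolated head-like or body-like false positive.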
Application 2: Body-Hand for Hand Contact Estimation
The purpose of this task is to detect human hands in natural settings and estimate their physical contact state, which falls into four categories: ① No-Contact: the hand is not in contact with any object; ② Self-Contact: the hand is in contact with part of the person's own body; ③ Person-Contact: the hand is in contact with a body part of another person; ④ Object-Contact: the hand is in contact with a non-human object. The state of a hand may be a combination of these categories. For example, a hand can hold an object while touching another person. Hand contact state estimation can support a variety of applications, such as harassment detection, contamination prevention, behavior recognition and AR/VR. More details can be found in ContactHands (NIPS 2020): Detecting Hands and Recognizing Physical Contact in the Wild.
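Since a hand's state can combine several categories, it is naturally a multi-label quantity rather than a single class. The sketch below shows one possible encoding with bit flags. It assumes a model outputs an independent score for each of the three contact types and that No-Contact is the state where none passes the threshold. The names `Contact` and `decode_contact`, the score keys and the threshold are hypothetical.

```python
from enum import Flag
from typing import Dict


class Contact(Flag):
    """Hand contact state; contact types can be combined with `|`."""
    NO_CONTACT = 0
    SELF = 1
    PERSON = 2
    OBJECT = 4


def decode_contact(scores: Dict[str, float], thr: float = 0.5) -> Contact:
    """Turn per-type scores into a combined contact state.

    `scores` is assumed to hold the keys "self", "person" and "object".
    """
    state = Contact.NO_CONTACT
    if scores["self"] >= thr:
        state |= Contact.SELF
    if scores["person"] >= thr:
        state |= Contact.PERSON
    if scores["object"] >= thr:
        state |= Contact.OBJECT
    return state


# A hand holding an object while touching another person.
state = decode_contact({"self": 0.1, "person": 0.8, "object": 0.9})
free_hand = decode_contact({"self": 0.1, "person": 0.2, "object": 0.3})
```

With this encoding, a check such as `Contact.OBJECT in state` tests a single contact type without enumerating every combination.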
Citation
Acknowledgements
We acknowledge the efforts of the authors of human-related datasets including CityPersons, CrowdHuman, BodyHands, COCOHumanParts, SCUT-HEAD PartB, CroHD and ContactHands. These datasets make research and downstream applications on generic body-part joint detection and association possible. This paper was supported by NSFC (No. 62176155, 62207014) and the Shanghai Municipal Science and Technology Major Project, China, under grant no. 2021SHZDZX0102.
The website template was borrowed from Jon Barron and Zip-NeRF.