Peer reviewed
  • EDENet: Elaborate density e...
    Xia, Yinfeng; He, Yuqiang; Peng, Sifan; Hao, Xiaoliang; Yang, Qianqian; Yin, Baoqun

    Neurocomputing (Amsterdam), 10/2021, Volume: 459
    Journal Article

    For CNN-based density estimation approaches to crowd counting, how to generate a high-quality density map that combines accurate counting performance with a detailed spatial description is still an open question. In this paper, to reconcile these two goals, we propose an end-to-end trainable architecture called Elaborate Density Estimation Network for Crowd Counting (EDENet), which gradually generates high-quality density estimation maps based on distributed supervision. Specifically, EDENet is composed of a Feature Extraction Network (FEN), a Feature Fusion Network (FFN), a Double-Head Network (DHN) and an Adaptive Density Fusion Network (ADFN). The FEN adopts VGG as the backbone network and employs Spatial Adaptive Pooling (SAP) to extract coarse-grained features. The FFN fuses contextual information and localization information to enhance the spatial description ability of fine-grained features. In the DHN, the Density Attention Module (DAM) provides foreground-background attention masks, thereby urging the Density Regression Module (DRM) to focus on the pixels around the head annotations when regressing density maps at different resolutions. The ADFN, built on an adaptive weighting mechanism, directly introduces coarse-grained density representations into high-resolution density maps to strengthen the commonality and dependency among density maps. Extensive experiments on four benchmark crowd datasets (ShanghaiTech, UCF-QNRF, JHU-CROWD++ and NWPU-Crowd) indicate that EDENet achieves state-of-the-art recognition performance and high robustness. Moreover, the density maps it produces attain the highest Peak Signal to Noise Ratio (PSNR) and can therefore be considered high quality.
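    The abstract uses two quantities without defining them: the crowd count read off a density map, and the PSNR used to judge map quality. The sketch below illustrates both under stated assumptions. It uses the standard PSNR definition, 10 * log10(peak^2 / MSE), and takes the peak from the reference map; the paper's exact normalization is not given in this record. The function names `count_from_density` and `psnr` and the toy 2x2 maps are hypothetical and are not taken from EDENet.

```python
import math


def count_from_density(density):
    """Estimated crowd count: the sum of all values in a 2-D density map."""
    return sum(sum(row) for row in density)


def psnr(estimate, reference, peak=None):
    """PSNR in dB between two equally sized 2-D maps given as lists of lists.

    Assumption: `peak` defaults to the maximum value of the reference map.
    """
    same_rows = len(estimate) == len(reference)
    if not same_rows or any(len(e) != len(r) for e, r in zip(estimate, reference)):
        raise ValueError("maps must have the same shape")

    n_pixels = sum(len(row) for row in reference)
    squared_error = sum(
        (e - r) ** 2
        for est_row, ref_row in zip(estimate, reference)
        for e, r in zip(est_row, ref_row)
    )
    mse = squared_error / n_pixels

    if mse == 0:
        return math.inf  # identical maps
    if peak is None:
        peak = max(max(row) for row in reference)
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy maps (hypothetical values, for illustration only).
reference = [[0.0, 0.5], [0.5, 1.0]]
estimate = [[0.1, 0.5], [0.5, 0.9]]

print(count_from_density(reference), count_from_density(estimate))
print(round(psnr(estimate, reference), 2))
```

    In this toy case both maps sum to the same count even though they differ pixel by pixel, which is why a map-quality measure such as PSNR is reported alongside counting error.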