Attentive encoder-decoder networks for crowd counting

Cited by: 0
Authors
Liu, Xuhui [1]
Hu, Yutao [1]
Zhang, Baochang [2]
Zhen, Xiantong [3]
Luo, Xiaoyan [4]
Cao, Xianbin [1]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing, Peoples R China
[3] Incept Inst Artificial Intelligence, Beijing, Peoples R China
[4] Beihang Univ, Sch Astronaut, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Crowd counting; Scale-variation; Attention mechanism; Separable non-local;
DOI
10.1016/j.neucom.2021.11.087
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Crowd counting, which aims to estimate crowd density, has recently made significant progress but remains an unsolved problem due to several challenges. In this paper, we propose an Attentive Encoder-Decoder Network (AEDNet) to overcome the notorious scale-variation problem in crowd counting. Our major contributions can be summarized in three aspects. First, we design an Attentive Feature Refinement (AFR) block in the encoder to adaptively extract multi-scale features. AFR compares the spatial information at different scales through the attention mechanism and then adaptively assigns an importance weight to each point, which highlights their distinctive roles in multi-scale feature extraction. Second, we develop a Separable Non-local Fusion (SNF) block in the decoder that uses the self-attention mechanism to aggregate multi-scale features from different layers; it not only achieves sufficient feature fusion by capturing long-range dependencies, but also vastly reduces the computation cost compared with the original non-local operation. Third, we propose a Regional MSE (R-MSE) loss to tackle the pixel-isolation problem of the regular MSE loss. To demonstrate the effectiveness of the proposed AEDNet, we conduct extensive experiments on four widely used crowd counting datasets, and AEDNet consistently achieves state-of-the-art performance. (c) 2021 Elsevier B.V. All rights reserved.
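The abstract does not give formulas for the AFR weighting or the R-MSE loss, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes (1) that "importance weights to each point" can be modelled as a per-point softmax over scales, and (2) that a regional loss can be modelled as an MSE between sliding-window sums of the density maps. The names fuse_scales, regional_mse, pixel_mse, one_hot and the window size k are hypothetical.

```python
import math


def fuse_scales(features, scores):
    """Weighted sum over scales with a per-point softmax (assumed form).

    features[s][i] / scores[s][i]: feature value and raw attention score
    of scale s at point i (hypothetical layout).
    """
    n_scales, n_points = len(features), len(features[0])
    fused = []
    for i in range(n_points):
        top = max(scores[s][i] for s in range(n_scales))  # for stability
        exps = [math.exp(scores[s][i] - top) for s in range(n_scales)]
        total = sum(exps)
        fused.append(sum(e / total * features[s][i]
                         for s, e in enumerate(exps)))
    return fused


def pixel_mse(pred, gt):
    """Regular MSE: every pixel is compared in isolation."""
    h, w = len(gt), len(gt[0])
    return sum((pred[y][x] - gt[y][x]) ** 2
               for y in range(h) for x in range(w)) / (h * w)


def regional_mse(pred, gt, k=3):
    """MSE between k x k window sums, stride 1 (assumed regional form)."""
    h, w = len(gt), len(gt[0])

    def window_sum(m, y, x):
        return sum(m[y + dy][x + dx] for dy in range(k) for dx in range(k))

    errs = [(window_sum(pred, y, x) - window_sum(gt, y, x)) ** 2
            for y in range(h - k + 1) for x in range(w - k + 1)]
    return sum(errs) / len(errs)


def one_hot(h, w, y, x):
    """Toy density map: a single unit of mass at (y, x)."""
    return [[1.0 if (r, c) == (y, x) else 0.0 for c in range(w)]
            for r in range(h)]


gt = one_hot(6, 6, 2, 2)    # one person at (2, 2)
near = one_hot(6, 6, 2, 3)  # prediction off by one pixel
far = one_hot(6, 6, 2, 5)   # prediction off by three pixels

print(pixel_mse(near, gt), pixel_mse(far, gt))        # identical values
print(regional_mse(near, gt), regional_mse(far, gt))  # near miss is lower
```

In this toy case the per-pixel MSE cannot tell the near miss from the far miss, because each pixel is scored independently, whereas the window-based loss penalizes the far miss more. This is one way to picture the pixel-isolation problem the abstract mentions; the actual R-MSE definition should be taken from the paper itself.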
Pages: 246-257
Page count: 12
Related papers
50 in total
  • [1] Multi-scale Supervised Attentive Encoder-Decoder Network for Crowd Counting
    Zhang, Anran
    Jiang, Xiaolong
    Zhang, Baochang
    Cao, Xianbin
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2020, 16 (01)
  • [2] Crowd Counting and Density Estimation by Trellis Encoder-Decoder Networks
    Jiang, Xiaolong
    Xiao, Zehao
    Zhang, Baochang
    Zhen, Xiantong
    Cao, Xianbin
    Doermann, David
    Shao, Ling
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019 : 6126 - 6135
  • [3] SENetCount: An Optimized Encoder-Decoder Architecture with Squeeze-and-Excitation for Crowd Counting
    Meng, Xiaolong
    [J]. WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [4] Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting
    Thanasutives, Pongpisit
    Fukui, Ken-ichi
    Numao, Masayuki
    Kijsirikul, Boonserm
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 2382 - 2389
  • [5] MobileCount: An efficient encoder-decoder framework for real-time crowd counting
    Wang, Peng
    Gao, Chenyu
    Wang, Yang
    Li, Hui
    Gao, Ye
    [J]. NEUROCOMPUTING, 2020, 407 : 292 - 299
  • [6] Human Conversation Analysis Using Attentive Multimodal Networks with Hierarchical Encoder-Decoder
    Gu, Yue
    Li, Xinyu
    Huang, Kaixiang
    Fu, Shiyu
    Yang, Kangning
    Chen, Shuhong
    Zhou, Moliang
    Marsic, Ivan
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018 : 537 - 545
  • [7] Counting in congested crowd scenes with hierarchical scale-aware encoder-decoder network
    Han, Run
    Qi, Ran
    Lu, Xuequan
    Huang, Lei
    Lyu, Lei
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [8] An Effective Lightweight Crowd Counting Method Based on an Encoder-Decoder Network for Internet of Video Things
    Yi, Jun
    Chen, Fan
    Shen, Zhilong
    Xiang, Yi
    Xiao, Shan
    Zhou, Wei
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (02) : 3082 - 3094
  • [9] Interpretable Transformations with Encoder-Decoder Networks
    Worrall, Daniel E.
    Garbin, Stephan J.
    Turmukhambetov, Daniyar
    Brostow, Gabriel J.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017 : 5737 - 5746
  • [10] Attentive U-recurrent encoder-decoder network for image dehazing
    Yin, Shibai
    Wang, Yibin
    Yang, Yee-Hong
    [J]. NEUROCOMPUTING, 2021, 437 : 143 - 156