Multi-scale Supervised Attentive Encoder-Decoder Network for Crowd Counting

Cited by: 8
Authors
Zhang, Anran [1]
Jiang, Xiaolong [1]
Zhang, Baochang [2]
Cao, Xianbin [1, 3, 4]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, XueYuan Rd 37, Beijing, Peoples R China
[2] Beihang Univ, Sch Automat Sci & Elect Engn, XueYuan Rd 37, Beijing, Peoples R China
[3] Beihang Univ, Key Lab Adv Technol Near Space Informat Syst, Minist Ind & Informat Technol China, XueYuan Rd 37, Beijing, Peoples R China
[4] Beijing Adv Innovat Ctr Big Data Based Precis Med, XueYuan Rd 37, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Self-attention; representation fusion; supervised method; HUMANS; SEGMENTATION; TRACKING; MULTIPLE; IMAGE;
DOI
10.1145/3356019
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Crowd counting is a popular topic with widespread applications. Currently, the biggest challenge in crowd counting is the large variation in object scale. In this article, we address this challenge by proposing a novel Attentive Encoder-Decoder Network (AEDN), which is supervised at multiple feature scales to perform crowd counting via density estimation. This work makes three main contributions. First, we augment the traditional encoder-decoder architecture with our proposed residual attention blocks, which, beyond skip-connected encoded features, further extend the decoded features with attentive features. As a result, AEDN is better at establishing long-range dependencies between the encoder and decoder, thereby promoting more effective fusion of multi-scale features for handling scale variations. Second, we design a new KL-divergence-based distribution loss to supervise the scale-aware structural differences between two density maps, which complements the pixel-isolated MSE loss and better optimizes AEDN to generate high-quality density maps. Third, we adopt a multi-scale supervision scheme in which KL-divergence and MSE losses are deployed at all decoding stages, providing more thorough supervision for the different feature scales. Extensive experimental results on four public datasets (ShanghaiTech Part A, ShanghaiTech Part B, UCF-CC-50, and UCF-QNRF) demonstrate the efficacy of the proposed method, which outperforms most state-of-the-art competitors.
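The combination of a pixel-wise MSE loss with a KL-divergence-based distribution loss, applied at every decoding stage, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration rather than the authors' method: each density map is normalized into a distribution over pixels, the divergence is taken as KL(ground truth || prediction), the ground truth is downsampled by 2x2 sum pooling to match coarser decoder outputs, and the names `mse_loss`, `kl_loss`, `sum_pool2`, `multi_scale_loss`, `eps`, and `lam` are all hypothetical.

```python
import math


def mse_loss(pred, gt):
    """Mean squared error over all pixels of two equally sized 2-D maps."""
    n = len(gt) * len(gt[0])
    return sum((p - g) ** 2
               for pred_row, gt_row in zip(pred, gt)
               for p, g in zip(pred_row, gt_row)) / n


def _normalize(dmap, eps):
    """Flatten a density map and rescale it so its entries sum to 1.

    eps (an assumed smoothing constant) keeps every entry positive so the
    logarithm in the KL term is always defined.
    """
    flat = [v + eps for row in dmap for v in row]
    total = sum(flat)
    return [v / total for v in flat]


def kl_loss(pred, gt, eps=1e-8):
    """KL(gt || pred) with both maps viewed as distributions over pixels."""
    p = _normalize(gt, eps)
    q = _normalize(pred, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def sum_pool2(dmap):
    """Halve the resolution by summing 2x2 blocks (preserves the total count)."""
    h, w = len(dmap) // 2, len(dmap[0]) // 2
    return [[dmap[2 * i][2 * j] + dmap[2 * i][2 * j + 1]
             + dmap[2 * i + 1][2 * j] + dmap[2 * i + 1][2 * j + 1]
             for j in range(w)]
            for i in range(h)]


def multi_scale_loss(preds, gt, lam=1.0):
    """Sum of MSE + lam * KL over decoder outputs ordered from fine to coarse.

    preds[0] has the resolution of gt; each later entry is half the size of
    the previous one. lam is an assumed weighting factor.
    """
    total, target = 0.0, gt
    for k, pred in enumerate(preds):
        if k > 0:
            target = sum_pool2(target)
        total += mse_loss(pred, target) + lam * kl_loss(pred, target)
    return total


if __name__ == "__main__":
    gt = [[0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 2.0],
          [0.0, 0.0, 0.0, 0.0]]
    flat_pred = [[3.0 / 16] * 4 for _ in range(4)]  # right count, no structure
    print(multi_scale_loss([flat_pred, sum_pool2(flat_pred)], gt))
```

In this sketch the uniform prediction carries the correct total count but no spatial structure, which is the kind of error a distribution-level term penalizes in addition to the per-pixel MSE.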
Pages: 20
Related papers
50 in total
  • [41] MHANet: Multi-scale hybrid attention network for crowd counting
    Yu, Ying
    Yu, Jiamao
    Qian, Jin
    Zhu, Zhiliang
    Han, Xing
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (06) : 9445 - 9455
  • [42] Double multi-scale feature fusion network for crowd counting
    Liu, Qian
    Fang, Jiongtao
    Zhong, Yixiong
    Wang, Cunbao
    Qi, Youwei
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024,
  • [43] Multi-Scale Network with Integrated Attention Unit for Crowd Counting
    Hafeezallah, Adel
    Al-Dhamari, Ahlam
    Abu-Bakar, Syed Abd Rahman
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 73 (02) : 3879 - 3903
  • [44] MULTI-STEP CHORD SEQUENCE PREDICTION BASED ON AGGREGATED MULTI-SCALE ENCODER-DECODER NETWORKS
    Carsault, Tristan
    McLeod, Andrew
    Esling, Philippe
    Nika, Jerome
    Nakamura, Eita
    Yoshii, Kazuyoshi
    [J]. 2019 IEEE 29TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2019,
  • [45] Highly efficient encoder-decoder network based on multi-scale edge enhancement and dilated convolution for LDCT image denoising
    Jia, Lina
    He, Xu
    Huang, Aimin
    Jia, Beibei
    Wang, Xinfeng
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (8-9) : 6081 - 6091
  • [46] An End to End Encoder-Decoder Network with Multi-scale Feature Pulling for Detecting Local Changes From Video Scene
    Panda, Manoj Kumar
    Subudhi, Badri Narayan
    Bouwmans, Thierry
    Jakheytiya, Vinit
    Veerakumar, T.
    [J]. 2022 18TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS 2022), 2022,
  • [47] MEDUSA: Multi-Scale Encoder-Decoder Self-Attention Deep Neural Network Architecture for Medical Image Analysis
    Aboutalebi, Hossein
    Pavlova, Maya
    Gunraj, Hayden
    Shafiee, Mohammad Javad
    Sabri, Ali
    Alaref, Amer
    Wong, Alexander
    [J]. FRONTIERS IN MEDICINE, 2022, 8
  • [48] Uni MS-PS: A multi-scale encoder-decoder transformer for universal photometric stereo
    Hardy, Clement
    Queau, Yvain
    Tschumperle, David
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 248
  • [49] A multi-scale and multi-level feature aggregation network for crowd counting
    Zhu, Fushun
    Yan, Hua
    Chen, Xinyue
    Li, Tong
    Zhang, Zhengyu
    [J]. NEUROCOMPUTING, 2021, 423 : 46 - 56
  • [50] Multi-scale dilated convolution of convolutional neural network for crowd counting
    Wang, Yanjie
    Hu, Shiyu
    Wang, Guodong
    Chen, Chenglizhao
    Pan, Zhenkuan
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 : 1057 - 1073