Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

Cited by: 36
Authors
Thanasutives, Pongpisit [1]
Fukui, Ken-ichi [2]
Numao, Masayuki [2]
Kijsirikul, Boonserm [3]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Suita, Osaka, Japan
[2] Osaka Univ, Inst Sci & Ind Res, Suita, Osaka, Japan
[3] Chulalongkorn Univ, Fac Engn, Dept Comp Engn, Bangkok, Thailand
Keywords
DOI
10.1109/ICPR48806.2021.9413286
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose two modified neural networks, based on the dual path multi-scale fusion network (SFANet) and SegNet, for accurate and efficient crowd counting. The first model, M-SFANet, extends SFANet with atrous spatial pyramid pooling (ASPP) and a context-aware module (CAN). The ASPP in the M-SFANet encoder contains parallel atrous convolutional layers with different sampling rates, so it can extract multi-scale features of the target object and incorporate larger context. To further handle scale variation across an input image, we use the CAN module, which adaptively encodes the scales of the contextual information. The combination yields an effective model for counting in both dense and sparse crowd scenes. Following the SFANet decoder structure, M-SFANet's decoder has dual paths, one for density map generation and one for attention map generation. The second model, M-SegNet, is produced by replacing the bilinear upsampling in SFANet with the max unpooling used in SegNet. This change yields a faster model that still delivers competitive counting performance. Designed for high-speed surveillance applications, M-SegNet has no additional multi-scale-aware module, so its complexity does not increase. Both models are encoder-decoder based architectures and are end-to-end trainable. We conduct extensive experiments on five crowd counting datasets and one vehicle counting dataset to show that these modifications yield algorithms that could improve on state-of-the-art crowd counting methods. Code is available at https://github.com/Pongpisit-Thanasutives/Variations-of-SFANet-for-Crowd-Counting.
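The max unpooling that M-SegNet borrows from SegNet can be illustrated with a toy example. The sketch below is not the authors' implementation: it is plain Python on a single 4x4 feature map, and the function names max_pool_2x2 and max_unpool_2x2 are hypothetical, chosen only for this illustration. It shows the general idea that the encoder records where each maximum came from, and the decoder writes each value back to that position instead of interpolating.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also returns the position of each maximum."""
    pooled, indices = [], []
    for i in range(0, len(x), 2):
        pooled_row, index_row = [], []
        for j in range(0, len(x[0]), 2):
            # Collect (value, position) pairs for the 2x2 window.
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            value, position = max(window, key=lambda pair: pair[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, out_h, out_w):
    """Place each pooled value back at its recorded position; all else stays zero."""
    out = [[0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (i, j) in zip(pooled_row, index_row):
            out[i][j] = value
    return out


# Toy 4x4 feature map (made-up values, for illustration only).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]

pooled, indices = max_pool_2x2(feature_map)
restored = max_unpool_2x2(pooled, indices, 4, 4)

print(pooled)    # [[4, 5], [3, 7]]
print(restored)  # [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [3, 0, 0, 0]]
```

Unpooling here only copies values to stored positions, whereas bilinear upsampling computes a weighted average for every output pixel. This is one plausible reading of why the abstract reports the unpooling variant as faster, though the sketch does not measure speed.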
Pages: 2382-2389
Number of pages: 8