A Multi-Scale Feature Fusion Network With Cascaded Supervision for Cross-Scene Crowd Counting

Cited by: 0
Authors
Zhang, Xinfeng [1]
Han, Lina [1]
Shan, Wencong [1]
Wang, Xiaohu [1]
Chen, Shuhan [1]
Zhu, Congcong [1]
Li, Bin [1]
Affiliation
[1] Yangzhou Univ, Coll Informat Engn, Coll Artificial Intelligence, Jiangsu Prov Engn Res Ctr Knowledge Management & I, Yangzhou 225127, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Training; Image resolution; Location awareness; Annotations; Testing; Training data; Background suppression (BS) loss; cascaded supervision; crowd counting; dilated convolution; multi-scale feature fusion; SCALE;
DOI
10.1109/TIM.2023.3246534
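Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes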
0808 ; 0809 ;
Abstract
Counting people in public places has received considerable attention, and researchers have devoted much effort to the task. However, existing crowd counting approaches are mainly trained and tested in similar scenarios, and their performance degrades sharply when the test scenes differ in type from the training scenes. In practice, crowd scenes are highly variable, so the lack of cross-scene capability could seriously limit the application of existing approaches. We argue that improving cross-scene crowd counting capability requires accommodating large changes in the scale of individuals and suppressing the interference of cluttered backgrounds. To this end, we propose a multi-scale feature fusion network (MFFNet) with cascaded supervision. The multi-scale features extracted from the crowd images are upsampled and combined into several feature blocks; convolution and deconvolution operations on these blocks then yield feature matrices of different resolutions. The feature matrices are fused from bottom to top. During feature fusion, a crowd density map is predicted separately for the feature matrix at each resolution. We devise cascaded supervision to simultaneously optimize the density map predictions at the different resolutions during training. Cross-scene crowd counting experiments are conducted on four types of scenes: ShanghaiTech Part_A (SHT A), with high-density crowds and small-scale individuals; ShanghaiTech Part_B (SHT B), with sparse crowds and medium-scale individuals; UCF_CC_50, with extremely dense scenes and tiny-scale individuals; and UCF-QNRF, with extreme variations. MFFNet exhibits the strongest scene adaptability relative to state-of-the-art approaches, with average decreases of 17.1% and 8.4% in mean absolute error (MAE) and mean square error (MSE), respectively. The contributions of the different components of our method are verified in an ablation study using the devised evaluation metrics. Our implementation will be available at https://github.com/learnsharing/MFFNet.
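The idea of supervising density maps at several resolutions at once can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `sum_pool` and `cascaded_loss`, the squared-error loss, the per-scale weights, and the use of sum pooling to build lower-resolution targets are all assumptions chosen for illustration. The background suppression (BS) loss listed in the keywords is not modeled here.

```python
import numpy as np


def sum_pool(density, factor):
    """Downsample a 2-D density map by summing factor x factor blocks.

    Summing (rather than averaging) keeps the total count unchanged.
    Hypothetical helper, not taken from the paper.
    """
    h, w = density.shape
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by factor")
    blocks = density.reshape(h // factor, factor, w // factor, factor)
    return blocks.sum(axis=(1, 3))


def cascaded_loss(predictions, gt_density, weights=None):
    """Weighted sum of mean squared errors, one term per resolution.

    predictions: list of 2-D arrays, each an integer factor smaller than
    gt_density. The loss form and weights are assumptions.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    total = 0.0
    for pred, weight in zip(predictions, weights):
        factor = gt_density.shape[0] // pred.shape[0]
        target = sum_pool(gt_density, factor)
        if target.shape != pred.shape:
            raise ValueError("prediction shape does not match target")
        total += weight * float(np.mean((pred - target) ** 2))
    return total


# Toy ground truth: two people on an 8 x 8 grid.
gt = np.zeros((8, 8))
gt[2, 3] = 1.0
gt[6, 6] = 1.0

# Perfect predictions at 1/4, 1/2, and full resolution give zero loss.
perfect = [sum_pool(gt, f) for f in (4, 2, 1)]
print(cascaded_loss(perfect, gt))  # 0.0
```

Because every resolution contributes its own term, an error in a coarse map is penalized directly rather than only through the final full-resolution output, which is the general motivation for this kind of multi-level supervision.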
Pages: 15
Related papers
50 in total
  • [31] STOCHASTIC MULTI-SCALE AGGREGATION NETWORK FOR CROWD COUNTING
    Wang, Mingjie
    Cai, Hao
    Zhou, Jun
    Gong, Minglun
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020 : 2008 - 2012
  • [32] Crowd Counting based on Multi-level Multi-scale Feature
    Wu, Di
    Fan, Zheyi
    Yi, Shuhan
    APPLIED INTELLIGENCE, 2023, 53 (19) : 21891 - 21901
  • [33] Cross-scene Crowd Counting via Deep Convolutional Neural Networks
    Zhang, Cong
    Li, Hongsheng
    Wang, Xiaogang
    Yang, Xiaokang
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015 : 833 - 841
  • [34] Multi-level feature fusion network for crowd counting
    Wang, Luyang
    Li, Yun
    Peng, Sifan
    Tang, Xiao
    Yin, Baoqun
    IET COMPUTER VISION, 2021, 15 (01) : 60 - 72
  • [35] Investigating Synthetic Data Sets for Crowd Counting in Cross-scene Scenarios
    Delussu, Rita
    Putzu, Lorenzo
    Fumera, Giorgio
    VISAPP: PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VOL 4: VISAPP, 2020 : 365 - 372
  • [36] MULTI-STEP QUANTIZATION OF A MULTI-SCALE NETWORK FOR CROWD COUNTING
    Shim, Kyujin
    Byun, Junyoung
    Kim, Changick
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020 : 683 - 687
  • [37] MSR-FAN: Multi-scale residual feature-aware network for crowd counting
    Zhao, Haoyu
    Min, Weidong
    Wei, Xin
    Wang, Qi
    Fu, Qiyan
    Wei, Zitai
    IET IMAGE PROCESSING, 2021, 15 (14) : 3512 - 3521
  • [38] Crowd Counting Method Based on Multi-Scale Enhanced Network
    Xu Tao
    Duan Yinong
    Du Jiahao
    Liu Caihua
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2021, 43 (06) : 1764 - 1771
  • [39] Dense Crowd Counting Network Based on Multi-scale Perception
    Li, Hengchao
    Liu, Xianglian
    Liu, Peng
    Feng, Bin
    Xinan Jiaotong Daxue Xuebao/Journal of Southwest Jiaotong University, 2024, 59 (05) : 1176 - 1183
  • [40] MSIANet: Multi-scale Interactive Attention Crowd Counting Network
    Zhang, Shihui
    Zhao, Weibo
    Wang, Lei
    Wang, Wei
    Li, Qunpeng
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2023, 45 (06) : 2236 - 2245