A Multi-Scale Feature Fusion Network With Cascaded Supervision for Cross-Scene Crowd Counting

Citations: 0

Authors:

Zhang, Xinfeng [1]

Han, Lina [1]

Shan, Wencong [1]

Wang, Xiaohu [1]

Chen, Shuhan [1]

Zhu, Congcong [1]

Li, Bin [1]

Affiliations:

[1] Yangzhou Univ, Coll Informat Engn, Coll Artificial Intelligence, Jiangsu Prov Engn Res Ctr Knowledge Management & I, Yangzhou 225127, Peoples R China

Source:

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT | 2023 / Vol. 72

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Training; Image resolution; Location awareness; Annotations; Testing; Training data; Background suppression (BS) loss; cascaded supervision; crowd counting; dilated convolution; multi-scale feature fusion; SCALE;

DOI:

10.1109/TIM.2023.3246534

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Counting the number of people in public places has received much attention, and researchers have devoted much effort to the task. However, existing crowd counting approaches are mainly trained and tested in similar scenarios. Their performance degrades sharply when the test scenarios differ in type from the training scenes. In practice, crowd scenes are highly variable, and the lack of cross-scene capability could seriously limit the application of existing approaches. We argue that improving cross-scene crowd counting capability requires accommodating large changes in the scale of individuals and suppressing the interference of cluttered backgrounds. To this end, we propose a multi-scale feature fusion network (MFFNet) with cascaded supervision. The multi-scale features extracted from the crowd images are upsampled and then combined into several feature blocks, followed by convolution and deconvolution operations on the feature blocks to derive feature matrices of different resolutions. The feature matrices are fused from bottom to top. During feature fusion, the crowd density maps corresponding to the feature matrices of different resolutions are predicted separately. We devise cascaded supervision to optimize the density map predictions at the different resolutions synchronously during training. The cross-scene crowd counting experiments are conducted on four types of scenes: ShanghaiTech Part_A (SHT A) with high-density crowd scenes and small-scale individuals, ShanghaiTech Part_B (SHT B) with sparse crowd distribution and medium-scale individuals, the UCF_CC_50 dataset with extremely dense scenes and tiny-scale individuals, and the UCF-QNRF dataset with extreme variations. MFFNet exhibits the strongest scene adaptability relative to the state-of-the-art approaches, with an average decrease of 17.1% and 8.4% in mean absolute error (MAE) and mean square error (MSE), respectively. The contributions of the different components of our method are verified in an ablation study using the devised evaluation metrics. Our implementation will be available at https://github.com/learnsharing/MFFNet.
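The two error metrics named in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' implementation: the function names and sample counts are hypothetical, and it assumes the common crowd counting convention in which the reported "MSE" is the root of the mean squared count error. The record itself does not state which definition the paper uses.

```python
import math

def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground-truth crowd counts."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)

def root_mse(pred_counts, gt_counts):
    """Root of the mean squared count error (assumed convention for 'MSE')."""
    squared = sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts))
    return math.sqrt(squared / len(gt_counts))

def relative_decrease(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Hypothetical per-image counts, for illustration only.
predicted = [10, 20, 30]
ground_truth = [12, 18, 33]
print(mae(predicted, ground_truth))
print(root_mse(predicted, ground_truth))
```

A figure such as "an average decrease of 17.1% in MAE" would then correspond to averaging `relative_decrease` over the compared cross-scene settings, although the exact averaging protocol is not given in this record.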


Pages: 15
