Dual-branch crowd counting algorithm based on self-attention mechanism

Authors
Yang T.-L. [1]
Li L.-X. [1]
Zhang W. [1]
Affiliation
[1] School of Microelectronics, Tianjin University, Tianjin
Keywords
crowd counting; deep learning; dual-branch; self-attention mechanism; weakly supervised learning
DOI
10.3785/j.issn.1008-973X.2023.10.005
Abstract
A dual-branch crowd counting algorithm based on the self-attention mechanism was proposed to address two problems in crowd counting: large variation in head scale and interference from complex backgrounds. The algorithm combined two network frameworks, a convolutional neural network (CNN) and a Transformer. A multi-scale CNN branch and a Transformer branch built on a convolution-enhanced self-attention module were used to obtain local and global crowd information, respectively. A dual-branch attention fusion module was designed to enable continuous-scale crowd feature extraction. A Transformer network with a hybrid attention module was used to extract deep features, which helped to distinguish complex backgrounds and to focus on crowd regions. Experiments were conducted on ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF, JHU-Crowd++ and other datasets under both position-level full supervision and count-level weak supervision. Results showed that the proposed algorithm outperformed recent methods. On the four datasets named above, the fully supervised algorithm achieved MAE values of 55.3, 6.7, 82.9, and 55.7 and MSE values of 93.1, 9.8, 145.1, and 248.0, respectively, indicating accurate counting in high-density and heavily occluded scenes. Good counting precision was achieved with few parameters; in particular, in the comparison of weakly supervised algorithms, the algorithm attained 87.9% of the counting accuracy obtained under full supervision. © 2023 Zhejiang University. All rights reserved.
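The abstract reports MAE and MSE without defining them. As a minimal sketch, the two metrics can be computed from per-image predicted and ground-truth counts as shown below. It assumes the common crowd-counting convention in which "MSE" denotes the square root of the mean squared count error; the abstract does not confirm which definition the authors used. The function names and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def count_mae(predicted, ground_truth):
    """Mean absolute error between predicted and true per-image counts."""
    errors = [abs(p - g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors)


def count_rmse(predicted, ground_truth):
    """Root of the mean squared count error.

    Assumption: this is the quantity crowd-counting papers usually
    report as "MSE"; the abstract itself does not define the term.
    """
    squared = [(p - g) ** 2 for p, g in zip(predicted, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical counts for two test images (illustrative values only).
predicted = [105.0, 48.0]
ground_truth = [100.0, 50.0]

print(count_mae(predicted, ground_truth))
print(count_rmse(predicted, ground_truth))
```

Because the squared term weights large errors more heavily, the second metric is never smaller than the first on the same data, which is consistent with every MSE figure in the abstract exceeding its paired MAE figure.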
Pages: 1955-1965
Page count: 10