Combining Semantics With Multi-level Feature Fusion for Pedestrian Detection

Authors
Chu J. [1]
Shu W. [1]
Zhou Z.-B. [1]
Miao J. [1]
Leng L. [1]
Affiliation
[1] Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition (Nanchang Hangkong University), Nanchang
Funding
National Natural Science Foundation of China
Keywords
Feature fusion; Occlusion; Pedestrian detection; Secondary detection; Semantic segmentation
DOI
10.16383/j.aas.c200032
Abstract
Occlusion and background objects that resemble pedestrians typically degrade the accuracy of pedestrian detection. To address these problems, this paper proposes a pedestrian detection algorithm that combines semantics with multi-level feature fusion (CSMFF). First, features from multiple convolutional layers are fused, and semantic segmentation is added to the fusion layer. The resulting semantic features are connected to the corresponding convolutional layers as prior information on the pedestrian target location, which improves discrimination between pedestrians and background. Then, building on the preliminary regression, a pedestrian secondary detection module (PSDM) is constructed to further eliminate false positives. Experimental results show that the proposed algorithm achieves miss rates (MR) of 7.06% and 11.2% on the Caltech and CityPersons datasets, respectively. The algorithm is robust to occluded pedestrians and can be easily embedded into other detection frameworks. Copyright ©2022 Acta Automatica Sinica. All rights reserved.
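To make the reported metric concrete, the following is a minimal sketch of a miss-rate computation at a single operating point. It is not the authors' evaluation code: the function names `iou` and `miss_rate`, the greedy matching scheme, and the 0.5 IoU threshold are all assumptions made for illustration. Pedestrian benchmarks such as Caltech customarily report a log-average miss rate over a range of false positives per image, which this sketch does not implement.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def miss_rate(gt: List[Box], dets: List[Tuple[Box, float]],
              iou_thr: float = 0.5) -> float:
    """Fraction of ground-truth pedestrians left unmatched by any detection.

    Detections are (box, score) pairs, visited in order of decreasing score;
    each ground-truth box can be matched at most once (greedy matching,
    an assumption of this sketch).
    """
    if not gt:
        return 0.0
    matched = set()
    for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, g in enumerate(gt):
            if j in matched:
                continue
            overlap = iou(box, g)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
    return (len(gt) - len(matched)) / len(gt)


if __name__ == "__main__":
    gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
    detections = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
    print(miss_rate(gt_boxes, detections))
```

In this toy case one of two ground-truth pedestrians is detected, so the miss rate is one half; the second detection overlaps no ground truth and would count as a false positive, the kind of error the PSDM is described as reducing.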
Pages: 282-291
Number of pages: 9