Cascaded information enhancement and cross-modal attention feature fusion for multispectral pedestrian detection

Cited by: 3
Authors
Yang, Yang [1 ]
Xu, Kaixiong [1 ]
Wang, Kaizheng [2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
[2] Kunming Univ Sci & Technol, Fac Elect Engn, Kunming, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multispectral pedestrian detection; attention mechanism; feature fusion; convolutional neural network; background noise; IMAGE FUSION; NETWORK;
DOI
10.3389/fphy.2023.1121311
Chinese Library Classification
O4 [Physics];
Discipline code
0702;
Abstract
Multispectral pedestrian detection aims to detect and locate pedestrians in paired Color and Thermal images and is widely used in autonomous driving, video surveillance, and other applications. Most existing multispectral pedestrian detection algorithms achieve only limited success because they fail to account for the confusion between pedestrian information and background noise in Color and Thermal images. Here we propose a multispectral pedestrian detection algorithm built around two components: a cascaded information enhancement module and a cross-modal attention feature fusion module. On the one hand, the cascaded information enhancement module applies channel and spatial attention to the features fused by the cascaded feature fusion block, then multiplies each single-modal feature element-wise by the resulting attention weights, enhancing the pedestrian features within each modality and suppressing interference from the background. On the other hand, the cross-modal attention feature fusion module mines the features of the Color and Thermal modalities so that each complements the other; the cross-modal complemented features are added element-wise to construct global features, which are then attention-weighted to achieve an effective fusion of the two modalities. Finally, the fused features are fed into the detection head to detect and locate pedestrians. Extensive experiments on two improved annotation sets (sanitized annotations and paired annotations) of the public KAIST dataset show that our method achieves a lower pedestrian miss rate and more accurate pedestrian detection boxes than the compared methods. Ablation experiments further confirm the effectiveness of each module designed in this paper.
Pages: 11
Related papers
(50 records)
  • [1] Multiscale Cross-Modal Homogeneity Enhancement and Confidence-Aware Fusion for Multispectral Pedestrian Detection
    Li, Ruimin
    Xiang, Jiajun
    Sun, Feixiang
    Yuan, Ye
    Yuan, Longwu
    Gou, Shuiping
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 852 - 863
  • [2] Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection
    Bao, Wei
    Hu, Jingjing
    Huang, Meiyu
    Xiang, Xueshuang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 499 - 510
  • [3] Locality guided cross-modal feature aggregation and pixel-level fusion for multispectral pedestrian detection
    Cao, Yanpeng
    Luo, Xing
    Yang, Jiangxin
    Cao, Yanlong
    Yang, Michael Ying
    INFORMATION FUSION, 2022, 88 : 1 - 11
  • [4] Vulnerability detection through cross-modal feature enhancement and fusion
    Tao, Wenxin
    Su, Xiaohong
    Wan, Jiayuan
    Wei, Hongwei
    Zheng, Weining
    COMPUTERS & SECURITY, 2023, 132
  • [5] Lightweight Cross-Modal Multispectral Pedestrian Detection Based on Spatial Reweighted Attention Mechanism
    Deng, Lujuan
    Fu, Ruochong
    Li, Zuhe
    Liu, Boyi
    Xue, Mengze
    Cui, Yuhao
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 78 (03): 4071 - 4089
  • [6] Learning a Dynamic Cross-Modal Network for Multispectral Pedestrian Detection
    Xie, Jin
    Anwer, Rao Muhammad
    Cholakkal, Hisham
    Nie, Jing
    Cao, Jiale
    Laaksonen, Jorma
    Khan, Fahad Shahbaz
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4043 - 4052
  • [7] Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection
    Zhang, Lu
    Zhu, Xiangyu
    Chen, Xiangyu
    Yang, Xu
    Lei, Zhen
    Liu, Zhiyong
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5126 - 5136
  • [8] Uncertainty-Guided Cross-Modal Learning for Robust Multispectral Pedestrian Detection
    Kim, Jung Uk
    Park, Sungjune
    Ro, Yong Man
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 1510 - 1523
  • [9] Cross-modal information fusion for voice spoofing detection
    Xue, Junxiao
    Zhou, Hao
    Song, Huawei
    Wu, Bin
    Shi, Lei
    SPEECH COMMUNICATION, 2023, 147 : 41 - 50
  • [10] GCANet: A Cross-Modal Pedestrian Detection Method Based on Gaussian Cross Attention Network
    Peng, Peiran
    Mu, Feng
    Yan, Peilin
    Song, Liqiang
    Li, Hui
    Chen, Yu
    Li, Jianan
    Xu, Tingfa
    INTELLIGENT COMPUTING, VOL 2, 2022, 507 : 520 - 530