Efficient cross-modality feature interaction for multispectral armored vehicle detection

Times Cited: 0
Authors
Zhang, Jie [1]
Chang, Tian-qing [1]
Zhao, Li-yang [2]
Ma, Jin-dun [3]
Han, Bin [1]
Zhang, Lei [1]
Affiliations
[1] Army Acad Armored Forces, Beijing 100072, Peoples R China
[2] PLA Naval Submarine Acad, Qingdao 266199, Peoples R China
[3] PLA, Unit 63966, Beijing 100072, Peoples R China
Keywords
Cross-modality; Armored vehicle detection; Feature interaction; Multispectral; Recognition
DOI
10.1016/j.asoc.2024.111971
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Detecting armored vehicles from a UAV platform is challenging due to the complexity of the ground environment. This paper presents a dual-stream multispectral armored vehicle detection method to tackle this problem. First, because datasets containing multispectral armored vehicle images are scarce, a multispectral armored vehicle detection dataset is constructed for this study. The dataset consists of 5853 pairs of RGB and infrared images, featuring a total of 15,878 instances of armored vehicles. Then, a cross-modal feature interaction module is designed to enable efficient feature interaction between multispectral images. This module uses a cross-modal channel-wise feature-difference method to model the channel differences between the features of the two modalities and obtain a cross-modal channel difference matrix. This matrix is then employed to extract the features unique to each modality, allowing efficient cross-modal interaction in which the two streams complement each other's unique features. Experimental results demonstrate that the proposed model achieves excellent detection performance and copes with the various challenges posed by complex ground environments.
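Note: the abstract describes the interaction mechanism only at a high level. The sketch below is one plausible PyTorch reading of the cross-modal channel-wise feature-difference idea, assuming global average pooling for the channel descriptors, a sigmoid gate standing in for the channel difference matrix, and a residual exchange between the two streams. The class name CMFI and every layer choice are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class CMFI(nn.Module):
    """Hypothetical cross-modal feature interaction block (name assumed).

    Pools each modality's feature map to a channel descriptor, models the
    per-channel difference between the two descriptors, and uses the
    resulting gate to exchange modality-unique channels between streams.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # (B, C, H, W) -> (B, C, 1, 1)
        self.mlp = nn.Sequential(            # bottleneck on the difference vector
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, f_rgb: torch.Tensor, f_ir: torch.Tensor):
        # Channel-wise difference between the RGB and infrared descriptors;
        # its sigmoid plays the role of the "channel difference matrix".
        gate = torch.sigmoid(self.mlp(self.pool(f_rgb) - self.pool(f_ir)))
        # Channels where the difference favors RGB (gate -> 1) are handed to
        # the infrared stream, and vice versa, so each stream is complemented
        # by the other modality's unique features.
        out_rgb = f_rgb + (1.0 - gate) * f_ir
        out_ir = f_ir + gate * f_rgb
        return out_rgb, out_ir

# Example on dummy 64-channel feature maps:
block = CMFI(channels=64)
rgb, ir = torch.randn(2, 64, 40, 40), torch.randn(2, 64, 40, 40)
out_rgb, out_ir = block(rgb, ir)
assert out_rgb.shape == out_ir.shape == (2, 64, 40, 40)

Under this reading, the gate acts as soft channel routing: the larger the RGB-infrared difference on a channel, the more of that channel's information is passed to the stream that lacks it, which matches the abstract's claim that the two streams complement each other's unique features.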
Pages: 13
Related Papers
50 records in total
  • [41] Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences
    Jing, Longlong
    Zhang, Ling
    Tian, Yingli
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021: 1581-1591
  • [42] DEEP ACTIVE LEARNING FROM MULTISPECTRAL DATA THROUGH CROSS-MODALITY PREDICTION INCONSISTENCY
    Zhang, Heng
    Fromont, Elisa
    Lefevre, Sebastien
    Avignon, Bruno
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021: 449-453
  • [43] Multimodal Pedestrian Detection Based on Cross-Modality Reference Search
    Lee, Wei-Yu
    Jovanov, Ljubomir
    Philips, Wilfried
    IEEE SENSORS JOURNAL, 2024, 24(10): 17291-17306
  • [44] Global and Part Feature Fusion for Cross-Modality Person Re-Identification
    Wang, Xianju
    Cordova, Ronald S.
    IEEE ACCESS, 2022, 10: 122038-122046
  • [45] Efficient Web Video Classification via Cross-modality Knowledge Transferring
    Xia, Shijun
    Li, Tianyu
    Ge, Shengbin
    Dong, Zhengya
    8TH INTERNATIONAL CONFERENCE ON INTERNET MULTIMEDIA COMPUTING AND SERVICE (ICIMCS2016), 2016: 211-216
  • [46] Cross-Modality Feature Learning Through Generic Hierarchical Hyperlingual-Words
    Shao, Ming
    Fu, Yun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28(02): 451-463
  • [47] Cross-Modality Double Bidirectional Interaction and Fusion Network for RGB-T Salient Object Detection
    Xie, Zhengxuan
    Shao, Feng
    Chen, Gang
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33(08): 4149-4163
  • [48] Unbiased feature enhancement framework for cross-modality person re-identification
    Yuan, Bowen
    Chen, Bairu
    Tan, Zhiyi
    Shao, Xi
    Bao, Bing-Kun
    MULTIMEDIA SYSTEMS, 2022, 28(03): 749-759
  • [50] HPILN: a feature learning framework for cross-modality person re-identification
    Zhao, Yun-Bo
    Lin, Jian-Wu
    Xuan, Qi
    Xi, Xugang
    IET IMAGE PROCESSING, 2019, 13(14): 2897-2904