Efficient multi-level cross-modal fusion and detection network for infrared and visible image

Times Cited: 0
Authors
Gao, Hongwei [1 ,2 ]
Wang, Yutong [1 ]
Sun, Jian [1 ]
Jiang, Yueqiu [1 ]
Gai, Yonggang [1 ]
Yu, Jiahui [3 ,4 ]
Affiliations
[1] Shenyang Ligong Univ, Sch Automat & Elect Engn, Shenyang 110159, Peoples R China
[2] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang 110016, Peoples R China
[3] Zhejiang Univ, Dept Biomed Engn, Hangzhou 310027, Peoples R China
[4] Binjiang Inst Zhejiang Univ, Innovat Ctr Smart Med Technol & Devices, Hangzhou 310053, Peoples R China
Keywords
Uncrewed aerial vehicles; Aerial image; Image fusion; Object detection;
DOI
10.1016/j.aej.2024.07.107
CLC Number
T [Industrial Technology];
Discipline Code
08;
Abstract
With the rapid development of uncrewed aerial vehicle (UAV) technology, aerial image detection has found significant applications across various domains. However, existing algorithms overlook the impact of illumination on target detection, resulting in unsatisfactory performance under low-light conditions. We propose EfficientFuseDet, a visible and infrared image fusion detection network, to overcome this issue. First, an effective multilevel cross-modal fusion network called EfficientFuse is presented to better combine complementary information from both modalities. EfficientFuse captures local dependencies in shallow layers and global contextual information in deep layers, seamlessly combining complementary local and global features throughout the network. The resulting fused images exhibit clear target contours and abundant texture information. Second, we propose a detection network called AFI-YOLO, which employs an inverted residual vision transformer (IRViT) backbone to effectively address background interference in fused images. We design an efficient feature pyramid network (EFPN) that integrates multiscale information, enhancing multiscale detection capability for aerial images. A reparameterization decoupling head (RepHead) is proposed to further improve target classification and localization precision. Finally, experiments on the DroneVehicle dataset indicate that detection accuracy on fused images reaches 47.2%, higher than the 45% observed with visible-light images. Compared with state-of-the-art detection algorithms, EfficientFuseDet is slightly slower, but it demonstrates superior detection capability and effectively enhances detection accuracy for aerial images under low-light conditions.
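To make the two-stage pipeline described above concrete, the following is a minimal PyTorch sketch of the data flow only: a fusion module (standing in for EfficientFuse) maps a visible/infrared pair to a fused image, and a detector (standing in for AFI-YOLO with its IRViT backbone, EFPN neck, and RepHead) consumes the fused image. All module internals, class names, channel sizes, and the 3-channel infrared input are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of the abstract's pipeline: fuse visible + infrared,
# then detect on the fused image. Placeholder layers only.
import torch
import torch.nn as nn


class EfficientFuseStub(nn.Module):
    """Placeholder fusion network: concatenates both modalities and maps back to a 3-channel fused image."""
    def __init__(self):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, visible, infrared):
        return self.mix(torch.cat([visible, infrared], dim=1))


class AFIYOLOStub(nn.Module):
    """Placeholder detector: backbone -> neck -> head, shapes only."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.backbone = nn.Sequential(  # stands in for the IRViT backbone
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.neck = nn.Conv2d(128, 128, 3, padding=1)   # stands in for EFPN
        self.head = nn.Conv2d(128, num_classes + 5, 1)  # stands in for RepHead (cls + box + obj)

    def forward(self, fused):
        return self.head(self.neck(self.backbone(fused)))


if __name__ == "__main__":
    vis = torch.rand(1, 3, 256, 256)  # visible image
    ir = torch.rand(1, 3, 256, 256)   # infrared image (3-channel assumed)
    fused = EfficientFuseStub()(vis, ir)
    preds = AFIYOLOStub()(fused)
    print(fused.shape, preds.shape)   # (1, 3, 256, 256), (1, 10, 64, 64)
```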
Pages: 306-318
Page count: 13