Efficient multi-level cross-modal fusion and detection network for infrared and visible image

Cited by: 0
Authors
Gao, Hongwei [1 ,2 ]
Wang, Yutong [1 ]
Sun, Jian [1 ]
Jiang, Yueqiu [1 ]
Gai, Yonggang [1 ]
Yu, Jiahui [3 ,4 ]
Affiliations
[1] Shenyang Ligong Univ, Sch Automat & Elect Engn, Shenyang 110159, Peoples R China
[2] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang 110016, Peoples R China
[3] Zhejiang Univ, Dept Biomed Engn, Hangzhou 310027, Peoples R China
[4] Binjiang Inst Zhejiang Univ, Innovat Ctr Smart Med Technol & Devices, Hangzhou 310053, Peoples R China
Keywords
Uncrewed aerial vehicles; Aerial image; Image fusion; Object detection
DOI
10.1016/j.aej.2024.07.107
Chinese Library Classification (CLC)
T [Industrial Technology];
Subject classification code
08;
Abstract
With the rapid development of uncrewed aerial vehicle (UAV) technology, aerial image detection has found significant applications across various domains. However, existing algorithms overlook the impact of illumination on target detection, resulting in unsatisfactory performance under low-light conditions. We propose EfficientFuseDet, a visible and infrared image fusion detection network, to overcome this issue. First, an effective multilevel cross-modal fusion network called EfficientFuse is presented to better combine complementary information from both modalities. EfficientFuse captures local dependencies in shallow layers and global contextual information in deep layers, seamlessly combining complementary local and global features throughout the network. The generated fused images exhibit clear target contours and abundant texture information. Second, we propose a detection network called AFI-YOLO, which employs an inverted residual vision transformer (IRViT) backbone to effectively address background interference in fused images. We design an efficient feature pyramid network (EFPN) that integrates multiscale information, enhancing multiscale detection capability on aerial images. A reparameterization decoupling head (RepHead) is proposed to further improve target classification and localization precision. Finally, experiments on the DroneVehicle dataset indicate that detection accuracy on fused images reaches 47.2%, higher than the 45% observed with visible-light images. Compared to state-of-the-art detection algorithms, EfficientFuseDet exhibits a slight decrease in speed; however, it demonstrates superior detection capability and effectively enhances the detection accuracy of aerial images under low-light conditions.
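The abstract describes the fusion design only at a high level: local dependencies are modeled in shallow layers and global context in deep layers. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; the module names (LocalFuse, GlobalFuse), channel sizes, and the specific conv-plus-self-attention split are assumptions for illustration and are not the authors' EfficientFuse implementation.

```python
# Hypothetical sketch: shallow local fusion + deep global attention for
# infrared/visible feature maps. Not the paper's actual architecture.
import torch
import torch.nn as nn


class LocalFuse(nn.Module):
    """Shallow stage: merge IR and visible features, refine with a depthwise conv."""

    def __init__(self, channels: int):
        super().__init__()
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.BatchNorm2d(channels),
            nn.GELU(),
        )

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        x = self.merge(torch.cat([ir, vis], dim=1))  # channel-wise cross-modal merge
        return x + self.local(x)                     # residual local refinement


class GlobalFuse(nn.Module):
    """Deep stage: capture global context with multi-head self-attention over tokens."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C) token sequence
        q = self.norm(tokens)
        attn_out, _ = self.attn(q, q, q)             # global token mixing
        tokens = tokens + attn_out                   # residual connection
        return tokens.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    ir = torch.randn(1, 32, 64, 64)    # infrared feature map
    vis = torch.randn(1, 32, 64, 64)   # visible feature map
    fused = GlobalFuse(32)(LocalFuse(32)(ir, vis))
    print(fused.shape)                 # torch.Size([1, 32, 64, 64])
```

The design choice illustrated here, convolution for local structure followed by attention for long-range context, mirrors the "local in shallow layers, global in deep layers" description in the abstract, but any resemblance to the published network beyond that is assumed.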
Pages: 306-318
Number of pages: 13