SiamMMF: multi-modal multi-level fusion object tracking based on Siamese networks

Cited by: 4
Authors
Yang, Zhen [1,2]
Huang, Peng [1]
He, Dunyun [1]
Cai, Zhongwang [1]
Yin, Zhijian [1]
Affiliations
[1] Jiangxi Sci & Technol Normal Univ, Sch Commun & Elect, Nanchang 330013, Peoples R China
[2] Guangdong ATV Acad Performing Arts, Dongguan 523710, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-modal object tracking; Pixel-level fusion; Feature-level fusion; Dual-stream network; Visible image fusion; Infrared images
DOI
10.1007/s00138-022-01354-2
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Feature-level and pixel-level fusion are common techniques for integrating different modalities in RGB-T object tracking, and an effective cross-modal fusion method can significantly improve tracking performance. In this paper, a multi-modal, multi-level fusion model based on a Siamese network (SiamMMF) is proposed. SiamMMF consists of two main subnetworks: a pixel-level fusion network and a feature-level fusion network. The pixel-level fusion network fuses each infrared image with its corresponding visible-light image by taking the per-pixel maximum of the two, and the fused image replaces the visible-light image. The infrared image and the fused visible image are then each fed into a dual-stream backbone. After deep features are extracted, the visible and infrared features from the two branches are cross-correlated to obtain a fused result that is sent to the tracking head. Extensive experiments show that the best tracking performance is obtained when the weighting ratio between the visible and infrared modalities is set to 6:4. Nineteen pairs of RGB-T video sequences with different attributes were used to test the model against 15 other trackers; on both evaluation criteria, success rate and precision rate, our network achieved the best results.
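A minimal sketch of the two fusion steps described in the abstract, assuming a PyTorch implementation. Only the per-pixel maximum, the 6:4 visible/infrared weighting, and the cross-correlation step are stated in the abstract; the function names and the weighted-sum form of the feature combination are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def fuse_pixels(visible: torch.Tensor, infrared: torch.Tensor) -> torch.Tensor:
    # Pixel-level fusion: per-pixel maximum of the two modalities;
    # the fused image replaces the visible-light input (per the abstract).
    return torch.maximum(visible, infrared)

def fuse_features(vis_feat: torch.Tensor, ir_feat: torch.Tensor,
                  w_vis: float = 0.6, w_ir: float = 0.4) -> torch.Tensor:
    # Feature-level fusion with the 6:4 visible/infrared ratio the paper
    # reports as best; the weighted-sum form here is an assumption.
    return w_vis * vis_feat + w_ir * ir_feat

def xcorr(search_feat: torch.Tensor, template_feat: torch.Tensor) -> torch.Tensor:
    # Standard Siamese cross-correlation: the template features act as a
    # convolution kernel slid over the search-region features.
    return F.conv2d(search_feat, template_feat)

# Toy shapes: batch 1, 256 channels, 6x6 template, 22x22 search region.
z = torch.randn(1, 256, 6, 6)    # fused template features
x = torch.randn(1, 256, 22, 22)  # fused search-region features
response = xcorr(x, z)           # -> (1, 1, 17, 17) response map
```

The response map would then be consumed by the tracking head; the backbone that produces the features is omitted here.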
Pages: 16
Related Papers
50 records in total (first 10 shown)
  • [1] Yang, Zhen; Huang, Peng; He, Dunyun; Cai, Zhongwang; Yin, Zhijian. SiamMMF: multi-modal multi-level fusion object tracking based on Siamese networks. Machine Vision and Applications, 2023, 34.
  • [2] Qi, Ke; Chen, Liji; Zhou, Yicong; Qi, Yutao. Multi-Modal Fusion Object Tracking Based on Fully Convolutional Siamese Network. 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning (CACML 2023), 2023: 440-444.
  • [3] Li, Zhiheng; Cui, Yubo; Lin, Yu; Fang, Zheng. MMF-Track: Multi-Modal Multi-Level Fusion for 3D Single Object Tracking. IEEE Transactions on Intelligent Vehicles, 2024, 9(1): 1817-1829.
  • [4] Cheng, T.; Sun, L.; Hou, D.; Shi, Q.; Zhang, J.; Chen, J.; Huang, H. Multi-level and Multi-modal Target Detection Based on Feature Fusion. Qiche Gongcheng/Automotive Engineering, 2021, 43(11): 1602-1610.
  • [5] Zhang, Jingping; Wang, Qiang; Han, Yahong. Multi-Modal Fusion with Multi-level Attention for Visual Dialog. Information Processing & Management, 2020, 57(4).
  • [6] Cao, Yi; Ji, Hongbing; Zhang, Wenbo; Shirani, Shahram. Extremely Tiny Siamese Networks with Multi-level Fusions for Visual Object Tracking. 2019 22nd International Conference on Information Fusion (FUSION 2019), 2019.
  • [7] Karle, Phillip; Fent, Felix; Huch, Sebastian; Sauerbeck, Florian; Lienkamp, Markus. Multi-Modal Sensor Fusion and Object Tracking for Autonomous Racing. IEEE Transactions on Intelligent Vehicles, 2023, 8(7): 3871-3883.
  • [8] Cai, Guoyong; Lyu, Guangrui; Lin, Yuming; Wen, Yimin. Multi-level Deep Correlative Networks for Multi-modal Sentiment Analysis. Chinese Journal of Electronics, 2020, 29(6): 1025-1038.
  • [9] Tan, Wei; Thiton, William; Xiang, Pei; Zhou, Huixin. Multi-modal brain image fusion based on multi-level edge-preserving filtering. Biomedical Signal Processing and Control, 2021, 64.
  • [10] Kong, Zhe; Wang, Xin; Gao, Neng; Zhang, Yifei; Liu, Yuhan; Tu, Chenyang. Multi-level Fusion of Multi-modal Semantic Embeddings for Zero Shot Learning. Proceedings of the 2022 International Conference on Multimodal Interaction (ICMI 2022), 2022: 310-318.