Attention-based multimodal image matching

Cited by: 0
Authors
Moreshet, Aviad [1]
Keller, Yosi [1]
Affiliations
[1] Bar Ilan Univ, Fac Engn, Ramat Gan, Israel
Keywords
Multisensor image matching; Deep learning; Attention-based; Registration
DOI
10.1016/j.cviu.2024.103949
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a method for matching multimodal image patches using a multiscale Transformer-Encoder that focuses on the feature maps of a Siamese CNN. It effectively combines multiscale image embeddings while improving task-specific and appearance-invariant image cues. We also introduce a residual attention architecture that allows for end-to-end training by using a residual connection. To the best of our knowledge, this is the first successful use of the Transformer-Encoder architecture in multimodal image matching. We motivate the use of task-specific multimodal descriptors by achieving new state-of-the-art accuracy on both multimodal and unimodal benchmarks, and demonstrate the quantitative and qualitative advantages of our approach over state-of-the-art unimodal image matching methods in multimodal matching. Our code is publicly available.
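The residual attention idea in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names, the single attention head with identity projections, the mean pooling, and the toy two-scale token lists are all assumptions made for illustration. It only shows the general pattern of attending over feature-map cells gathered from several scales and adding the attended result to a plain pooled embedding through a residual connection, with the same function applied to both patches (the Siamese part).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity query/key/value projections, to keep the sketch short.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]

def residual_attention_descriptor(coarse_tokens, fine_tokens):
    """Hypothetical descriptor: pooled CNN tokens plus pooled attended tokens.

    coarse_tokens / fine_tokens stand in for flattened feature-map cells
    from two scales of a CNN branch (assumed, not taken from the paper).
    """
    tokens = coarse_tokens + fine_tokens          # one sequence across scales
    cnn_embedding = mean_pool(tokens)             # path that bypasses the encoder
    attended = mean_pool(self_attention(tokens))  # encoder path
    return [c + a for c, a in zip(cnn_embedding, attended)]  # residual sum

def l2_distance(u, v):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Siamese usage: the same function (shared weights) encodes both patches.
patch_a = residual_attention_descriptor([[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]])
patch_b = residual_attention_descriptor([[0.9, 0.1]], [[0.1, 0.9], [1.0, 1.0]])
distance = l2_distance(patch_a, patch_b)
```

In this sketch a small distance between the two descriptors would be read as a likely match. A trained model would replace the identity projections with learned weights and the toy token lists with real CNN feature maps.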
Pages: 10