Attention-based multimodal image matching

Cited by: 0

Authors:

Moreshet, Aviad [1]

Keller, Yosi [1]

Affiliations:

[1] Bar Ilan Univ, Fac Engn, Ramat Gan, Israel

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2024 / Vol. 241

Keywords:

Multisensor image matching; Deep learning; Attention-based; REGISTRATION;

DOI:

10.1016/j.cviu.2024.103949

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a method for matching multimodal image patches using a multiscale Transformer-Encoder that focuses on the feature maps of a Siamese CNN. It effectively combines multiscale image embeddings while improving task-specific and appearance-invariant image cues. We also introduce a residual attention architecture that allows for end-to-end training by using a residual connection. To the best of our knowledge, this is the first successful use of the Transformer-Encoder architecture in multimodal image matching. We motivate the use of task-specific multimodal descriptors by achieving new state-of-the-art accuracy on both multimodal and unimodal benchmarks, and demonstrate the quantitative and qualitative advantages of our approach over state-of-the-art unimodal image matching methods in multimodal matching. Our code is publicly shared.

Pages: 10

50 items in total

[1] A Hierarchical Multimodal Attention-based Neural Network for Image Captioning
Cheng, Yong
Huang, Fei
Zhou, Lian
Jin, Cheng
Zhang, Yuejie
Zhang, Tao
[J]. SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 889 - 892
[2] Multimodal Brain Image Segmentation and Analysis with Neuromorphic Attention-Based Learning
Han, Woo-Sup
Han, Il Song
[J]. BRAINLESION: GLIOMA, MULTIPLE SCLEROSIS, STROKE AND TRAUMATIC BRAIN INJURIES (BRAINLES 2019), PT II, 2020, 11993 : 14 - 26
[3] Attention-Based Multimodal Image Feature Fusion Module for Transmission Line Detection
Choi, Hyeyeon
Yun, Jong Pil
Kim, Bum Jun
Jang, Hyeonah
Kim, Sang Woo
[J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2022, 18 (11) : 7686 - 7695
[4] Dynamic attention-based detector and descriptor with effective and derivable loss for image matching
Yang, Hua
Jiang, Yuyang
Huang, Kaiji
Yin, Zhouping
[J]. JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (02)
[5] Multimodal attention-based transformer for video captioning
Hemalatha Munusamy
Chandra Sekhar C
[J]. Applied Intelligence, 2023, 53 : 23349 - 23368
[6] Multimodal attention-based transformer for video captioning
Munusamy, Hemalatha
Sekhar, C. Chandra
[J]. APPLIED INTELLIGENCE, 2023, 53 (20) : 23349 - 23368
[7] Attention-Based Multimodal Fusion for Video Description
Hori, Chiori
Hori, Takaaki
Lee, Teng-Yok
Zhang, Ziming
Harsham, Bret
Hershey, John R.
Marks, Tim K.
Sumi, Kazuhiko
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 4203 - 4212
[8] GRPAFusion: A Gradient Residual and Pyramid Attention-Based Multiscale Network for Multimodal Image Fusion
Wang, Jinxin
Xi, Xiaoli
Li, Dongmei
Li, Fang
Zhang, Guanxin
[J]. ENTROPY, 2023, 25 (01)
[9] AMMUNIT: An Attention-Based Multimodal Multi-domain UNsupervised Image-to-Image Translation Framework
Luo, Lei
Hsu, William H.
[J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT II, 2022, 13530 : 358 - 370
[10] Attention-Based Real Image Restoration
Anwar, Saeed
Barnes, Nick
Petersson, Lars
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021,
