MatchFormer: Interleaving Attention in Transformers for Feature Matching

Cited by: 23
Authors
Wang, Qing [1]
Zhang, Jiaming [1]
Yang, Kailun [1]
Peng, Kunyu [1]
Stiefelhagen, Rainer [1]
Affiliation
[1] Karlsruhe Institute of Technology, Karlsruhe, Germany
Source
Keywords
Feature matching; Vision transformers
DOI
10.1007/978-3-031-26313-2_16
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Local feature matching is a computationally intensive task at the subpixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, yielding a human-intuitive extract-and-match scheme. Such a match-aware encoder releases the overloaded decoder and makes the model highly efficient. Furthermore, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or with less outdoor training data. Thanks to such a strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer has only 45% of its GFLOPs, yet achieves a +1.3% precision gain and a 41% running-speed boost. The large MatchFormer reaches state of the art on four different benchmarks, including indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatches), and visual localization (InLoc).
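
To make the interleaved extract-and-match scheme described in the abstract concrete, below is a minimal PyTorch-style sketch of one encoder stage that alternates self-attention (feature extraction within each image) with cross-attention (feature matching across the image pair). All module and parameter names (AttentionBlock, InterleavedStage, num_pairs, dim) are illustrative assumptions, not the paper's implementation; the actual MatchFormer additionally uses efficient attention variants, positional patch embedding, and a multi-scale hierarchical design that are not reproduced here.

# Sketch of interleaved self-/cross-attention for an image pair.
# Names are hypothetical; this is not the authors' code.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """One attention block: queries come from x, keys/values from context."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                 nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x, context):
        q, kv = self.norm_q(x), self.norm_kv(context)
        x = x + self.attn(q, kv, kv, need_weights=False)[0]
        return x + self.mlp(x)


class InterleavedStage(nn.Module):
    """One encoder stage alternating self- and cross-attention blocks."""

    def __init__(self, dim, num_pairs=2):
        super().__init__()
        self.self_blocks = nn.ModuleList(AttentionBlock(dim) for _ in range(num_pairs))
        self.cross_blocks = nn.ModuleList(AttentionBlock(dim) for _ in range(num_pairs))

    def forward(self, feat_a, feat_b):
        for self_blk, cross_blk in zip(self.self_blocks, self.cross_blocks):
            # Extract: each image's features attend to themselves.
            feat_a, feat_b = self_blk(feat_a, feat_a), self_blk(feat_b, feat_b)
            # Match: each image's features attend to the other image.
            feat_a, feat_b = cross_blk(feat_a, feat_b), cross_blk(feat_b, feat_a)
        return feat_a, feat_b


if __name__ == "__main__":
    stage = InterleavedStage(dim=64)
    a, b = torch.randn(1, 1024, 64), torch.randn(1, 1024, 64)
    out_a, out_b = stage(a, b)
    print(out_a.shape, out_b.shape)  # torch.Size([1, 1024, 64]) each

The key design point this sketch illustrates is that matching (cross-attention) already happens inside the encoder at every stage, rather than being deferred entirely to a decoder after feature extraction.
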
Pages: 256 - 273
Number of pages: 18