MatchFormer: Interleaving Attention in Transformers for Feature Matching

Cited by: 23
Authors
Wang, Qing [1]
Zhang, Jiaming [1]
Yang, Kailun [1]
Peng, Kunyu [1]
Stiefelhagen, Rainer [1]
Affiliation
[1] Karlsruhe Institute of Technology, Karlsruhe, Germany
Source
Keywords
Feature matching; Vision transformers
DOI
10.1007/978-3-031-26313-2_16
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Local feature matching is a computationally intensive task at the subpixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, yielding a human-intuitive extract-and-match scheme. Such a match-aware encoder releases the overloaded decoder and makes the model highly efficient. Furthermore, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or with less outdoor training data. Thanks to such a strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer has only 45% of its GFLOPs, yet achieves a +1.3% precision gain and a 41% running-speed boost. The large MatchFormer reaches state of the art on four different benchmarks, including indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatches), and visual localization (InLoc).
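
To make the interleaved extract-and-match scheme described in the abstract concrete, below is a minimal PyTorch-style sketch of one encoder stage that alternates self-attention (feature extraction within each image) with cross-attention (feature matching across the image pair). All module and parameter names (AttentionBlock, InterleavedStage, num_pairs, dim) are illustrative assumptions, not the paper's implementation; the actual MatchFormer additionally uses efficient attention variants, positional patch embedding, and a multi-scale hierarchical design that are not reproduced here.

# Sketch of interleaved self-/cross-attention for an image pair.
# Names are hypothetical; this is not the authors' code.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """One attention block: queries come from x, keys/values from context."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                 nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x, context):
        q, kv = self.norm_q(x), self.norm_kv(context)
        x = x + self.attn(q, kv, kv, need_weights=False)[0]
        return x + self.mlp(x)


class InterleavedStage(nn.Module):
    """One encoder stage alternating self- and cross-attention blocks."""

    def __init__(self, dim, num_pairs=2):
        super().__init__()
        self.self_blocks = nn.ModuleList(AttentionBlock(dim) for _ in range(num_pairs))
        self.cross_blocks = nn.ModuleList(AttentionBlock(dim) for _ in range(num_pairs))

    def forward(self, feat_a, feat_b):
        for self_blk, cross_blk in zip(self.self_blocks, self.cross_blocks):
            # Extract: each image's features attend to themselves.
            feat_a, feat_b = self_blk(feat_a, feat_a), self_blk(feat_b, feat_b)
            # Match: each image's features attend to the other image.
            feat_a, feat_b = cross_blk(feat_a, feat_b), cross_blk(feat_b, feat_a)
        return feat_a, feat_b


if __name__ == "__main__":
    stage = InterleavedStage(dim=64)
    a, b = torch.randn(1, 1024, 64), torch.randn(1, 1024, 64)
    out_a, out_b = stage(a, b)
    print(out_a.shape, out_b.shape)  # torch.Size([1, 1024, 64]) each

The key design point this sketch illustrates is that matching (cross-attention) already happens inside the encoder at every stage, rather than being deferred entirely to a decoder after feature extraction.
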
Pages: 256 - 273
Number of pages: 18