MatchFormer: Interleaving Attention in Transformers for Feature Matching

Cited by: 23
Authors
Wang, Qing [1 ]
Zhang, Jiaming [1 ]
Yang, Kailun [1 ]
Peng, Kunyu [1 ]
Stiefelhagen, Rainer [1 ]
Affiliations
[1] Karlsruhe Institute of Technology, Karlsruhe, Germany
Keywords
Feature matching; Vision transformers
DOI
10.1007/978-3-031-26313-2_16
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Local feature matching is a computationally intensive task at the sub-pixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential extract-to-match pipeline fail to exploit the matching capacity of the encoder and tend to overburden the decoder with matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, yielding a human-intuitive extract-and-match scheme. Such a match-aware encoder relieves the overloaded decoder and makes the model highly efficient. Further, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or with less outdoor training data. Thanks to this strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer uses only 45% of the GFLOPs, yet achieves a +1.3% precision gain and a 41% running-speed boost. The large MatchFormer reaches the state of the art on four different benchmarks: indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatches), and visual localization (InLoc).
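To make the extract-and-match scheme concrete, the following is a minimal PyTorch sketch (not the authors' code) of one encoder stage that interleaves self-attention within each image with cross-attention between the two images. The class name InterleavedStage, the pattern string, and all dimensions are illustrative assumptions; standard multi-head attention stands in for whatever efficient attention variant the published model uses.

```python
import torch
import torch.nn as nn

class InterleavedStage(nn.Module):
    """Hypothetical encoder stage interleaving self- and cross-attention.

    "self" blocks extract features within an image; "cross" blocks match
    features across the two images, making the encoder match-aware.
    """

    def __init__(self, dim: int, num_heads: int = 4, pattern: str = "self-cross"):
        super().__init__()
        self.pattern = pattern.split("-")
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in self.pattern
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in self.pattern)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # feat_a, feat_b: (batch, tokens, dim) token sequences of two images.
        for kind, attn, norm in zip(self.pattern, self.attns, self.norms):
            if kind == "self":
                # Extraction: each image attends within itself.
                out_a, _ = attn(feat_a, feat_a, feat_a)
                out_b, _ = attn(feat_b, feat_b, feat_b)
            else:
                # Matching: queries from one image, keys/values from the other.
                out_a, _ = attn(feat_a, feat_b, feat_b)
                out_b, _ = attn(feat_b, feat_a, feat_a)
            # Residual connection plus normalization after every block.
            feat_a = norm(feat_a + out_a)
            feat_b = norm(feat_b + out_b)
        return feat_a, feat_b

if __name__ == "__main__":
    stage = InterleavedStage(dim=128)
    a = torch.randn(1, 32 * 32, 128)  # flattened 32x32 feature map, image A
    b = torch.randn(1, 32 * 32, 128)  # flattened 32x32 feature map, image B
    a, b = stage(a, b)
    print(a.shape, b.shape)  # torch.Size([1, 1024, 128]) twice
```

Sharing one attention module between the two images keeps the stage symmetric; stacking such stages over progressively downsampled feature maps gives the hierarchical, multi-scale behavior the abstract describes.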
Pages: 256-273
Page count: 18