MatchFormer: Interleaving Attention in Transformers for Feature Matching

Cited by: 23
Authors
Wang, Qing [1 ]
Zhang, Jiaming [1 ]
Yang, Kailun [1 ]
Peng, Kunyu [1 ]
Stiefelhagen, Rainer [1 ]
Affiliation
[1] Karlsruhe Inst Technol, Karlsruhe, Germany
Keywords
Feature matching; Vision transformers
DOI
10.1007/978-3-031-26313-2_16
CLC number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Local feature matching is a computationally intensive task at the subpixel level. While detector-based methods coupled with feature descriptors struggle in low-texture scenes, CNN-based methods with a sequential extract-to-match pipeline fail to make use of the matching capacity of the encoder and tend to overburden the decoder for matching. In contrast, we propose a novel hierarchical extract-and-match transformer, termed MatchFormer. Inside each stage of the hierarchical encoder, we interleave self-attention for feature extraction and cross-attention for feature matching, yielding a human-intuitive extract-and-match scheme. Such a match-aware encoder releases the overloaded decoder and makes the model highly efficient. Further, combining self- and cross-attention on multi-scale features in a hierarchical architecture improves matching robustness, particularly in low-texture indoor scenes or with less outdoor training data. Thanks to this strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision. Compared to the previous best method in indoor pose estimation, our lite MatchFormer requires only 45% of the GFLOPs, yet achieves a +1.3% precision gain and a 41% running-speed boost. The large MatchFormer reaches the state of the art on four different benchmarks, including indoor pose estimation (ScanNet), outdoor pose estimation (MegaDepth), homography estimation and image matching (HPatches), and visual localization (InLoc).
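
The interleaved extract-and-match scheme described in the abstract can be made concrete with a short sketch. The PyTorch snippet below is a minimal, illustrative rendering of the idea: within one encoder stage, self-attention (feature extraction within each image) alternates with cross-attention (feature matching across the image pair). The module name InterleavedStage, the dimensions, the residual/normalization layout, and the use of standard multi-head attention are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class InterleavedStage(nn.Module):
    """One encoder stage: self-attention (extract) interleaved with cross-attention (match)."""

    def __init__(self, dim=128, num_heads=4, depth=2):
        super().__init__()
        # Shared (siamese) attention blocks applied to both images of the pair.
        self.self_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads, batch_first=True) for _ in range(depth)])
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads, batch_first=True) for _ in range(depth)])
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (batch, tokens, dim) flattened feature maps of the two images.
        for sa, ca in zip(self.self_attn, self.cross_attn):
            # "Extract": each image attends to itself.
            feat_a = feat_a + sa(feat_a, feat_a, feat_a, need_weights=False)[0]
            feat_b = feat_b + sa(feat_b, feat_b, feat_b, need_weights=False)[0]
            # "Match": each image attends to the other, making the encoder match-aware.
            new_a = feat_a + ca(feat_a, feat_b, feat_b, need_weights=False)[0]
            new_b = feat_b + ca(feat_b, feat_a, feat_a, need_weights=False)[0]
            feat_a, feat_b = self.norm(new_a), self.norm(new_b)
        return feat_a, feat_b


if __name__ == "__main__":
    stage = InterleavedStage()
    a = torch.randn(1, 60 * 80, 128)   # e.g. a 60x80 feature map with 128 channels
    b = torch.randn(1, 60 * 80, 128)
    out_a, out_b = stage(a, b)
    print(out_a.shape, out_b.shape)    # torch.Size([1, 4800, 128]) for both outputs
```

In the actual model, the attention operator, feature dimensions, and the interleaving pattern per stage differ from this sketch; it only conveys how placing cross-attention inside the hierarchical encoder makes the features match-aware before any matching decoder runs.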
Pages: 256-273
Page count: 18