Transformer With Linear-Window Attention for Feature Matching

Cited: 0
|
Authors
Shen, Zhiwei [1 ,2 ]
Kong, Bin [1 ,3 ,4 ]
Dong, Xiaoyu [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Intelligent Machines, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei Inst Phys Sci, Hefei 230026, Peoples R China
[3] Anhui Engn Lab Intelligent Driving Technol & Appli, Hefei 230088, Peoples R China
[4] Chinese Acad Sci, Innovat Res Inst Robot & Intelligent Mfg Hefei, Hefei 230088, Peoples R China
Keywords
Feature extraction; Transformers; Task analysis; Computational modeling; Computational efficiency; Memory management; Visualization; Feature matching; visual transformer; detector-free; computational complexity; low-texture;
DOI
10.1109/ACCESS.2023.3328855
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
A transformer can capture long-range dependencies through its attention mechanism and can therefore be applied to various vision tasks. However, its quadratic computational complexity is a major obstacle in vision tasks that require accurate predictions. To address this limitation, this study introduces linear-window attention (LWA), a new attention model for vision transformers. The transformer computes self-attention restricted to nonoverlapping local windows and represents it as a linear dot product of kernel feature maps. Furthermore, the computational complexity of each window is reduced from quadratic to linear by exploiting the associativity of matrix products. In addition, we applied LWA to feature matching to construct a coarse-to-fine detector-free feature matching method, called transformer with linear-window attention for feature matching (TRLWAM). At the coarse level, we extract dense pixel-level matches; at the fine level, we obtain the final matching results via multi-head multilayer perceptron refinement. We demonstrated the effectiveness of LWA through replacement experiments. The results show that TRLWAM can extract dense matches from low-texture or repetitive-pattern regions in indoor environments and achieves excellent results at low computational cost on the MegaDepth and HPatches datasets. We believe the proposed LWA can offer new ideas for transformer applications in vision tasks.
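The abstract describes self-attention that is restricted to nonoverlapping local windows and linearized via kernel feature maps, with the associativity of matrix products reducing per-window cost from quadratic to linear. The following is a minimal NumPy sketch of that general idea, not the authors' implementation; the elu(x)+1 feature map and the window size are illustrative assumptions borrowed from the standard linear-attention literature:

```python
import numpy as np

def linear_window_attention(q, k, v, window=4):
    """Linear attention computed independently within nonoverlapping windows.

    q, k, v: arrays of shape (seq_len, dim); seq_len is assumed to be
    divisible by `window`. Cost per window is O(window * dim^2) instead of
    O(window^2 * dim), i.e. linear in the window length.
    """
    n, d = q.shape
    # Positive kernel feature map phi(x) = elu(x) + 1 (an assumed choice).
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    out = np.empty_like(v)
    for s in range(0, n, window):
        qw, kw, vw = phi(q[s:s + window]), phi(k[s:s + window]), v[s:s + window]
        # Associativity: (qw @ kw.T) @ vw == qw @ (kw.T @ vw).
        kv = kw.T @ vw                   # (d, d) summary of the window
        z = qw @ kw.sum(axis=0)          # per-row normalizer, shape (window,)
        out[s:s + window] = (qw @ kv) / z[:, None]
    return out
```

Because phi is positive, each output row is a convex combination of the value rows within its window, mirroring softmax attention's normalization while avoiding the quadratic attention matrix.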
Pages: 121202 - 121211
Page count: 10
Related Papers
50 records
  • [21] Window-accumulated subsequence matching problem is linear
    Boasson, L
    Cegielski, P
    Guessarian, I
    Matiyasevich, Y
    ANNALS OF PURE AND APPLIED LOGIC, 2002, 113 (1-3) : 59 - 80
  • [23] Simultaneous Deep Stereo Matching and Dehazing with Feature Attention
    Song, Taeyong
    Kim, Youngjung
    Oh, Changjae
    Jang, Hyunsung
    Ha, Namkoo
    Sohn, Kwanghoon
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128 (04) : 799 - 817
  • [24] Learning for Feature Matching via Graph Context Attention
    Guo, Junwen
    Xiao, Guobao
    Tang, Zhimin
    Chen, Shunxing
    Wang, Shiping
    Ma, Jiayi
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [25] Modeling Selective Feature Attention for Lightweight Text Matching
    Zang, Jianxiang
    Liu, Hui
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 6624 - 6632
  • [26] LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching
    Zhong, Wenhao
    Jiang, Jie
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 270
  • [27] Matching characteristics of the physically short linear impedance transformer
    Roy, SCD
    IEE PROCEEDINGS-MICROWAVES ANTENNAS AND PROPAGATION, 2001, 148 (02) : 137 - 139
  • [28] Hybrid Window Attention Based Transformer Architecture for Brain Tumor Segmentation
    Peiris, Himashi
    Hayat, Munawar
    Chen, Zhaolin
    Egan, Gary
    Harandi, Mehrtash
    BRAINLESION: GLIOMA, MULTIPLE SCLEROSIS, STROKE AND TRAUMATIC BRAIN INJURIES, BRAINLES 2022, PT II, 2023, 14092 : 173 - 182
  • [29] Pcwin Transformer: Permuted Channel Window based Attention for Image Classification
    Li, Shibao
    Liu, Yixuan
    Wang, Zhaoyu
    Cui, Xuerong
    Zhang, Yunwu
    Jia, Zekun
    Zhu, Jinze
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [30] FAM: Improving columnar vision transformer with feature attention mechanism
    Huang, Lan
    Bai, Xingyu
    Zeng, Jia
    Yu, Mengqiang
    Pang, Wei
    Wang, Kangping
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 242