DSAP: Dynamic Sparse Attention Perception Matcher for Accurate Local Feature Matching

Cited by: 0
Authors
Dai, Kun [1 ]
Wang, Ke [2 ,3 ]
Xie, Tao [1 ,4 ]
Sun, Tao [4 ]
Zhang, Jinhang [1 ]
Kong, Qingjia [1 ]
Jiang, Zhiqiang [1 ]
Li, Ruifeng [1 ]
Zhao, Lijun [2 ,3 ]
Omar, Mohamed [1 ]
Affiliations
[1] Harbin Inst Technol, State Key Lab Robot & Syst, Harbin 150006, Peoples R China
[2] Harbin Inst Technol, State Key Lab Robot & Syst, Harbin 150006, Peoples R China
[3] Harbin Inst Technol, Zhengzhou Res Inst, Harbin 150006, Peoples R China
[4] Yangtze River Delta HIT Robot Technol Res Inst, Wuhu 241000, Peoples R China
Keywords
Deep learning; dynamic attention perception; local feature matching; relative pose estimation; sparse attention; visual localization;
DOI
10.1109/TIM.2024.3370781
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
Local feature matching, which aims to establish matches between image pairs, is a pivotal component of multiple visual applications. While current transformer-based works exhibit remarkable performance, they mechanically alternate self- and cross-attention in a predetermined order without considering their prioritization, culminating in inadequate enhancement of visual descriptors. Moreover, when calculating attention matrices to integrate global context, current methods only explicitly model the correlation among the feature channels without taking their importance into account, resulting in insufficient message propagation. In this work, we develop a dynamic sparse attention perception (DSAP) matcher to tackle the aforementioned issues. To resolve the first issue, DSAP presents a dynamic perception strategy (DPS) that enables the network to dynamically implement feature enhancement by modifying both forward and backward propagation. During forward propagation, DPS assigns a learnable perception score to each transformer layer and employs an exponential moving average (EMA) algorithm to compute the current score. After that, DPS utilizes an indicator function to binarize the score, allowing DSAP to adaptively determine the appropriate utilization of self- or cross-attention at the current iteration. During backward propagation, DPS employs a gradient estimator that adjusts the gradient of the perception scores, thus rendering them differentiable. To tackle the second issue, DSAP introduces a weighted sparse transformer (WSFormer) that recalibrates attention matrices by concurrently considering both channel importance and channel correlation. WSFormer predicts attention vectors to weight attention matrices while constructing multiple sparse attention matrices to integrate various global messages, thus highlighting informative channels and inhibiting redundant message propagation.
Extensive experiments on public datasets and in real environments demonstrate that DSAP achieves exceptional performance across various downstream tasks, including relative pose estimation and visual localization. The code is available at https://github.com/mooncake199809/DSAP.
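The mechanisms the abstract describes can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: the 0.5 binarization threshold, the top-k sparsification rule, and all function names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ema(prev_score, cur_score, momentum=0.9):
    """Exponential moving average used to smooth a layer's perception score."""
    return momentum * prev_score + (1.0 - momentum) * cur_score

def choose_attention(score, threshold=0.5):
    """Indicator function: binarize the smoothed score to pick the attention
    type at this iteration (1 -> self-attention, 0 -> cross-attention)."""
    return int(score >= threshold)

def perception_score_grad(upstream_grad):
    """Straight-through-style estimator: the indicator's gradient is zero
    almost everywhere, so pass the upstream gradient through unchanged to
    keep the perception score trainable."""
    return upstream_grad

def sparse_weighted_attention(q, k, v, channel_weights, top_k):
    """Attention with channel reweighting (channel importance) plus per-row
    top-k sparsification to suppress redundant message propagation."""
    d = q.shape[-1]
    # Reweight query channels by their predicted importance before scoring.
    scores = (q * channel_weights) @ k.T / np.sqrt(d)
    # Keep only the top_k keys per query; mask the rest to -inf pre-softmax.
    if top_k < scores.shape[1]:
        kth = np.partition(scores, -top_k, axis=1)[:, -top_k][:, None]
        scores = np.where(scores >= kth, scores, -np.inf)
    return softmax(scores, axis=1) @ v
```

With `top_k` equal to the number of keys, `sparse_weighted_attention` reduces to ordinary scaled dot-product attention with channel reweighting; smaller `top_k` zeroes out low-scoring correspondences entirely rather than merely downweighting them.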
Pages: 1-16
Page count: 16
Related Papers
28 items in total
  • [21] Transformer enhanced by local perception self-attention for dynamic soft sensor modeling of industrial processes
    Fang, Zeyu
    Gao, Shiwei
    Dang, Xiaochao
    Dong, Xiaohui
    Wang, Qiong
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (05)
  • [22] High-Speed and Accurate Diagnosis of Gastrointestinal Disease: Learning on Endoscopy Images Using Lightweight Transformer with Local Feature Attention
    Wu, Shibin
    Zhang, Ruxin
    Yan, Jiayi
    Li, Chengquan
    Liu, Qicai
    Wang, Liyang
    Wang, Haoqian
BIOENGINEERING-BASEL, 2023, 10 (12)
  • [23] Local-global feature-based spatio-temporal wind speed forecasting with a sparse and dynamic graph
    Wang, Yun
    Song, Mengmeng
    Yang, Dazhi
    ENERGY, 2024, 289
  • [24] Sparse low-redundancy multilabel feature selection based on dynamic local structure preservation and triple graphs exploration
    Yang, Yong
    Chen, Hongmei
    Mi, Yong
    Luo, Chuan
    Horng, Shi-Jinn
    Li, Tianrui
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 242
  • [25] Robust sparse and low-redundancy multi-label feature selection with dynamic local and global structure preservation
    Li, Yonghao
    Hu, Liang
    Gao, Wanfu
    PATTERN RECOGNITION, 2023, 134
  • [26] A Multiview Sparse Dynamic Graph Convolution-Based Region-Attention Feature Fusion Network for Major Depressive Disorder Detection
    Cui, Weigang
    Sun, Mingyi
    Dong, Qunxi
    Guo, Yuzhu
    Liao, Xiao-Feng
    Li, Yang
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (02) : 2691 - 2702
  • [27] Feature-enhanced few-shot method for bearing fault diagnosis based on dynamic sparse attention under noise conditions
    Li, Yibing
    Duan, Qiping
    Jiang, Li
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2025, 36 (02)
  • [28] Bridging the gap: dual perception attention and local-global similarity fusion for cross-modal image-text matching
    Shui, Xiangyu
    Zhu, Zhenfang
    Liu, Yun
    Pei, Hongli
    Li, Kefeng
    Zhang, Huaxiang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (28) : 72043 - 72062