Enhancing Visual Place Recognition With Hybrid Attention Mechanisms in MixVPR

Cited by: 0
Authors
Hu, Jun [1 ]
Nie, Jiwei [1 ,2 ,4 ]
Ning, Zuotao [1 ]
Feng, Chaolu [3 ]
Wang, Luyang [1 ]
Li, Jingyao [1 ]
Cheng, Shuai [1 ]
Affiliations
[1] Neusoft Reach Automot Technol Shenyang Co Ltd, Shenyang 110179, Peoples R China
[2] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Peoples R China
[3] Minist Educ, Key Lab Intelligent Comp Med Image, Shenyang 110169, Peoples R China
[4] Northeastern Univ, Software Coll, Shenyang 110819, Peoples R China
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Feature extraction; Transformers; Training; Vectors; Mixers; Attention mechanisms; Pipelines; Frequency modulation; Deep learning; Convolutional neural networks; Visual place recognition; SLAM; autonomous driving; deep learning; vision transformer; attention mechanism;
DOI
10.1109/ACCESS.2024.3487171
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Visual Place Recognition (VPR) is a fundamental task in robotics and computer vision, where the ability to recognize locations from visual inputs is crucial for autonomous navigation systems. Traditional methods, which rely on handcrafted features or standard convolutional neural networks (CNNs), struggle with environmental changes that significantly alter a place's appearance. Recent advancements in deep learning have improved VPR by focusing on deep-learned features, enhancing robustness under varying conditions. However, these methods often overlook saliency cues, leading to inefficiencies in dynamic scenes. To address these limitations, we propose an improved MixVPR model that incorporates both self-attention and cross-attention mechanisms through a spatial-wise hybrid attention mechanism. This enhancement integrates spatial saliency cues into the global image embedding, improving accuracy and reliability. We also utilize the DINOv2 visual transformer for robust feature extraction. Extensive experiments on mainstream VPR benchmarks demonstrate that our method achieves superior performance while maintaining computational efficiency. Ablation studies and visualizations further validate the contributions of our attention mechanisms to the model's performance improvement.
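The abstract describes the method only at a high level. As an illustrative sketch only, not the authors' implementation, the spatial-wise hybrid attention idea (self-attention over spatial feature tokens, followed by cross-attention that pools them into a global embedding) might be expressed as below; all shapes, token counts, and function names here are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: (n_q, d), (n_k, d) -> (n_q, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def hybrid_spatial_attention(feats, query_tokens):
    """Toy spatial-wise hybrid attention (names and shapes are assumptions).

    feats:        (N, D) flattened spatial tokens, e.g. patch features
                  from a ViT backbone such as DINOv2
    query_tokens: (M, D) learnable queries standing in for the
                  cross-attention branch
    Returns one (D,) L2-normalized global descriptor.
    """
    # self-attention: each spatial token attends to all others,
    # re-weighting the map by intra-image saliency
    sa = attention(feats, feats, feats)
    # cross-attention: external queries pool the self-attended map
    ca = attention(query_tokens, sa, sa)
    # aggregate pooled tokens into a single global embedding
    g = ca.mean(axis=0)
    return g / np.linalg.norm(g)

rng = np.random.default_rng(0)
feats = rng.standard_normal((196, 64))    # e.g. 14x14 patch tokens, dim 64
queries = rng.standard_normal((4, 64))    # 4 hypothetical query tokens
desc = hybrid_spatial_attention(feats, queries)
print(desc.shape)  # (64,)
```

In a trained model the queries and projections would be learned and the descriptor compared by cosine similarity against a database of place embeddings; this sketch only shows how the two attention stages compose.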
Pages: 159847-159859
Page count: 13
Related Articles
50 records in total
  • [21] MS-MixVPR: Multi-scale Feature Mixing Approach for Long-Term Place Recognition
    Quach, M.-D.
    Vo, D.-M.
    Pham, H.-A.
    SN COMPUTER SCIENCE, 5 (6)
  • [22] Recognition in early visual attention
    Martinez, A.
    PERCEPTION, 1999, 28 : 124 - 125
  • [23] Word recognition and visual attention
    Vitu, F
    Schroyens, W
    Brysbaert, M
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 1996, 31 (3-4) : 18449 - 18449
  • [24] Attention mechanisms in visual search
    Feng, SH
    Huang, XT
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 2004, 39 (5-6) : 282 - 282
  • [25] RELATING ATTENTION TO VISUAL MECHANISMS
    SHULMAN, GL
    PERCEPTION & PSYCHOPHYSICS, 1990, 47 (02): : 199 - 203
  • [26] Neuronal Mechanisms of Visual Attention
    Maunsell, John H. R.
    ANNUAL REVIEW OF VISION SCIENCE, VOL 1, 2015, 1 : 373 - 391
  • [27] Intrathalamic Mechanisms of Visual Attention
    Mayo, J. Patrick
    JOURNAL OF NEUROPHYSIOLOGY, 2009, 101 (03) : 1123 - 1125
  • [28] A visual place recognition approach using learnable feature map filtering and graph attention networks
    Qin, Cao
    Zhang, Yunzhou
    Liu, Yingda
    Coleman, Sonya
    Du, Huijie
    Kerr, Dermot
    NEUROCOMPUTING, 2021, 457 : 277 - 292
  • [29] GSAP: A Global Structure Attention Pooling Method for Graph-Based Visual Place Recognition
    Yang, Yukun
    Ma, Bo
    Liu, Xiangdong
    Zhao, Liang
    Huang, Shoudong
    REMOTE SENSING, 2021, 13 (08)
  • [30] Hardness-Aware Metric Learning With Cluster-Guided Attention for Visual Place Recognition
    Guan, Peiyu
    Cao, Zhiqiang
    Fan, Shengxuan
    Yang, Yuequan
    Yu, Junzhi
    Wang, Shuo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 367 - 379