Enhancing Visual Place Recognition With Hybrid Attention Mechanisms in MixVPR

Cited by: 0
Authors
Hu, Jun [1 ]
Nie, Jiwei [1 ,2 ,4 ]
Ning, Zuotao [1 ]
Feng, Chaolu [3 ]
Wang, Luyang [1 ]
Li, Jingyao [1 ]
Cheng, Shuai [1 ]
Affiliations
[1] Neusoft Reach Automot Technol Shenyang Co Ltd, Shenyang 110179, Peoples R China
[2] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Peoples R China
[3] Minist Educ, Key Lab Intelligent Comp Med Image, Shenyang 110169, Peoples R China
[4] Northeastern Univ, Software Coll, Shenyang 110819, Peoples R China
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Feature extraction; Transformers; Training; Vectors; Mixers; Attention mechanisms; Pipelines; Frequency modulation; Deep learning; Convolutional neural networks; Visual place recognition; SLAM; autonomous driving; deep learning; vision transformer; attention mechanism;
DOI
10.1109/ACCESS.2024.3487171
CLC classification
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
Visual Place Recognition (VPR) is a fundamental task in robotics and computer vision, where the ability to recognize locations from visual inputs is crucial for autonomous navigation systems. Traditional methods, which rely on handcrafted features or standard convolutional neural networks (CNNs), struggle with environmental changes that significantly alter a place's appearance. Recent advancements in deep learning have improved VPR by focusing on deep-learned features, enhancing robustness under varying conditions. However, these methods often overlook saliency cues, leading to inefficiencies in dynamic scenes. To address these limitations, we propose an improved MixVPR model that incorporates both self-attention and cross-attention mechanisms through a spatial-wise hybrid attention mechanism. This enhancement integrates spatial saliency cues into the global image embedding, improving accuracy and reliability. We also utilize the DINOv2 visual transformer for robust feature extraction. Extensive experiments on mainstream VPR benchmarks demonstrate that our method achieves superior performance while maintaining computational efficiency. Ablation studies and visualizations further validate the contributions of our attention mechanisms to the model's performance improvement.
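The abstract describes a spatial-wise hybrid that combines self-attention and cross-attention so that spatial saliency cues reweight locations before they are aggregated into a global image embedding. As an illustrative sketch only (not the paper's implementation; the function names, the averaging fusion, and the mean-pooling step are all assumptions), a minimal pure-Python version of that idea looks like:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, values))
                    for i in range(len(values[0]))])
    return out

def hybrid_spatial_attention(feats, context):
    """Hypothetical spatial-wise hybrid: average self-attention (locations
    attend to each other) with cross-attention (locations attend to a
    context token set), then mean-pool over locations into one global
    descriptor."""
    self_out = attention(feats, feats, feats)
    cross_out = attention(feats, context, context)
    fused = [[(a + b) / 2 for a, b in zip(sa, ca)]
             for sa, ca in zip(self_out, cross_out)]
    n = len(fused)
    return [sum(row[i] for row in fused) / n for i in range(len(fused[0]))]

# Toy example: a 3-location feature map and 2 context tokens, dim = 2.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [[0.5, 0.5], [1.0, 0.0]]
desc = hybrid_spatial_attention(feats, context)
print(len(desc))  # global descriptor keeps the feature dimension, here 2
```

Because each attention output is a convex combination of its value vectors, the fused descriptor stays within the range of the input features, which is why such a reweighting can emphasize salient locations without distorting the embedding space.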
Pages: 159847-159859
Page count: 13
Related papers
50 in total
  • [31] Self-learning Attention Global Pooling Based Image Representation for Visual Place Recognition
    Huang, Xiaoquan
    Zheng, Song
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 849 - 854
  • [32] Enhancing Visual Place Recognition Using Discrete Cosine Transform and Difference-Based Descriptors
    Zhang, Qieshi
    Xu, Zhenyu
    Yang, Zhiyong
    Ren, Ziliang
    Yuan, Shuai
    Cheng, Jun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (07) : 3368 - 3372
  • [33] DINO-Mix enhancing visual place recognition with foundational vision model and feature mixing
    Huang, Gaoshuang
    Zhou, Yang
    Hu, Xiaofei
    Zhang, Chenglong
    Zhao, Luying
    Gan, Wenjian
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [34] Visual Place Recognition with Repetitive Structures
    Torii, Akihiko
    Sivic, Josef
    Okutomi, Masatoshi
    Pajdla, Tomas
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (11) : 2346 - 2359
  • [35] Visual Place Recognition with Repetitive Structures
    Torii, Akihiko
    Sivic, Josef
    Pajdla, Tomas
    Okutomi, Masatoshi
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 883 - 890
  • [36] A Survey on Deep Visual Place Recognition
    Masone, Carlo
    Caputo, Barbara
    IEEE ACCESS, 2021, 9 : 19516 - 19547
  • [37] Location Graphs for Visual Place Recognition
    Stumm, Elena
    Mei, Christopher
    Lacroix, Simon
    Chli, Margarita
    2015 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2015, : 5475 - 5480
  • [38] The Research Status of Visual Place Recognition
    Wang, Bo
    Wu, Xin-sheng
    Chen, An
    Chen, Chun-yu
    Liu, Hai-ming
    2020 4TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND INFORMATION TECHNOLOGY (CMVIT 2020), 2020, 1518
  • [39] Visual place recognition for autonomous robots
    Tagare, HD
    McDermott, D
    Xiao, H
    1998 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-4, 1998, : 2530 - 2535
  • [40] Joint modelling of audio-visual cues using attention mechanisms for emotion recognition
    Ghaleb, Esam
    Niehues, Jan
    Asteriadis, Stylianos
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (08) : 11239 - 11264