Enhancing Visual Place Recognition With Hybrid Attention Mechanisms in MixVPR

Citations: 0
Authors
Hu, Jun [1]
Nie, Jiwei [1, 2, 4]
Ning, Zuotao [1]
Feng, Chaolu [3]
Wang, Luyang [1]
Li, Jingyao [1]
Cheng, Shuai [1]
Affiliations
[1] Neusoft Reach Automot Technol Shenyang Co Ltd, Shenyang 110179, Peoples R China
[2] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Peoples R China
[3] Minist Educ, Key Lab Intelligent Comp Med Image, Shenyang 110169, Peoples R China
[4] Northeastern Univ, Software Coll, Shenyang 110819, Peoples R China
Source
IEEE Access | 2024 / Vol. 12
Keywords
Feature extraction; Transformers; Training; Vectors; Mixers; Attention mechanisms; Pipelines; Frequency modulation; Deep learning; Convolutional neural networks; Visual place recognition; SLAM; autonomous driving; deep learning; vision transformer; attention mechanism;
DOI
10.1109/ACCESS.2024.3487171
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Visual Place Recognition (VPR) is a fundamental task in robotics and computer vision, where the ability to recognize locations from visual inputs is crucial for autonomous navigation systems. Traditional methods, which rely on handcrafted features or standard convolutional neural networks (CNNs), struggle with environmental changes that significantly alter a place's appearance. Recent advancements in deep learning have improved VPR by focusing on deep-learned features, enhancing robustness under varying conditions. However, these methods often overlook saliency cues, leading to inefficiencies in dynamic scenes. To address these limitations, we propose an improved MixVPR model that incorporates both self-attention and cross-attention mechanisms through a spatial-wise hybrid attention mechanism. This enhancement integrates spatial saliency cues into the global image embedding, improving accuracy and reliability. We also utilize the DINOv2 visual transformer for robust feature extraction. Extensive experiments on mainstream VPR benchmarks demonstrate that our method achieves superior performance while maintaining computational efficiency. Ablation studies and visualizations further validate the contributions of our attention mechanisms to the model's performance improvement.
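The abstract's idea of combining self-attention and cross-attention over spatial features can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the residual sum used to fuse the two branches, the absence of learned projections, the mean pooling (used here in place of the MixVPR feature mixer), and the `context_tokens` input are all assumptions made for illustration only.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def hybrid_attention(tokens, context):
    """Assumed fusion: residual sum of a self-attention branch
    (tokens attend to tokens) and a cross-attention branch
    (tokens attend to a separate context set)."""
    self_out = attention(tokens, tokens, tokens)
    cross_out = attention(tokens, context, context)
    return [[t + s + c for t, s, c in zip(tok, so, co)]
            for tok, so, co in zip(tokens, self_out, cross_out)]


def global_descriptor(tokens):
    """Mean-pool the tokens and L2-normalize the result
    (a simple stand-in for a learned aggregation stage)."""
    dim = len(tokens[0])
    pooled = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]


# Hypothetical inputs: 16 patch tokens and 4 context tokens of dimension 8.
rng = random.Random(0)
patch_tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
context_tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]

refined = hybrid_attention(patch_tokens, context_tokens)
descriptor = global_descriptor(refined)
```

In a real pipeline the patch tokens would come from a backbone such as DINOv2, and each attention branch would use learned query, key, and value projections; the sketch only shows how the two attention paths differ in what the tokens attend to.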
Pages: 159847-159859
Number of pages: 13