Enhancing Visual Place Recognition With Hybrid Attention Mechanisms in MixVPR

Cited by: 0

Authors:

Hu, Jun [1]

Nie, Jiwei [1,2,4]

Ning, Zuotao [1]

Feng, Chaolu [3]

Wang, Luyang [1]

Li, Jingyao [1]

Cheng, Shuai [1]

Affiliations:

[1] Neusoft Reach Automot Technol Shenyang Co Ltd, Shenyang 110179, Peoples R China

[2] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110819, Peoples R China

[3] Minist Educ, Key Lab Intelligent Comp Med Image, Shenyang 110169, Peoples R China

[4] Northeastern Univ, Software Coll, Shenyang 110819, Peoples R China

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Feature extraction; Transformers; Training; Vectors; Mixers; Attention mechanisms; Pipelines; Frequency modulation; Deep learning; Convolutional neural networks; Visual place recognition; SLAM; autonomous driving; deep learning; vision transformer; attention mechanism;

DOI:

10.1109/ACCESS.2024.3487171

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Visual Place Recognition (VPR) is a fundamental task in robotics and computer vision, where the ability to recognize locations from visual inputs is crucial for autonomous navigation systems. Traditional methods, which rely on handcrafted features or standard convolutional neural networks (CNNs), struggle with environmental changes that significantly alter a place's appearance. Recent advancements in deep learning have improved VPR by focusing on deep-learned features, enhancing robustness under varying conditions. However, these methods often overlook saliency cues, leading to inefficiencies in dynamic scenes. To address these limitations, we propose an improved MixVPR model that incorporates both self-attention and cross-attention mechanisms through a spatial-wise hybrid attention mechanism. This enhancement integrates spatial saliency cues into the global image embedding, improving accuracy and reliability. We also utilize the DINOv2 visual transformer for robust feature extraction. Extensive experiments on mainstream VPR benchmarks demonstrate that our method achieves superior performance while maintaining computational efficiency. Ablation studies and visualizations further validate the contributions of our attention mechanisms to the model's performance improvement.
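The abstract names two ingredients, self-attention and cross-attention over spatial features, that feed saliency cues into a global embedding. The following is a minimal NumPy sketch of that general idea only, not the paper's architecture: the function names, the single pooling `query` vector, the residual connection, and the omission of learned projection matrices are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention.

    q: (M, D) queries, k and v: (N, D) keys and values.
    Returns the attended output (M, D) and the weights (M, N).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def hybrid_attention_embedding(tokens, query):
    """Hypothetical hybrid pooling: self-attention, then cross-attention.

    tokens: (N, D) spatial features, e.g. patch tokens from a backbone.
    query:  (D,) pooling query (assumed; would be learned in practice).
    """
    # Self-attention: every spatial token attends to all tokens.
    refined, _ = attention(tokens, tokens, tokens)
    refined = tokens + refined  # residual connection (assumption)

    # Cross-attention: one query attends over the refined tokens, so the
    # attention weights act as a spatial saliency map.
    pooled, saliency = attention(query[None, :], refined, refined)

    # L2-normalize so embeddings can be compared by dot product.
    embedding = pooled[0] / np.linalg.norm(pooled[0])
    return embedding, saliency[0]


rng = np.random.default_rng(0)
tokens = rng.standard_normal((16 * 16, 64))  # 256 spatial positions, 64 channels
query = rng.standard_normal(64)
embedding, saliency = hybrid_attention_embedding(tokens, query)
```

Here `saliency` holds one non-negative weight per spatial position, summing to 1, and `embedding` is a unit-length global descriptor. A trained model would add learned query, key, and value projections and multiple heads; those are left out to keep the sketch short.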


Pages: 159847-159859

Page count: 13
