Learning Sequence Descriptor Based on Spatio-Temporal Attention for Visual Place Recognition

Cited by: 0
Authors
Zhao, Junqiao [1 ,2 ,3 ]
Zhang, Fenglin [1 ,2 ]
Cai, Yingfeng [1 ,2 ]
Tian, Gengxuan [1 ,2 ]
Mu, Wenjie [1 ,2 ]
Ye, Chen [1 ,2 ]
Feng, Tiantian [4 ]
Affiliations
[1] Tongji Univ, Sch Elect & Informat Engn, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Tongji Univ, MOE Key Lab Embedded Syst & Serv Comp, Shanghai 201804, Peoples R China
[3] Tongji Univ, Inst Intelligent Vehicles, Shanghai 201804, Peoples R China
[4] Tongji Univ, Sch Surveying & Geoinformat, Shanghai 200092, Peoples R China
Keywords
Transformers; Visualization; Encoding; Computer architecture; Task analysis; Simultaneous localization and mapping; Heuristic algorithms; Recognition; localization; SLAM; visual place recognition;
DOI
10.1109/LRA.2024.3354627
Chinese Library Classification number
TP24 [Robotics];
Discipline classification code
080202 ; 1405 ;
Abstract
Visual Place Recognition (VPR) aims to retrieve frames from a geotagged database that are located at the same place as the query frame. To improve the robustness of VPR in perceptually aliased scenarios, sequence-based VPR methods have been proposed. These methods either match frame sequences against each other or extract sequence descriptors for direct retrieval. However, the former usually assume constant velocity, which rarely holds in practice, and they are computationally expensive and sensitive to sequence length. Although the latter overcome these problems, existing sequence descriptors are constructed only by aggregating features of multiple frames, without any interaction on temporal information, and thus cannot yield descriptors with spatio-temporal discrimination. In this letter, we propose a sequence descriptor that effectively incorporates spatio-temporal information. Specifically, spatial attention within the same frame is utilized to learn spatial feature patterns, while attention in corresponding local regions of different frames is utilized to learn the persistence or change of features over time. We use a sliding window to control the temporal range of attention and use relative positional encoding to construct sequential relationships between different features. This allows our descriptors to capture the intrinsic dynamics in a sequence of frames. Comprehensive experiments on challenging benchmark datasets show that the proposed approach outperforms recent state-of-the-art methods.
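The temporal part of the abstract (attention over corresponding local regions of different frames, limited by a sliding window and informed by relative position) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `windowed_temporal_attention`, the `rel_bias` lookup table, and the use of identity query/key/value projections are all assumptions made for illustration. In a trained model the projections and the relative-position terms would be learned.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def windowed_temporal_attention(frames, window, rel_bias):
    """Attend over one local region tracked across T frames.

    frames   : list of T feature vectors (lists of floats), one per frame,
               all taken from the same local region (hypothetical input).
    window   : frame i only attends to frames j with |i - j| <= window.
    rel_bias : dict mapping the relative offset (j - i) to a scalar added
               to the attention logit; stands in for a learned relative
               positional encoding (assumption). Missing offsets count as 0.

    Queries, keys and values are the raw features (identity projections),
    which is a simplification for this sketch.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i in range(num_frames):
        lo = max(0, i - window)
        hi = min(num_frames, i + window + 1)
        neighbours = range(lo, hi)
        # Scaled dot-product logits plus a relative-position bias.
        logits = [
            scale * sum(q * k for q, k in zip(frames[i], frames[j]))
            + rel_bias.get(j - i, 0.0)
            for j in neighbours
        ]
        weights = softmax(logits)
        # Weighted sum of the value vectors inside the window.
        outputs.append([
            sum(w * frames[j][c] for w, j in zip(weights, neighbours))
            for c in range(dim)
        ])
    return outputs


if __name__ == "__main__":
    # Four frames, two feature channels for a single local region.
    demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    # Slightly favour the current frame over its temporal neighbours.
    bias = {0: 0.5, -1: 0.0, 1: 0.0}
    for row in windowed_temporal_attention(demo, window=1, rel_bias=bias):
        print(row)
```

With `window=0` each frame attends only to itself and the features pass through unchanged. Larger windows let a region's feature be reweighted by how similar it stays over neighbouring frames, which is the kind of persistence-or-change signal the abstract describes.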
Pages: 2351-2358
Page count: 8