Spatial Pyramid-Enhanced NetVLAD With Weighted Triplet Loss for Place Recognition

Cited by: 259
Authors
Yu, Jun [1]
Zhu, Chaoyang [1]
Zhang, Jian [2]
Huang, Qingming [3]
Tao, Dacheng [4, 5]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Peoples R China
[2] Zhejiang Int Studies Univ, Sch Sci & Technol, Hangzhou 310012, Peoples R China
[3] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 101408, Peoples R China
[4] Univ Sydney, UBTECH Sydney Artificial Intelligence Ctr, Darlington, NSW 2008, Australia
[5] Univ Sydney, Fac Engn & Informat Technol, Sch Comp Sci, Darlington, NSW 2008, Australia
Funding
Australian Research Council; National Natural Science Foundation of China
Keywords
Feature extraction; Global Positioning System; Image recognition; Training; Deep learning; Vocabulary; Optimization; Place recognition; spatial pyramid pooling; triplet loss (T-loss); vector of locally aggregated descriptors (VLAD); NONLINEAR DIMENSIONALITY REDUCTION; IMAGE;
DOI
10.1109/TNNLS.2019.2908982
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose an end-to-end place recognition model based on a novel deep neural network. First, we propose to exploit the spatial pyramid structure of the images to enhance the vector of locally aggregated descriptors (VLAD) such that the enhanced VLAD features can reflect the structural information of the images. To encode this feature extraction into the deep learning method, we build a spatial pyramid-enhanced VLAD (SPE-VLAD) layer. Next, we impose weight constraints on the terms of the traditional triplet loss (T-loss) function such that the weighted T-loss (WT-loss) function avoids the suboptimal convergence of the learning process. The loss function can work well under weakly supervised scenarios in that it determines the semantically positive and negative samples of each query through not only the GPS tags but also the Euclidean distance between the image representations. The SPE-VLAD layer and the WT-loss layer are integrated with the VGG-16 network or ResNet-18 network to form a novel end-to-end deep neural network that can be easily trained via the standard backpropagation method. We conduct experiments on three benchmark data sets, and the results demonstrate that the proposed model defeats the state-of-the-art deep learning approaches applied to place recognition.
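The two ideas in the abstract, VLAD aggregation over a spatial pyramid and a triplet loss with weighted terms, can be illustrated with a minimal NumPy sketch. This is not the authors' SPE-VLAD layer or WT-loss: it assumes hard assignment to cluster centers instead of a learned soft assignment, a pyramid of 1x1 and 2x2 cells, and two scalar weights (`w_pos`, `w_neg`) that stand in for the paper's weight constraints, which the abstract does not specify. All function and parameter names are hypothetical.

```python
import numpy as np

def vlad(descriptors, centers):
    """Hard-assignment VLAD: sum residuals to the nearest center, then L2-normalize."""
    k, d = centers.shape
    out = np.zeros((k, d))
    if len(descriptors) > 0:
        # Squared Euclidean distance from every descriptor to every center: (n, k)
        dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[nearest == j]
            if len(members):
                out[j] = (members - centers[j]).sum(axis=0)
    flat = out.ravel()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat

def pyramid_vlad(feature_map, centers, levels=(1, 2)):
    """Concatenate VLAD vectors computed on each cell of each pyramid level.

    feature_map has shape (H, W, D); level n splits it into an n x n grid.
    The choice of levels is an assumption made for this sketch.
    """
    h, w, d = feature_map.shape
    parts = []
    for n in levels:
        rows = np.linspace(0, h, n + 1).astype(int)
        cols = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
                parts.append(vlad(cell.reshape(-1, d), centers))
    vec = np.concatenate(parts)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def weighted_triplet_loss(query, positive, negatives,
                          w_pos=1.0, w_neg=1.0, margin=0.1):
    """Triplet hinge loss with scalar weights on the two distance terms.

    w_pos and w_neg are placeholders; with both set to 1 this reduces to
    the ordinary triplet loss summed over the negatives.
    """
    d_pos = np.sum((query - positive) ** 2)
    d_neg = np.sum((query - negatives) ** 2, axis=1)
    return np.sum(np.maximum(0.0, w_pos * d_pos + margin - w_neg * d_neg))
```

With a feature map of shape (4, 4, 8) and 3 centers, the two pyramid levels give 1 + 4 = 5 cells, so the descriptor has 5 x 3 x 8 = 120 dimensions. In a trained network the feature map would come from a backbone such as VGG-16 or ResNet-18, as the abstract describes.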
Pages: 661-674
Page count: 14
Related papers
20 in total
  • [1] A novel spatial pyramid-enhanced indoor visual positioning method
    Yang, Jiaqiang
    Qin, Danyang
    Tang, Huapeng
    Tao, Sili
    Bie, Haoze
    Ma, Lin
    DIGITAL SIGNAL PROCESSING, 2025, 156
  • [2] Patch-NetVLAD+: Learned patch descriptor and weighted matching strategy for place recognition
    Cai, Yingfeng
    Zhao, Junqiao
    Cui, Jiafeng
    Zhang, Fenglin
    Feng, Tiantian
    Ye, Chen
    2022 IEEE INTERNATIONAL CONFERENCE ON MULTISENSOR FUSION AND INTEGRATION FOR INTELLIGENT SYSTEMS (MFI), 2022,
  • [3] Pyramid transformer-based triplet hashing for robust visual place recognition
    Li, Zhenyu
    Xu, Pengjie
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
  • [4] Vehicle Logo Recognition Based on a Weighted Spatial Pyramid Framework
    Ou, Yuanchang
    Zheng, Huicheng
    Chen, Shuyue
    Chen, Jiangtao
    2014 IEEE 17TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2014, : 1238 - 1244
  • [5] CSPFormer: A cross-spatial pyramid transformer for visual place recognition
    Li, Zhenyu
    Xu, Pengjie
    NEUROCOMPUTING, 2024, 580
  • [6] SPANET: SPATIAL PYRAMID ATTENTION NETWORK FOR ENHANCED IMAGE RECOGNITION
    Guo, Jingda
    Ma, Xu
    Sansom, Andrew
    McGuire, Mara
    Kalaani, Andrew
    Chen, Qi
    Tang, Sihai
    Yang, Qing
    Fu, Song
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [7] Real-time image recognition using weighted spatial pyramid networks
    Xiaoning Zhu
    Qingyue Meng
    Lize Gu
    Journal of Real-Time Image Processing, 2018, 15 : 617 - 629
  • [8] Real-time image recognition using weighted spatial pyramid networks
    Zhu, Xiaoning
    Meng, Qingyue
    Gu, Lize
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2018, 15 (03) : 617 - 629
  • [9] Optimal Densely Connected Networks with Pyramid Spatial Matching Scheme for Visual Place Recognition
    Sasikumar, P.
    Sathiamoorthy, S.
    PERVASIVE COMPUTING AND SOCIAL NETWORKING, ICPCSN 2022, 2023, 475 : 123 - 137
  • [10] Spatial pyramid face feature representation and weighted dissimilarity matching for improved face recognition
    Jae Young Choi
    The Visual Computer, 2018, 34 : 1535 - 1549