Revisit Anything: Visual Place Recognition via Image Segment Retrieval

Cited by: 0
Authors
Garg, Kartik [1]
Shubodh, Sai [2]
Kolathaya, Shishir [1]
Krishna, Madhava [2]
Garg, Sourav [3]
Affiliations
[1] Indian Inst Sci IISc, Bengaluru, India
[2] Int Inst Informat Technol, Hyderabad, India
[3] Univ Adelaide, Adelaide, SA, Australia
Keywords
Visual Place Recognition; Image Segmentation; Robotics; SCALE
DOI
10.1007/978-3-031-73113-6_19
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Accurately recognizing a revisited place is crucial for embodied agents to localize and navigate. This requires visual representations to be distinct, despite strong variations in camera viewpoint and scene appearance. Existing visual place recognition pipelines encode the whole image and search for matches. This poses a fundamental challenge in matching two images of the same place captured from different camera viewpoints: the similarity of what overlaps can be dominated by the dissimilarity of what does not overlap. We address this by encoding and searching for image segments instead of the whole images. We propose to use open-set image segmentation to decompose an image into 'meaningful' entities (i.e., things and stuff). This enables us to create a novel image representation as a collection of multiple overlapping subgraphs connecting a segment with its neighboring segments, dubbed SuperSegment. Furthermore, to efficiently encode these SuperSegments into compact vector representations, we propose a novel factorized representation of feature aggregation. We show that retrieving these partial representations leads to significantly higher recognition recall than the typical whole image based retrieval. Our segments-based approach, dubbed SegVLAD, sets a new state-of-the-art in place recognition on a diverse selection of benchmark datasets, while being applicable to both generic and task-specialized image encoders. Finally, we demonstrate the potential of our method to "revisit anything" by evaluating our method on an object instance retrieval task, which bridges the two disparate areas of research: visual place recognition and object-goal navigation, through their common aim of recognizing goal objects specific to a place. Source code: https://github.com/AnyLoc/Revisit-Anything.
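The SuperSegment idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the linked repository holds that). It assumes a precomputed integer segment label map and per-pixel features, uses plain sum pooling as a stand-in for the paper's feature aggregation, treats only directly touching segments as neighbors, and ranks database images by a simple nearest-neighbor vote. The function names `segment_adjacency`, `supersegment_descriptors`, and `retrieve_image` are hypothetical.

```python
import numpy as np


def segment_adjacency(labels: np.ndarray, n_segments: int) -> np.ndarray:
    """Boolean adjacency (with self-loops) of segments that touch (4-connectivity)."""
    adj = np.eye(n_segments, dtype=bool)
    # Pair every pixel with its right neighbor, then with its bottom neighbor.
    pairs = (
        (labels[:, :-1], labels[:, 1:]),
        (labels[:-1, :], labels[1:, :]),
    )
    for a, b in pairs:
        adj[a.ravel(), b.ravel()] = True
        adj[b.ravel(), a.ravel()] = True
    return adj


def supersegment_descriptors(
    labels: np.ndarray, feats: np.ndarray, n_segments: int
) -> np.ndarray:
    """One L2-normalized descriptor per segment, pooled over it and its neighbors.

    labels: (H, W) integer segment ids in [0, n_segments).
    feats:  (H, W, D) per-pixel features (assumed to come from some image encoder).
    """
    flat = labels.ravel()
    # One-hot pixel-to-segment membership, shape (n_segments, H*W).
    member = np.zeros((n_segments, flat.size))
    member[flat, np.arange(flat.size)] = 1.0
    # Step 1: aggregate pixels once per segment (sum pooling, an assumption).
    seg_desc = member @ feats.reshape(flat.size, -1)
    # Step 2: combine each segment with its neighbors via one matrix product.
    adj = segment_adjacency(labels, n_segments).astype(float)
    sup_desc = adj @ seg_desc
    norms = np.linalg.norm(sup_desc, axis=1, keepdims=True)
    return sup_desc / np.maximum(norms, 1e-12)


def retrieve_image(
    query_desc: np.ndarray, db_desc: np.ndarray, db_image_ids: np.ndarray
) -> int:
    """Each query SuperSegment votes for the image owning its most similar
    database SuperSegment (cosine similarity); the most-voted image wins.
    The voting rule is an illustrative assumption."""
    sims = query_desc @ db_desc.T
    nearest = sims.argmax(axis=1)
    votes = np.bincount(np.asarray(db_image_ids)[nearest])
    return int(votes.argmax())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.array([[0, 1, 2], [0, 1, 2]])  # three vertical stripes
    db = [supersegment_descriptors(labels, rng.standard_normal((2, 3, 4)), 3)
          for _ in range(2)]
    ids = np.repeat(np.arange(2), 3)  # image id of every database descriptor
    print(retrieve_image(db[1], np.vstack(db), ids))
```

Because sum pooling is linear, pixels are aggregated once per segment and every SuperSegment descriptor then follows from a single adjacency-matrix product, rather than from a fresh pass over the pixels of each neighborhood.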
Pages: 326-343
Number of pages: 18