Depth as attention to learn image representations for visual localization, using monocular images

Cited by: 0
Authors
Hettiarachchi, Dulmini [1 ]
Tian, Ye [1 ]
Yu, Han [2 ]
Kamijo, Shunsuke [3 ]
Affiliations
[1] Univ Tokyo, Grad Sch Interdisciplinary Informat Studies, Tokyo 1130033, Japan
[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1130033, Japan
[3] Univ Tokyo, Inst Ind Sci IIS, Tokyo 1538505, Japan
Keywords
Image retrieval; Visual localization; Image representation; Depth attention; Global descriptors;
DOI
10.1016/j.jvcir.2023.104012
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Image retrieval algorithms are widely used in visual localization tasks. In visual localization, we benefit from retrieving images that depict the same landmark and are taken from a pose similar to that of the query. However, state-of-the-art image retrieval algorithms are optimized mainly for landmark retrieval and do not take camera pose into account. To address this limitation, we propose a novel Depth Attention Network (DeAttNet). DeAttNet leverages both visual and depth information to learn a global image representation. Depth varies for similar features captured from different camera poses. Based on this insight, we employ depth within an attention mechanism to discern and emphasize salient regions. In our method, we use monocular depth estimation algorithms to render depth maps. Compared to RGB-only image descriptors, the proposed method obtains significant improvements on the Mapillary Street Level Sequences, Pittsburgh, and Cambridge Landmarks datasets.
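The abstract describes using a monocular depth map as an attention signal that re-weights image features before they are pooled into a single global descriptor. The sketch below illustrates that idea only; the module name DepthAttentionDescriptor, the channel sizes, the small convolutional attention branch, and the GeM pooling choice are assumptions for illustration, not the authors' published DeAttNet architecture.

```python
# Minimal sketch (assumptions noted above): depth-guided attention over backbone
# features, followed by global pooling into an L2-normalized retrieval descriptor.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthAttentionDescriptor(nn.Module):
    def __init__(self, feat_channels: int = 512, desc_dim: int = 256):
        super().__init__()
        # Small conv branch turning the single-channel depth map into
        # per-pixel attention weights in [0, 1] (illustrative design).
        self.depth_attn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.p = nn.Parameter(torch.tensor(3.0))  # GeM pooling exponent
        self.whiten = nn.Linear(feat_channels, desc_dim)

    def forward(self, feats: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) backbone features; depth: (B, 1, h, w) monocular depth map.
        depth = F.interpolate(depth, size=feats.shape[-2:], mode="bilinear",
                              align_corners=False)
        attn = self.depth_attn(depth)       # (B, 1, H, W) depth-derived attention map
        weighted = feats * attn             # emphasize depth-salient regions
        # Generalized-mean (GeM) pooling over the spatial dimensions
        # (assumes non-negative, e.g. post-ReLU, activations).
        pooled = weighted.clamp(min=1e-6).pow(self.p).mean(dim=(-2, -1)).pow(1.0 / self.p)
        desc = self.whiten(pooled)          # (B, desc_dim) global descriptor
        return F.normalize(desc, dim=-1)    # L2-normalized for nearest-neighbor retrieval


# Usage: backbone features for a batch of images plus depth maps from any
# monocular depth estimator yield one descriptor per image.
feats = torch.randn(2, 512, 14, 14)
depth = torch.rand(2, 1, 224, 224)
desc = DepthAttentionDescriptor()(feats, depth)  # shape (2, 256)
```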
Pages: 11
Related Papers
50 records in total
  • [1] Airplane Localization in Satellite Images by using Visual Attention
    Ozyer, Gulsah Tumuklu
    Vural, Fatos T. Yarman
    [J]. 2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013,
  • [2] SaccadeCam: Adaptive Visual Attention for Monocular Depth Sensing
    Tilmon, Brevin
    Koppal, Sanjeev J.
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 5989 - 5998
  • [3] Evolving visual sonar: Depth from monocular images
Martin, Martin C.
    [J]. PATTERN RECOGNITION LETTERS, 2006, 27 (11) : 1174 - 1180
  • [4] Unsupervised Monocular Estimation of Depth and Visual Odometry Using Attention and Depth-Pose Consistency Loss
    Song, Xiaogang
    Hu, Haoyue
    Liang, Li
    Shi, Weiwei
    Xie, Guo
    Lu, Xiaofeng
    Hei, Xinhong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3517 - 3529
  • [5] A visual attention model for stereoscopic 3D images using monocular cues
    Iatsun, Iana
    Larabi, Mohamed-Chaker
    Fernandez-Maloigne, Christine
    [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2015, 38 : 70 - 83
  • [6] Visual Object Identification, Image Foreshortening, and Monocular Depth Cues
    Humphrey, K.
    Jolicoeur, P.
    [J]. BULLETIN OF THE PSYCHONOMIC SOCIETY, 1988, 26 (06) : 517 - 517
  • [7] MonoVAN: Visual Attention for Self-Supervised Monocular Depth Estimation
    Indyk, Ilia
    Makarov, Ilya
    [J]. 2023 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY, ISMAR, 2023, : 1211 - 1220
  • [8] Attention-Based Background/Foreground Monocular Depth Prediction Model Using Image Segmentation
    Chiang, Ting-Hui
    Chiang, Meng-Hsiu
    Tsai, Ming-Han
    Chang, Che-Cheng
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (21):
  • [9] A self-supervised monocular odometry with visual-inertial and depth representations
    Zhao, Lingzhe
    Xiang, Tianyu
    Wang, Zhuping
    [J]. JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2024, 361 (06):
  • [10] The Influence of Depth of Field on Visual Attention in Moving Images
    Soderberg, Christina
    Christiansen, Oliver
    Durant, Szonya
    [J]. PERCEPTION, 2019, 48 : 143 - 143