Energy-Based Models for Cross-Modal Localization using Convolutional Transformers

Cited by: 0
Authors
Wu, Alan [1]
Ryoo, Michael S. [2]
Affiliations
[1] MIT Lincoln Lab, Lexington, MA 02421 USA
[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY USA
Keywords
IMAGES;
DOI
10.1109/ICRA48891.2023.10160267
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We present a novel framework using Energy-Based Models (EBMs) for localizing a ground vehicle mounted with a range sensor against satellite imagery in the absence of GPS. Lidar sensors have become ubiquitous on autonomous vehicles for describing their surrounding environment. Map priors are typically built using the same sensor modality for localization purposes. However, building these maps with range sensors is often expensive and time-consuming. Instead, we use satellite images as map priors, which are widely available, easily accessible, and provide comprehensive coverage. We propose a method using convolutional transformers that performs accurate metric-level localization in a cross-modal manner, which is challenging due to the drastic difference in appearance between the sparse range sensor readings and the rich satellite imagery. We train our model end-to-end and demonstrate that our approach achieves higher accuracy than the state-of-the-art on KITTI, Pandaset, and a custom dataset.
Pages: 11726-11733
Page count: 8