Energy-Based Models for Cross-Modal Localization using Convolutional Transformers

Authors
Wu, Alan [1]
Ryoo, Michael S. [2]
Affiliations
[1] MIT Lincoln Lab, Lexington, MA 02421 USA
[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY USA
Keywords
IMAGES;
DOI
10.1109/ICRA48891.2023.10160267
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We present a novel framework using Energy-Based Models (EBMs) for localizing a ground vehicle equipped with a range sensor against satellite imagery in the absence of GPS. Lidar sensors have become ubiquitous on autonomous vehicles for describing the surrounding environment. Map priors are typically built using the same sensor modality for localization purposes. However, these map-building endeavors using range sensors are often expensive and time-consuming. Alternatively, we leverage satellite images as map priors, which are widely available, easily accessible, and provide comprehensive coverage. We propose a method using convolutional transformers that performs accurate metric-level localization in a cross-modal manner, which is challenging due to the drastic difference in appearance between the sparse range sensor readings and the rich satellite imagery. We train our model end-to-end and demonstrate that our approach achieves higher accuracy than the state of the art on KITTI, Pandaset, and a custom dataset.
Pages: 11726-11733
Number of pages: 8