Energy-Based Models for Cross-Modal Localization using Convolutional Transformers

Cited by: 0
Authors
Wu, Alan [1 ]
Ryoo, Michael S. [2 ]
Affiliations
[1] MIT Lincoln Lab, Lexington, MA 02421 USA
[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY USA
Keywords
IMAGES;
DOI
10.1109/ICRA48891.2023.10160267
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline code
0812 ;
Abstract
We present a novel framework using Energy-Based Models (EBMs) for localizing a ground vehicle mounted with a range sensor against satellite imagery in the absence of GPS. Lidar sensors have become ubiquitous on autonomous vehicles for describing their surrounding environment. Map priors are typically built using the same sensor modality for localization purposes. However, these map-building endeavors using range sensors are often expensive and time-consuming. Alternatively, we leverage satellite images as map priors, which are widely available, easily accessible, and provide comprehensive coverage. We propose a method using convolutional transformers that performs accurate metric-level localization in a cross-modal manner, which is challenging due to the drastic difference in appearance between the sparse range sensor readings and the rich satellite imagery. We train our model end-to-end and demonstrate that our approach achieves higher accuracy than the state-of-the-art on KITTI, Pandaset, and a custom dataset.
Pages: 11726 - 11733
Page count: 8
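The paper's model is a learned convolutional transformer trained end-to-end; as a toy illustration of the energy-based formulation only, the sketch below (hypothetical names `energy` and `localize`, not from the paper) scores candidate translations of a lidar bird's-eye-view occupancy grid against a satellite map with a hand-crafted correlation energy and returns the minimum-energy offset, standing in for the learned energy function.

```python
import numpy as np

def energy(lidar_bev, sat_patch):
    """Toy energy: negative normalized cross-correlation between a lidar
    bird's-eye-view grid and a same-sized satellite map patch.
    Lower energy means better alignment (range -1 to 1)."""
    a = lidar_bev - lidar_bev.mean()
    b = sat_patch - sat_patch.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-8
    return -float((a * b).sum() / denom)

def localize(lidar_bev, sat_map):
    """Exhaustively score every candidate translation of the lidar grid
    over the satellite map; return the argmin offset and its energy.
    (The paper replaces this hand-crafted score with a learned EBM.)"""
    H, W = lidar_bev.shape
    Hs, Ws = sat_map.shape
    best_e, best_off = None, None
    for dy in range(Hs - H + 1):
        for dx in range(Ws - W + 1):
            e = energy(lidar_bev, sat_map[dy:dy + H, dx:dx + W])
            if best_e is None or e < best_e:
                best_e, best_off = e, (dy, dx)
    return best_off, best_e

# Synthetic check: embed a pattern at a known offset and recover it.
rng = np.random.default_rng(0)
pattern = rng.random((8, 8))
sat = rng.random((32, 32))
sat[10:18, 5:13] = pattern          # ground-truth offset (10, 5)
offset, e_min = localize(pattern, sat)
```

In this synthetic setting the embedded patch matches exactly, so the search recovers the offset (10, 5); the real cross-modal case is hard precisely because lidar and satellite appearance differ drastically, which is what the learned energy addresses.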