Energy-Based Models for Cross-Modal Localization using Convolutional Transformers

Authors
Wu, Alan [1]
Ryoo, Michael S. [2]
Affiliations
[1] MIT Lincoln Lab, Lexington, MA 02421 USA
[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY USA
Keywords
IMAGES
DOI
10.1109/ICRA48891.2023.10160267
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We present a novel framework using Energy-Based Models (EBMs) for localizing a ground vehicle equipped with a range sensor against satellite imagery in the absence of GPS. Lidar sensors have become ubiquitous on autonomous vehicles for describing their surrounding environment. Map priors are typically built using the same sensor modality for localization purposes. However, building such maps with range sensors is often expensive and time-consuming. As an alternative, we use satellite images as map priors, which are widely available, easily accessible, and provide comprehensive coverage. We propose a method using convolutional transformers that performs accurate metric-level localization in a cross-modal manner, which is challenging due to the drastic difference in appearance between the sparse range sensor readings and the rich satellite imagery. We train our model end-to-end and demonstrate that our approach achieves higher accuracy than the state of the art on KITTI, Pandaset, and a custom dataset.
Pages: 11726-11733
Page count: 8