MGeo: Multi-Modal Geographic Language Model Pre-Training

Cited by: 1
Authors
Ding, Ruixue [1 ]
Chen, Boli [1 ]
Xie, Pengjun [1 ]
Huang, Fei [1 ]
Li, Xin [2 ]
Zhang, Qiang [2 ]
Xu, Yao [2 ]
Affiliations
[1] Alibaba Grp, Damo Acad, Hangzhou, Peoples R China
[2] Alibaba Grp, Gaode Map, Beijing, Peoples R China
Keywords
query-POI matching; multi-modal; language model; geographic context; benchmark;
DOI
10.1145/3539618.3591728
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Query and point of interest (POI) matching is a core task in location-based services (LBS), e.g., navigation maps. It connects users' intent with real-world geographic information. Lately, pre-trained language models (PLMs) have made notable advancements in many natural language processing (NLP) tasks. To overcome the limitation that generic PLMs lack geographic knowledge for query-POI matching, related work employs continued pre-training on domain-specific corpora. However, a query generally describes the geographic context (GC) of its destination and contains mentions of multiple geographic objects, such as nearby roads and regions of interest (ROIs). These diverse geographic objects and their correlations are pivotal to retrieving the most relevant POI. Text-based single-modal PLMs can barely make use of this important GC and are therefore limited. In this work, we propose a novel method for query-POI matching, the Multi-modal Geographic language model (MGeo), which comprises a geographic encoder and a multi-modal interaction module. By representing GC as a new modality, MGeo can fully extract multi-modal correlations to perform accurate query-POI matching. Moreover, no query-POI matching benchmark is publicly available. To facilitate further research, we build a new open-source large-scale benchmark for this topic, Geographic TExtual Similarity (GeoTES). The POIs come from an open-source geographic information system (GIS), and the queries are manually generated by annotators to prevent privacy issues. Compared with several strong baselines, extensive experimental results and detailed ablation analyses demonstrate that our proposed multi-modal geographic pre-training method can significantly improve the query-POI matching capability of PLMs, with or without users' locations. Our code and benchmark are publicly available at https://github.com/PhantomGrapes/MGeo.
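To make the abstract's architecture concrete, the sketch below shows one way the described components could be wired together: a geographic encoder turns nearby geographic objects (roads, ROIs) into token-like embeddings, a cross-attention interaction module fuses them with query/POI text, and pooled representations are scored for matching. This is a minimal illustrative sketch, not the authors' implementation: the class names `GeographicEncoder` and `MultiModalMatcher`, all dimensions, the embedding table standing in for a PLM, and the mean-pooling and dot-product scoring are assumptions. The actual code lives in the linked repository.

```python
# Minimal sketch of a two-tower query-POI matcher with a geographic-context
# (GC) modality, loosely following the abstract's description. All module
# names, sizes, and pooling/scoring choices are hypothetical assumptions.
import torch
import torch.nn as nn

class GeographicEncoder(nn.Module):
    """Encodes a set of nearby geographic objects (e.g., roads, ROIs),
    each given as a small feature vector, into token-like embeddings."""
    def __init__(self, geo_feat_dim: int = 8, hidden_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(geo_feat_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(hidden_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, geo_feats: torch.Tensor) -> torch.Tensor:
        # geo_feats: (batch, num_geo_objects, geo_feat_dim)
        return self.encoder(self.proj(geo_feats))

class MultiModalMatcher(nn.Module):
    """Fuses text-token embeddings with GC embeddings via cross-attention,
    then scores query-POI pairs with a dot product of pooled vectors."""
    def __init__(self, vocab_size: int = 30522, hidden_dim: int = 128):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, hidden_dim)  # stand-in for a PLM
        self.geo_encoder = GeographicEncoder(hidden_dim=hidden_dim)
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads=4,
                                                batch_first=True)

    def encode(self, token_ids: torch.Tensor, geo_feats: torch.Tensor) -> torch.Tensor:
        text = self.text_emb(token_ids)             # (B, T, H)
        geo = self.geo_encoder(geo_feats)           # (B, G, H)
        fused, _ = self.cross_attn(text, geo, geo)  # text tokens attend to GC
        return fused.mean(dim=1)                    # (B, H) pooled representation

    def forward(self, q_ids, q_geo, p_ids, p_geo) -> torch.Tensor:
        q = self.encode(q_ids, q_geo)
        p = self.encode(p_ids, p_geo)
        return (q * p).sum(dim=-1)                  # matching score per pair

# Toy usage: score 2 query-POI pairs, each with 5 nearby geographic objects.
model = MultiModalMatcher()
q_ids = torch.randint(0, 30522, (2, 12))
p_ids = torch.randint(0, 30522, (2, 12))
q_geo = torch.randn(2, 5, 8)
p_geo = torch.randn(2, 5, 8)
print(model(q_ids, q_geo, p_ids, p_geo).shape)  # torch.Size([2])
```

The two-tower layout is assumed here because it is the standard choice for large-scale retrieval; the paper's actual interaction module and geographic features may differ.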
Pages: 185-194
Page count: 10
Related Papers
50 records in total
  • [1] Liu, Zhuang; Ma, Yunpu; Schubert, Matthias; Ouyang, Yuanxin; Xiong, Zhang. Multi-Modal Contrastive Pre-training for Recommendation. Proceedings of the 2022 International Conference on Multimedia Retrieval (ICMR 2022), 2022: 99-108.
  • [2] Zhao, Tianqi; Kong, Ming; Liang, Tian; Zhu, Qiang; Kuang, Kun; Wu, Fei. CLAP: Contrastive Language-Audio Pre-training Model for Multi-modal Sentiment Analysis. Proceedings of the 2023 ACM International Conference on Multimedia Retrieval (ICMR 2023), 2023: 622-626.
  • [3] Chen, Zhihong; Du, Yuhao; Hu, Jinpeng; Liu, Yang; Li, Guanbin; Wan, Xiang; Chang, Tsung-Hui. Multi-modal Masked Autoencoders for Medical Vision-and-Language Pre-training. Medical Image Computing and Computer Assisted Intervention (MICCAI 2022), Pt V, 2022, 13435: 679-689.
  • [4] Chan, David M.; Ghosh, Shalini; Chakrabarty, Debmalya; Hoffmeister, Bjorn. Multi-Modal Pre-Training for Automated Speech Recognition. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 246-250.
  • [5] Chen, Leiyuan; Huang, Chengsong; Zheng, Xiaoqing; Lin, Jinshu; Huang, Xuanjing. TableVLM: Multi-modal Pre-training for Table Structure Recognition. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Vol 1, 2023: 2437-2449.
  • [6] Yan, Zhiqiang; Li, Xiang; Wang, Kun; Zhang, Zhenyu; Li, Jun; Yang, Jian. Multi-modal Masked Pre-training for Monocular Panoramic Depth Completion. Computer Vision - ECCV 2022, Pt I, 2022, 13661: 378-395.
  • [7] Hong, Fangzhou; Pan, Liang; Cai, Zhongang; Liu, Ziwei. Versatile Multi-Modal Pre-Training for Human-Centric Perception. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 16135-16145.
  • [8] Moon, Jong Hak; Lee, Hyungyung; Shin, Woncheol; Kim, Young-Hak; Choi, Edward. Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training. IEEE Journal of Biomedical and Health Informatics, 2022, 26(12): 6070-6080.
  • [9] Park, Sungjin; Bae, Seongsu; Kim, Jiho; Kim, Tackeun; Choi, Edward. Graph-Text Multi-Modal Pre-training for Medical Representation Learning. Conference on Health, Inference, and Learning, Vol 174, 2022: 261-281.
  • [10] Liu, Bei; Fu, Jianlong; Chen, Shizhe; Jin, Qin; Hauptmann, Alexander; Rui, Yong. MMPT'21: International Joint Workshop on Multi-Modal Pre-Training for Multimedia Understanding. Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR '21), 2021: 694-695.