MGeo: Multi-Modal Geographic Language Model Pre-Training

Cited by: 1

Authors:

Ding, Ruixue [1]

Chen, Boli [1]

Xie, Pengjun [1]

Huang, Fei [1]

Li, Xin [2]

Zhang, Qiang [2]

Xu, Yao [2]

Affiliations:

[1] Alibaba Group, DAMO Academy, Hangzhou, People's Republic of China

[2] Alibaba Group, Gaode Map, Beijing, People's Republic of China

Source:

PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023

Keywords:

query-POI matching; multi-modal; language model; geographic context; benchmark;

DOI:

10.1145/3539618.3591728

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Query and point of interest (POI) matching is a core task in location-based services (LBS), e.g., navigation maps. It connects users' intent with real-world geographic information. Lately, pre-trained language models (PLMs) have made notable advancements in many natural language processing (NLP) tasks. To overcome the limitation that generic PLMs lack geographic knowledge for query-POI matching, related literature attempts to employ continued pre-training based on domain-specific corpus. However, a query generally describes the geographic context (GC) about its destination and contains mentions of multiple geographic objects like nearby roads and regions of interest (ROIs). These diverse geographic objects and their correlations are pivotal to retrieving the most relevant POI. Text-based single-modal PLMs can barely make use of the important GC and are therefore limited. In this work, we propose a novel method for query-POI matching, namely Multi-modal Geographic language model (MGeo), which comprises a geographic encoder and a multi-modal interaction module. Representing GC as a new modality, MGeo is able to fully extract multi-modal correlations to perform accurate query-POI matching. Moreover, there exists no publicly available query-POI matching benchmark. Intending to facilitate further research, we build a new open-source large-scale benchmark for this topic, i.e., Geographic TExtual Similarity (GeoTES). The POIs come from an open-source geographic information system (GIS) and the queries are manually generated by annotators to prevent privacy issues. Compared with several strong baselines, the extensive experiment results and detailed ablation analyses demonstrate that our proposed multi-modal geographic pre-training method can significantly improve the query-POI matching capability of PLMs with or without users' locations. Our code and benchmark are publicly available at https://github.com/PhantomGrapes/MGeo.

Citation

Pages: 185-194

Page count: 10

