IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets

Times cited: 6
Authors
Suwaileh, Reem [1]
Elsayed, Tamer [1]
Imran, Muhammad [2]
Affiliations
[1] Qatar Univ, Comp Sci & Engn Dept, Doha, Qatar
[2] Hamad Bin Khalifa Univ HBKU, Qatar Comp Res Inst QCRI, Ar Rayyan, Qatar
Keywords
Location mention recognition; Twitter; Geolocation; Disaster management; Dataset; Domain generalizability; Geographical generalizability; PREDICTION; EXTRACTION
DOI
10.1016/j.ipm.2023.103340
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
While utilizing Twitter data for crisis management is of interest to different response authorities, a critical challenge that hinders the utilization of such data is the scarcity of automated tools that extract geolocation information. The limited focus on Location Mention Recognition (LMR) in tweets, specifically, is attributed to the lack of a standard dataset that enables research in LMR. To bridge this gap, we present IDRISI-RE, a large-scale human-labeled LMR dataset comprising around 20.5k tweets. The annotated location mentions within the tweets are also assigned location types (e.g., country, city, street). IDRISI-RE contains tweets from 19 disaster events of diverse types (e.g., flood and earthquake) covering a wide geographical area of 22 English-speaking countries. Additionally, IDRISI-RE contains about 56.6k automatically-labeled tweets that we offer as a silver dataset. To highlight the superiority of IDRISI-RE over past efforts, we present rigorous analyses on reliability, consistency, coverage, diversity, and generalizability. Furthermore, we benchmark IDRISI-RE using a representative set of LMR models to provide the community with baselines for future work. Our extensive empirical analysis shows the promising generalizability of IDRISI-RE compared to existing datasets. We show that models trained on IDRISI-RE better tackle domain shifts and are less susceptible to changes in geographical area.
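The abstract frames LMR as finding location mentions in tweet text and assigning each one a location type. To make that concrete, the sketch below assumes a common sequence-labeling representation, BIO tags whose suffix is the location type (e.g. `B-city`, `I-street`, `O`), and scores predictions with exact-match span F1. This is an illustration only: the tag names, the example tweet, and the helpers `bio_to_spans` and `span_f1` are hypothetical and are not taken from IDRISI-RE's released file format or from the authors' evaluation scripts.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation: a typed location mention as
# (start token, end token exclusive, location type).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode assumed BIO tags such as 'B-city' / 'I-city' / 'O' into typed spans.

    Assumes every tag is 'O', 'B-<type>' or 'I-<type>'. An 'I-' tag whose
    type differs from the open span (or with no open span) starts a new span.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    cur_type: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            # Close any open span, then open a new one of this tag's type.
            if start is not None:
                spans.append((start, i, cur_type))
            start, cur_type = i, tag[2:]
        elif tag == "O":
            # Outside any mention: close the open span, if any.
            if start is not None:
                spans.append((start, i, cur_type))
            start, cur_type = None, None
        # Otherwise the tag is 'I-<cur_type>' and simply extends the open span.
    if start is not None:
        spans.append((start, len(tags), cur_type))
    return spans


def span_f1(gold: List[Span], pred: List[Span], typed: bool = True) -> float:
    """Exact-match span F1. With typed=False only the boundaries must match.

    Returns 0.0 when there are no true positives (including when both are empty).
    """
    def key(s: Span):
        return s if typed else s[:2]

    g: Set = {key(s) for s in gold}
    p: Set = {key(s) for s in pred}
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tweet with gold tags and an imperfect model prediction.
tokens = ["Earthquake", "hits", "Kathmandu", ",", "Nepal", "near", "Durbar", "Marg"]
gold = bio_to_spans(["O", "O", "B-city", "O", "B-country", "O", "B-street", "I-street"])
pred = bio_to_spans(["O", "O", "B-city", "O", "B-city", "O", "B-street", "O"])

for s, e, loc_type in gold:
    print(" ".join(tokens[s:e]), "->", loc_type)
# Kathmandu -> city
# Nepal -> country
# Durbar Marg -> street
print(round(span_f1(gold, pred), 3))               # 0.333
print(round(span_f1(gold, pred, typed=False), 3))  # 0.667
```

Scoring with `typed=False` separates boundary errors (the truncated "Durbar Marg") from type errors ("Nepal" labeled as a city), which is one way to read typed LMR results; whether the IDRISI-RE benchmarks report either variant is not stated in this record.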
Pages: 29