Geoparsing: Solved or Biased? An Evaluation of Geographic Biases in Geoparsing

Cited by: 6
Authors
Liu, Zilong [1, 2]
Janowicz, Krzysztof [1, 2, 3]
Cai, Ling [1, 2]
Zhu, Rui [1, 2]
Mai, Gengchen [1, 2, 4]
Shi, Meilin [1, 2]
Affiliations
[1] Univ Calif Santa Barbara, Dept Geog, STKO Lab, Santa Barbara, CA 93106 USA
[2] Univ Calif Santa Barbara, Ctr Spatial Studies, Santa Barbara, CA 93106 USA
[3] Univ Vienna, Dept Geog & Reg Res, Vienna, Austria
[4] Stanford Univ, Dept Comp Sci, Stanford, CA USA
Keywords
geoparsing; spatially-explicit evaluation; regional variability; geographic bias; evaluation bias mitigation; AREAL UNIT PROBLEM; SPATIAL AUTOCORRELATION; SAMPLING BIAS; MODELS; CONSERVATION; INFORMATION; DATABASE
DOI
10.5194/agile-giss-3-9-2022
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Geoparsing, the task of extracting toponyms from texts and associating them with geographic locations, has witnessed remarkable progress over the past years. However, despite its intrinsically geospatial nature, existing evaluations tend to focus on overall performance while paying little attention to its variation across geographic space. In this work, we attempt to answer the question whether geoparsing is solved or biased by conducting a spatially-explicit evaluation, namely an evaluation of the regional variability in geoparsing performance. Particularly, we will analyze the spatial autocorrelation underlying this regional variability. By performing hot and cold spot detection over results of several open-source geoparsers, we observe that none of them performs equally well across geographic space, and some are geographically biased towards some regions but against others. We also carry out a comparative experiment showing that state-of-the-art geoparsers developed with neural networks do not necessarily outperform the off-the-shelf tools across geographic space. To understand the implications behind this observed regional variability, we evaluate geographic biases involved in geoparsing research centered around data contribution and usage, algorithm design, and performance evaluations. Particularly, our spatially-explicit performance evaluation serves as an approach to evaluation bias mitigation in geoparsing. We conclude that previous performance evaluations published in the literature are overly optimistic, thus hiding the fact that geoparsing is far from solved, and geoparsers require debiasing in addition to further considerations when being applied to (geospatial) downstream tasks.
Pages: 13