A pragmatic guide to geoparsing evaluation: Toponyms, Named Entity Recognition and pragmatics

Cited by: 0
Authors
Milan Gritta
Mohammad Taher Pilehvar
Nigel Collier
Affiliation
[1] University of Cambridge, Language Technology Lab (LTL), Department of Theoretical and Applied Linguistics (DTAL)
Source
Language Resources and Evaluation, 2020, 54 (03)
Keywords
Geoparsing; Toponym resolution; Geotagging; Geocoding; Named Entity Recognition; Machine learning; Evaluation framework; Geonames; Toponyms; Natural language understanding; Pragmatics
DOI
Not available
Abstract
Empirical methods in geoparsing have thus far lacked a standard evaluation framework describing the task, metrics and data used to compare state-of-the-art systems. Evaluation is made further inconsistent, and even unrepresentative of real-world usage, by the lack of distinction between the different types of toponyms. This necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy, with implications for Named Entity Recognition (NER) and beyond. To address these deficiencies, our manuscript introduces a new framework in three parts. (Part 1) Task Definition: clarified via corpus linguistic analysis proposing a fine-grained Pragmatic Taxonomy of Toponyms. (Part 2) Metrics: discussed and reviewed for a rigorous evaluation, including recommendations for NER/Geoparsing practitioners. (Part 3) Evaluation data: shared via a new dataset called GeoWebNews to provide test/train examples and enable immediate use of our contributions. In addition to fine-grained Geotagging and Toponym Resolution (Geocoding), this dataset is also suitable for prototyping and evaluating machine learning NLP models.
Pages: 683 - 712
Number of pages: 29
Related papers
50 in total
  • [1] A pragmatic guide to geoparsing evaluation Toponyms, Named Entity Recognition and pragmatics
    Gritta, Milan
    Pilehvar, Mohammad Taher
    Collier, Nigel
    LANGUAGE RESOURCES AND EVALUATION, 2020, 54 (03) : 683 - 712
  • [2] Chinese word segmentation and named entity recognition: A pragmatic approach
    Gao, JF
    Li, M
    Wu, A
    Huang, CN
    COMPUTATIONAL LINGUISTICS, 2005, 31 (04) : 531 - 574
  • [3] Named Entity Recognition for Vietnamese
    Dat Ba Nguyen
    Son Huu Hoang
    Son Bao Pham
    Thai Phuong Nguyen
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, PROCEEDINGS, 2010, 5991 : 205 - 214
  • [4] Named Entity Recognition for Tweets
    Liu, Xiaohua
    Wei, Furu
    Zhang, Shaodian
    Zhou, Ming
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2013, 4 (01)
  • [5] Persian Named Entity Recognition
    Dashtipour, Kia
    Gogate, Mandar
    Adeel, Ahsan
    Algarafi, Abdulrahman
    Howard, Newton
    Hussain, Amir
    2017 IEEE 16TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2017 : 79 - 83
  • [6] An Overview of Named Entity Recognition
    Sun, Peng
    Yang, Xuezhen
    Zhao, Xiaobing
    Wang, Zhijuan
    2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018 : 273 - 278
  • [7] Named Entity Recognition Approaches
    Mansouri, Alireza
    Affendey, Lilly Suriani
    Mamat, Ali
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2008, 8 (02) : 339 - 344
  • [8] NAMED ENTITY RECOGNITION FOR POLISH
    Marcinczuk, Michal
    Wawer, Aleksander
    POZNAN STUDIES IN CONTEMPORARY LINGUISTICS, 2019, 55 (02) : 239 - 269
  • [9] NAMED ENTITY RECOGNITION FOR ROMANIAN
    Iftene, Adrian
    Trandabat, Diana
    Toader, Mihai
    Corici, Marius
    KEPT 2011: KNOWLEDGE ENGINEERING PRINCIPLES AND TECHNIQUES, 2011 : 49 - 60
  • [10] Arabic Named Entity Recognition
    Benajiba, Yassine
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (44) : 151 - 152