A Dataset for Multilingual Epidemiological Event Extraction

Cited by: 0
Authors
Mutuvi, Stephen [1, 2]
Doucet, Antoine [1]
Lejeune, Gael [3]
Odeo, Moses [2]
Affiliations
[1] Univ La Rochelle, L3i Lab, La Rochelle, France
[2] Multimedia Univ Kenya, Nairobi, Kenya
[3] Sorbonne Univ Paris, Paris, France
Funding
European Union Horizon 2020
Keywords
Epidemiology; corpus creation; event extraction; classification; multilingual NLP;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper proposes a corpus for the development and evaluation of tools and techniques for identifying emerging infectious disease threats in online news text. The corpus can be used not only for information extraction but also for other natural language processing (NLP) tasks such as text classification. We make use of articles published on the Program for Monitoring Emerging Diseases (PROMED) platform, which provides current information about outbreaks of infectious diseases globally. Among the key pieces of information present in the articles is the uniform resource locator (URL) of the online news source where each outbreak was originally reported. We detail the procedure followed to build the dataset, which includes leveraging the source URLs to retrieve the news reports and subsequently pre-processing the retrieved documents. We also report experimental results of event extraction on the dataset using the Data Analysis for Information Extraction in any Language (DANIEL) system. DANIEL is a multilingual news surveillance system that leverages two attributes characteristic of news reporting to extract events: repetition and saliency. The system has wide geographical and language coverage, including low-resource languages. In addition, we compare different classification approaches in terms of their ability to differentiate between the epidemic-related and unrelated news articles that constitute the corpus.
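The abstract mentions that source URLs found in the platform's articles are used to retrieve the original news reports. As a rough illustration only, the URL-harvesting step might look like the sketch below. The function name, the regular expression, and the sample text are all assumptions made for this example; they are not taken from the paper, and the actual report format may differ.

```python
import re
from typing import List

# Assumed pattern (not from the paper): an http/https URL running up to the
# next whitespace, quote, or closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_source_urls(report_text: str) -> List[str]:
    """Return the distinct URLs in a report body, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(report_text):
        # Drop sentence punctuation that often trails a URL in running text.
        url = match.rstrip(".,;:")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical report text, invented for illustration.
sample_report = (
    "Source: Example Daily News [edited]\n"
    "<https://news.example.org/health/outbreak-report>\n"
    "See also https://news.example.org/health/outbreak-report, "
    "and http://other.example.com/story.html."
)
print(extract_source_urls(sample_report))
```

Each extracted URL could then be passed to a downloader and a text-cleaning step; those stages are left out here because the record does not describe them in enough detail to sketch.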
Pages: 4139-4144
Page count: 6