Automatic extension of corpora from the intelligent ensembling of eHealth knowledge discovery systems outputs

Cited by: 2
Authors
Consuegra-Ayala, Juan Pablo [1]
Gutierrez, Yoan [2, 3]
Piad-Morffis, Alejandro [1]
Almeida-Cruz, Yudivian [1]
Palomar, Manuel [2, 3]
Affiliations
[1] Univ Habana, Sch Math & Comp Sci, Havana 10200, Cuba
[2] Univ Alicante, Univ Inst Comp Res IUII, Alicante 03690, Spain
[3] Univ Alicante, Dept Language & Comp Syst, Alicante 03690, Spain
Keywords
Ensemble methods; Annotated corpora; Information extraction; Entity recognition; Relation extraction; Natural language processing
DOI
10.1016/j.jbi.2021.103716
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Corpora are one of the most valuable resources at present for building machine learning systems. However, building new corpora is an expensive task, which makes the automatic extension of corpora a highly attractive task to develop. Hence, finding new strategies that reduce the cost and effort involved in this task, while at the same time guaranteeing quality, remains an open and important challenge for the research community. In this paper, we present a set of ensembling strategies oriented toward entity and relation extraction tasks. The main goal is to combine several automatically annotated versions of corpora to produce a single version with improved quality. An ensembler is built by exploring a configuration space in search of the combination that maximizes the fitness of the ensembled collection according to a reference collection. The eHealth-KD 2019 challenge was chosen for the case study. The submitted systems' outputs were ensembled, resulting in the construction of an automatically annotated collection of 8000 sentences. We show that using this collection as additional training input for a baseline algorithm has a positive impact on its performance. Additionally, the ensembling pipeline was used as a participant system in the 2020 edition of the challenge. The ensembled run achieved a slightly better performance than the individual runs.
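The idea in the abstract of searching a configuration space for the ensemble that best matches a reference collection can be illustrated with a minimal sketch. It is not the paper's method: it assumes a single hypothetical configuration parameter (`min_votes`, the number of systems that must agree on an annotation), uses F1 against the reference as the fitness, and all function names and the toy annotations are invented for illustration.

```python
from collections import Counter


def f1(predicted, reference):
    """F1 of a set of predicted annotations against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def ensemble(runs, min_votes):
    """Keep every annotation proposed by at least `min_votes` runs."""
    votes = Counter(annotation for run in runs for annotation in set(run))
    return {annotation for annotation, n in votes.items() if n >= min_votes}


def search_best_threshold(runs, reference):
    """Try every vote threshold and return (best_threshold, best_f1)."""
    scored = [(k, f1(ensemble(runs, k), reference)) for k in range(1, len(runs) + 1)]
    return max(scored, key=lambda pair: pair[1])


# Toy data (hypothetical): each annotation is (sentence_id, text_span, label).
runs = [
    {(0, "asma", "Concept"), (0, "afecta", "Action")},
    {(0, "asma", "Concept"), (0, "vías", "Concept")},
    {(0, "asma", "Concept"), (0, "afecta", "Action"), (0, "las", "Concept")},
]
reference = {
    (0, "asma", "Concept"),
    (0, "afecta", "Action"),
    (0, "vías respiratorias", "Concept"),
}

best_k, best_score = search_best_threshold(runs, reference)
merged = ensemble(runs, best_k)
```

A real configuration space would likely include more choices than a single threshold (for example, per-system weights or per-label rules), but the search loop keeps the same shape: build a candidate ensemble, score it against the reference, and keep the best one.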
Pages: 16
Related papers
  • [1] Ensembling Predictions of Student Knowledge within Intelligent Tutoring Systems
    Baker, Ryan S. J. D.
    Pardos, Zachary A.
    Gowda, Sujith M.
    Nooraei, Bahador B.
    Heffernan, Neil T.
    USER MODELING, ADAPTATION, AND PERSONALIZATION, 2011, 6787 : 13 - +
  • [2] Automatic discovery of translation collocations from bilingual corpora
    Barrachina, S
    Vilar, JM
    ECAI 2004: 16TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 110 : 571 - 575
  • [3] Analysis of eHealth knowledge discovery systems in the TASS 2018 Workshop
    Piad-Morffis, Alejandro
    Gutierrez, Yoan
    Estevez-Velarde, Suilan
    Almeida-Cruz, Yudivian
    Montoyo, Andres
    Munoz, Rafael
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2019, (62): 13 - 20
  • [4] Discovery of event entailment knowledge from text corpora
    Pekar, Viktor
    COMPUTER SPEECH AND LANGUAGE, 2008, 22 (01): 1 - 16
  • [5] Intelligent Systems for Students Knowledge Automatic Evaluation
    Dobre, Iuliana
    PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON VIRTUAL LEARNING, 2008: 327 - 334
  • [6] Automatic Concept Discovery from Parallel Text and Visual Corpora
    Sun, Chen
    Gan, Chuang
    Nevatia, Ram
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2596 - 2604
  • [7] Automatic Visual Theme Discovery from Joint Image and Text Corpora
    Sun, Ke
    Hou, Xianxu
    Zhang, Qian
    Qiu, Guoping
    2017 2ND INTERNATIONAL CONFERENCE ON MULTIMEDIA AND IMAGE PROCESSING (ICMIP), 2017: 220 - 224
  • [8] Automatic induction of romanization systems from bilingual corpora
    Doshisha University, Kyotanabe-shi
    610-0394, Japan
    [Unknown]
    619-0289, Japan
    IEICE Trans Inf Syst, 1600, 2 (381-393):
  • [9] Automatic Induction of Romanization Systems from Bilingual Corpora
    Taguchi, Keiko
    Finch, Andrew
    Yamamoto, Seiichi
    Sumita, Eiichiro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (02) : 381 - 393
  • [10] Domain Knowledge in Knowledge Discovery and Privacy-Aware Intelligent Systems Preface
    Slezak, Dominik
    Fung, Benjamin C. M.
    Cheung, William K.
    FUNDAMENTA INFORMATICAE, 2015, 137 (02) : I - II