Automatic extension of corpora from the intelligent ensembling of eHealth knowledge discovery systems outputs

Cited by: 2
Authors
Consuegra-Ayala, Juan Pablo [1]
Gutierrez, Yoan [2, 3]
Piad-Morffis, Alejandro [1]
Almeida-Cruz, Yudivian [1]
Palomar, Manuel [2, 3]
Affiliations
[1] Univ Habana, Sch Math & Comp Sci, Havana 10200, Cuba
[2] Univ Alicante, Univ Inst Comp Res IUII, Alicante 03690, Spain
[3] Univ Alicante, Dept Language & Comp Syst, Alicante 03690, Spain
Keywords
Ensemble methods; Annotated corpora; Information extraction; Entity recognition; Relation extraction; Natural language processing
DOI
10.1016/j.jbi.2021.103716
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Corpora are one of the most valuable resources at present for building machine learning systems. However, building new corpora is an expensive task, which makes the automatic extension of corpora a highly attractive task to develop. Hence, finding new strategies that reduce the cost and effort involved in this task, while at the same time guaranteeing quality, remains an open and important challenge for the research community. In this paper, we present a set of ensembling strategies oriented toward entity and relation extraction tasks. The main goal is to combine several automatically annotated versions of corpora to produce a single version with improved quality. An ensembler is built by exploring a configuration space in search of the combination that maximizes the fitness of the ensembled collection according to a reference collection. The eHealth-KD 2019 challenge was chosen for the case study. The submitted systems' outputs were ensembled, resulting in the construction of an automatically annotated collection of 8000 sentences. We show that using this collection as additional training input for a baseline algorithm has a positive impact on its performance. Additionally, the ensembling pipeline was used as a participant system in the 2020 edition of the challenge. The ensembled run achieved a slightly better performance than the individual runs.
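The idea of searching a configuration space for the ensemble that best fits a reference collection can be illustrated with a small sketch. It is not the authors' method: it assumes the simplest possible configuration space (a single vote threshold), a hypothetical annotation format of (start, end, label) tuples, and F1 against the reference as the fitness function. All function names are illustrative.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical annotation format: (start offset, end offset, label).
Annotation = Tuple[int, int, str]


def vote_ensemble(runs: Iterable[Set[Annotation]], min_votes: int) -> Set[Annotation]:
    """Keep every annotation proposed by at least `min_votes` systems."""
    counts = Counter(ann for run in runs for ann in set(run))
    return {ann for ann, votes in counts.items() if votes >= min_votes}


def f1(predicted: Set[Annotation], gold: Set[Annotation]) -> float:
    """Exact-match F1 of a predicted annotation set against a reference set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_threshold(runs, gold):
    """Try every vote threshold and return (threshold, F1) of the fittest one.

    Assumption: the configuration space is only the vote threshold; ties are
    resolved in favor of the lowest threshold.
    """
    runs = list(runs)
    best = max(
        range(1, len(runs) + 1),
        key=lambda k: f1(vote_ensemble(runs, k), gold),
    )
    return best, f1(vote_ensemble(runs, best), gold)
```

With three system outputs and a small reference set, `best_threshold(runs, gold)` returns the vote threshold whose ensembled output scores highest against the reference. A richer configuration space (per-system weights, per-label thresholds) would replace the `range` being searched.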
Pages: 16