Automatic extension of corpora from the intelligent ensembling of eHealth knowledge discovery systems outputs

Cited by: 2
Authors
Consuegra-Ayala, Juan Pablo [1]
Gutierrez, Yoan [2, 3]
Piad-Morffis, Alejandro [1]
Almeida-Cruz, Yudivian [1]
Palomar, Manuel [2, 3]
Affiliations
[1] Univ Habana, Sch Math & Comp Sci, Havana 10200, Cuba
[2] Univ Alicante, Univ Inst Comp Res IUII, Alicante 03690, Spain
[3] Univ Alicante, Dept Language & Comp Syst, Alicante 03690, Spain
Keywords
Ensemble methods; Annotated corpora; Information extraction; Entity recognition; Relation extraction; Natural language processing
DOI
10.1016/j.jbi.2021.103716
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Corpora are one of the most valuable resources at present for building machine learning systems. However, building new corpora is an expensive task, which makes the automatic extension of corpora a highly attractive task to develop. Hence, finding new strategies that reduce the cost and effort involved in this task, while at the same time guaranteeing quality, remains an open and important challenge for the research community. In this paper, we present a set of ensembling strategies oriented toward entity and relation extraction tasks. The main goal is to combine several automatically annotated versions of corpora to produce a single version with improved quality. An ensembler is built by exploring a configuration space in search of the combination that maximizes the fitness of the ensembled collection according to a reference collection. The eHealth-KD 2019 challenge was chosen for the case study. The submitted systems' outputs were ensembled, resulting in the construction of an automatically annotated collection of 8000 sentences. We show that using this collection as additional training input for a baseline algorithm has a positive impact on its performance. Additionally, the ensembling pipeline was used as a participant system in the 2020 edition of the challenge. The ensembled run achieved a slightly better performance than the individual runs.
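The idea of searching a configuration space for the ensemble that best matches a reference collection can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, purely for illustration, that each system output is a set of annotation tuples, that the configuration consists of a subset of runs plus a voting threshold, that fitness is the F1 score against the reference, and that the space is small enough to search exhaustively. The names `f1`, `ensemble`, and `search` are hypothetical.

```python
from itertools import product


def f1(pred, gold):
    """F1 score between a predicted and a reference set of annotations."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def ensemble(runs, selected, threshold):
    """Keep annotations proposed by at least `threshold` of the selected runs."""
    chosen = [runs[i] for i in selected]
    votes = {}
    for run in chosen:
        for annotation in run:
            votes[annotation] = votes.get(annotation, 0) + 1
    return {a for a, v in votes.items() if v / len(chosen) >= threshold}


def search(runs, reference, thresholds=(0.25, 0.5, 0.75, 1.0)):
    """Exhaustively try every non-empty subset of runs and every threshold."""
    best = (-1.0, None, None)
    for mask in product([False, True], repeat=len(runs)):
        selected = [i for i, keep in enumerate(mask) if keep]
        if not selected:
            continue
        for t in thresholds:
            score = f1(ensemble(runs, selected, t), reference)
            if score > best[0]:
                best = (score, selected, t)
    return best


# Toy data (assumed format): (sentence_id, start, end, label) tuples.
A = (0, 0, 6, "Concept")
B = (0, 10, 15, "Action")
runs = [
    {A, B, (0, 40, 45, "Concept")},
    {A, (0, 20, 25, "Concept")},
    {A, B, (0, 30, 35, "Predicate")},
]
reference = {A, B}

score, selected, threshold = search(runs, reference)
print(score, selected, threshold)
```

In this toy case no single run matches the reference exactly, but a voted combination does, which mirrors the abstract's claim that the ensembled output can outperform the individual runs. A realistic configuration space would be far larger and would likely need a heuristic search rather than exhaustive enumeration.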
Pages: 16