An annotated corpus of clinical trial publications supporting schema-based relational information extraction

Cited by: 5
Authors
Sanchez-Graillet, Olivia [1]
Witte, Christian [1]
Grimm, Frank [1]
Cimiano, Philipp [1]
Affiliations
[1] Bielefeld Univ, Cluster Excellence Cognit Interact Technol CITEC, Semant Comp Grp, D-33619 Bielefeld, Germany
Keywords
Clinical trial annotated corpus; Schematic annotation; Relational information extraction; Knowledge base population; AGREEMENT;
DOI
10.1186/s13326-022-00271-7
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Background: The evidence-based medicine paradigm requires the ability to aggregate and compare outcomes of interventions across different trials. This can be facilitated and partially automated by information extraction systems. To support the development of systems that can extract information from published clinical trials at a fine-grained and comprehensive level to populate a knowledge base, we present a corpus richly annotated at two levels. At the first level, entities that describe components of the PICO elements (e.g., the population's age and pre-conditions, the dosage of a treatment) are annotated. The second level comprises schema-level annotations (i.e., slot-filling templates) corresponding to complex PICO elements and other concepts related to a clinical trial (e.g., the relation between an intervention and an arm, or between an outcome and an intervention).
Results: The final corpus includes 211 annotated clinical trial abstracts with substantial agreement between annotators at the entity and schema level. The mean Kappa value for single entities was 0.74 for the glaucoma corpus and 0.68 for the T2DM corpus. The micro-averaged F1 score measuring inter-annotator agreement for complex entities (i.e., slot-filling templates) was 0.81. The BERT-base baseline method for entity recognition achieved average micro-F1 scores of 0.76 for glaucoma and 0.77 for diabetes with exact matching.
Conclusions: In this work, we have created a corpus that goes beyond the existing clinical trial corpora, since it is annotated in a schematic way that represents the classes and properties defined in an ontology. Although the corpus is small, it has fine-grained annotations and could be used to fine-tune pre-trained machine learning models and transformers for the specific task of extracting information from clinical trial abstracts. In future work, we will use the corpus to train information extraction systems that extract single entities and predict template slot-fillers (i.e., class data/object properties) to populate a knowledge base that relies on the C-TrO ontology for the description of clinical trials. The resulting corpus, the code to measure inter-annotator agreement, and the baseline method are publicly available at https://zenodo.org/record/6365890.
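The two agreement measures named in the abstract, Kappa for single entities and micro-averaged F1 with exact matching, can be illustrated with a minimal Python sketch. This is not the authors' code (theirs is in the Zenodo record above). It assumes Cohen's kappa for two annotators, and the function names, label names, and span tuples are all hypothetical. The exact-match F1 here is computed over (start, end, label) spans, whereas the paper reports it for slot-filling templates.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def micro_f1(spans_a, spans_b):
    """Micro-averaged F1 between two span sets, counting exact matches only."""
    spans_a, spans_b = set(spans_a), set(spans_b)
    true_positives = len(spans_a & spans_b)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(spans_b)
    recall = true_positives / len(spans_a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical token labels from two annotators (illustrative only).
annotator_1 = ["Drug", "Drug", "Dose", "O"]
annotator_2 = ["Drug", "Dose", "Dose", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)

# Hypothetical (start, end, label) spans; the "Outcome" boundaries differ.
spans_1 = [(0, 2, "Drug"), (5, 6, "Dose"), (9, 11, "Outcome")]
spans_2 = [(0, 2, "Drug"), (5, 6, "Dose"), (9, 12, "Outcome"), (14, 15, "Dose")]
f1 = micro_f1(spans_1, spans_2)
```

Exact matching treats the two "Outcome" spans as a disagreement even though they overlap, which is why scores reported "with exact matching" are typically the stricter ones.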
Pages: 18
Related papers
50 in total
  • [21] LEI2JSON: Schema-based validation and conversion of livestock event information
    Habib, Mahir
    Kabir, Muhammad Ashad
    Zheng, Lihong
    SOFTWAREX, 2024, 26
  • [22] Clinical Trial Information Extraction with BERT
    Liu, Xiong
    Hersch, Greg L.
    Khalil, Iya
    Devarakonda, Murthy
    2021 IEEE 9TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2021), 2021, : 505 - 506
  • [23] XML Schema-Based Minification for Communication of Security Information and Event Management (SIEM) Systems in Cloud Environments
    Moussa, Bishoy
    Mostafa, Mahmoud
    El-Khouly, Mahmoud
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (09) : 74 - 82
  • [24] R2LD: Schema-based Graph Mapping of relational databases to Linked Open Data for multimedia resources data
    Zhao, Zhanfang
    Han, SungKook
    Kim, JuRi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (20) : 28835 - 28851
  • [25] R2LD: Schema-based Graph Mapping of relational databases to Linked Open Data for multimedia resources data
    Zhanfang Zhao
    SungKook Han
    JuRi Kim
    Multimedia Tools and Applications, 2019, 78 : 28835 - 28851
  • [26] Information Extraction based on Named Entity for Tourism Corpus
    Chantrapornchai, Chantana
    Tunsakul, Aphisit
    2019 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE 2019), 2019, : 187 - 192
  • [27] ExaCT: automatic extraction of clinical trial characteristics from journal publications
    Kiritchenko, Svetlana
    de Bruijn, Berry
    Carini, Simona
    Martin, Joel
    Sim, Ida
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2010, 10
  • [28] Supporting the Abstraction of Clinical Practice Guidelines Using Information Extraction
    Kaiser, Katharina
    Miksch, Silvia
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2010, 6177 : 304 - +
  • [29] ExaCT: automatic extraction of clinical trial characteristics from journal publications
    Svetlana Kiritchenko
    Berry de Bruijn
    Simona Carini
    Joel Martin
    Ida Sim
    BMC Medical Informatics and Decision Making, 10
  • [30] Information retrieval in schema-based P2P systems using one-dimensional semantic space
    Gu, Tao
    Pung, Hung Keng
    Zhang, Daqing
    COMPUTER NETWORKS, 2007, 51 (16) : 4543 - 4560