An annotated corpus of clinical trial publications supporting schema-based relational information extraction

Cited by: 5

Authors:

Sanchez-Graillet, Olivia [1]

Witte, Christian [1]

Grimm, Frank [1]

Cimiano, Philipp [1]

Affiliations:

[1] Bielefeld Univ, Cluster Excellence Cognit Interact Technol CITEC, Semant Comp Grp, D-33619 Bielefeld, Germany

Source:

JOURNAL OF BIOMEDICAL SEMANTICS | 2022 / Volume 13 / Issue 01

Keywords:

Clinical trial annotated corpus; Schematic annotation; Relational information extraction; Knowledge base population; AGREEMENT;

DOI:

10.1186/s13326-022-00271-7

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Background: The evidence-based medicine paradigm requires the ability to aggregate and compare outcomes of interventions across different trials. Information extraction systems can facilitate and partially automate this. To support the development of systems that extract information from published clinical trials at a fine-grained and comprehensive level in order to populate a knowledge base, we present a corpus that is richly annotated at two levels. At the first level, entities that describe components of the PICO elements (e.g., a population's age and pre-conditions, the dosage of a treatment) are annotated. The second level comprises schema-level annotations (i.e., slot-filling templates) corresponding to complex PICO elements and other concepts related to a clinical trial (e.g., the relation between an intervention and an arm, or between an outcome and an intervention).

Results: The final corpus includes 211 annotated clinical trial abstracts with substantial agreement between annotators at the entity and schema levels. For single entities, the mean Kappa value was 0.74 for the glaucoma corpus and 0.68 for the T2DM corpus. The micro-averaged F1 score measuring inter-annotator agreement for complex entities (i.e., slot-filling templates) was 0.81. The BERT-base baseline method for entity recognition achieved average micro-F1 scores of 0.76 for glaucoma and 0.77 for diabetes with exact matching.

Conclusions: In this work, we have created a corpus that goes beyond the existing clinical trial corpora, since it is annotated in a schematic way that represents the classes and properties defined in an ontology. Although the corpus is small, it has fine-grained annotations and could be used to fine-tune pre-trained machine learning models and transformers for the specific task of extracting information from clinical trial abstracts. In future work, we will use the corpus to train information extraction systems that extract single entities and predict template slot-fillers (i.e., class data/object properties) to populate a knowledge base that relies on the C-TrO ontology for the description of clinical trials. The resulting corpus, the code to measure inter-annotator agreement, and the baseline method are publicly available at https://zenodo.org/record/6365890.
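
The abstract reports two agreement measures: Kappa for single entities and a micro-averaged F1 score for slot-filling templates. The following is a minimal sketch of how such measures are commonly computed, not the authors' code (which is available at the Zenodo link above). It assumes Cohen's kappa for two annotators, and the function names, labels, and (slot, value) pairs are hypothetical illustrations.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def micro_f1(annotation_pairs):
    """Micro-averaged F1 with exact matching.

    annotation_pairs: iterable of (first, second) sets of hashable
    annotations, one pair per document. Counts are pooled over all
    documents before precision and recall are computed.
    """
    tp = fp = fn = 0
    for first, second in annotation_pairs:
        tp += len(first & second)
        fp += len(second - first)
        fn += len(first - second)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the corpus.
annotator_1 = ["Drug", "Drug", "Dose", "O"]
annotator_2 = ["Drug", "Dose", "Dose", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)

templates = [
    ({("drug", "timolol"), ("dose", "0.5%")}, {("drug", "timolol")}),
    ({("outcome", "IOP")}, {("outcome", "IOP"), ("arm", "A")}),
]
f1 = micro_f1(templates)
```

Pooling the counts before computing precision and recall is what makes the score micro-averaged: documents with more annotations contribute more to the final value than they would under a per-document (macro) average.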

Pages: 18
