Domain adaptation for semantic role labeling of clinical text

Cited by: 14
Authors
Zhang, Yaoyun [1]
Tang, Buzhou [1, 2]
Jiang, Min [1]
Wang, Jingqi [1]
Xu, Hua [1]
Affiliations
[1] Univ Texas Houston, Sch Biomed Informat Houston, Houston, TX 77030 USA
[2] Shenzhen Grad Sch, Harbin Inst Technol, Dept Comp Sci, Shenzhen, Guangdong, Peoples R China
Keywords
semantic role labeling; shallow semantic parsing; clinical natural language processing; domain adaptation; transfer learning; BIOMEDICAL LITERATURE; ANNOTATED CORPUS; INFORMATION; EXTRACTION; KNOWLEDGE; SYSTEM;
DOI
10.1093/jamia/ocu048
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Objective: Semantic role labeling (SRL), which extracts a shallow semantic relation representation from the different surface forms of free-text sentences, is important for understanding natural language. Few SRL studies have been conducted in the medical domain, primarily because annotated clinical SRL corpora are lacking and are time-consuming and costly to build. The goal of this study is to investigate domain adaptation techniques for clinical SRL that leverage resources built from newswire and biomedical literature to improve performance and reduce annotation costs.
Materials and Methods: The Multisource Integrated Platform for Answering Clinical Questions (MiPACQ) corpus, a manually annotated clinical SRL corpus, was used as the target domain dataset. PropBank and NomBank from newswire and BioProp from biomedical literature were used as source domain datasets. Three state-of-the-art domain adaptation algorithms were employed: instance pruning, transfer self-training, and feature augmentation. SRL performance with each domain adaptation algorithm was evaluated by 10-fold cross-validation on the MiPACQ corpus. Learning curves were generated for the different methods to assess the effect of sample size.
Results and Conclusion: When all three source domain corpora were used, the feature augmentation algorithm achieved a statistically significantly higher F-measure (83.18%) than the baseline trained on the MiPACQ dataset alone (F-measure, 81.53%), indicating that domain adaptation algorithms may improve SRL performance on clinical text. To achieve performance comparable to the baseline method that used 90% of the MiPACQ training samples, the feature augmentation algorithm required < 50% of the MiPACQ training samples, demonstrating that the annotation costs of clinical SRL can be reduced significantly by leveraging existing SRL resources from other domains.
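The abstract names feature augmentation but does not describe how it was implemented. As a rough illustration only, the sketch below assumes one commonly described formulation, in which every feature vector is copied into a shared block plus one block per domain, and only the block for the instance's own domain is filled in. The function name `augment` and the domain labels are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def augment(features: Sequence[float], domain: str,
            domains: Sequence[str]) -> List[float]:
    """Return [shared block | one block per domain].

    Only the block belonging to `domain` holds a copy of the features;
    every other domain block is zero-filled. This is an assumed,
    generic formulation, not the paper's confirmed implementation.
    """
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain}")
    out = list(features)  # shared block, visible to every domain
    for name in domains:
        if name == domain:
            out.extend(features)  # domain-specific copy
        else:
            out.extend([0.0] * len(features))  # inactive domain block
    return out


# Hypothetical labels standing in for the newswire, biomedical
# literature, and clinical corpora mentioned in the abstract.
DOMAINS = ["newswire", "biomedical", "clinical"]

x = [1.0, 0.0, 2.0]
source_vec = augment(x, "newswire", DOMAINS)
target_vec = augment(x, "clinical", DOMAINS)
```

Under this assumption, a single classifier trained on the augmented vectors can place weight on the shared block for behavior common to all corpora and on the clinical block for behavior specific to the target domain.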
Pages: 967-979
Page count: 13
Related papers
50 in total
  • [41] Semantic Role Labeling of English Tweets
    Rudrapal, Dwijen
    Das, Amitava
    COMPUTACION Y SISTEMAS, 2018, 22 (03): : 739 - 746
  • [42] Adaptive Convolution for Semantic Role Labeling
    Munir, Kashif
    Zhao, Hai
    Li, Zuchao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 782 - 791
  • [43] Combination strategies for semantic role labeling
    Surdeanu, Mihai
    Marquez, Lluis
    Carreras, Xavier
    Comas, Pere R.
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2007, 29 : 105 - 151
  • [44] Structured Learning for Semantic Role Labeling
    Croce, Danilo
    Basili, Roberto
    AI(STAR)IA 2011: ARTIFICIAL INTELLIGENCE AROUND MAN AND BEYOND, 2011, 6934 : 238 - 249
  • [45] Research on Semantic Role Labeling Method
    Jiang, Bo
    Lan, Yuqing
    COMMUNICATIONS AND NETWORKING, CHINACOM 2018, 2019, 262 : 252 - 258
  • [46] Tree kernels for semantic role labeling
    Moschitti, Alessandro
    Pighin, Daniele
    Basili, Roberto
    COMPUTATIONAL LINGUISTICS, 2008, 34 (02) : 193 - 224
  • [47] Semantic Proto-Role Labeling
    Teichert, Adam
    Poliak, Adam
    Van Durme, Benjamin
    Gormley, Matthew R.
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4459 - 4465
  • [48] Combination strategies for semantic role labeling
    Surdeanu, Mihai
    Marquez, Lluís
    Carreras, Xavier
    Comas, Pere R.
    Journal of Artificial Intelligence Research, 1600, 29 : 105 - 151
  • [49] A generative model for semantic role labeling
    Thompson, CA
    Levy, R
    Manning, CD
    MACHINE LEARNING: ECML 2003, 2003, 2837 : 397 - 408
  • [50] Structured learning for semantic role labeling
    Croce, Danilo
    Castellucci, Giuseppe
    Bastianelli, Emanuele
    INTELLIGENZA ARTIFICIALE, 2012, 6 (02) : 163 - 176