Domain adaptation for semantic role labeling of clinical text

Cited by: 14
Authors
Zhang, Yaoyun [1]
Tang, Buzhou [1, 2]
Jiang, Min [1]
Wang, Jingqi [1]
Xu, Hua [1]
Affiliations
[1] Univ Texas Houston, Sch Biomed Informat Houston, Houston, TX 77030 USA
[2] Shenzhen Grad Sch, Harbin Inst Technol, Dept Comp Sci, Shenzhen, Guangdong, Peoples R China
Keywords
semantic role labeling; shallow semantic parsing; clinical natural language processing; domain adaptation; transfer learning; BIOMEDICAL LITERATURE; ANNOTATED CORPUS; INFORMATION; EXTRACTION; KNOWLEDGE; SYSTEM;
DOI
10.1093/jamia/ocu048
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Semantic role labeling (SRL), which extracts a shallow semantic relation representation from the different surface forms of free-text sentences, is important for understanding natural language. Few SRL studies have been conducted in the medical domain, primarily due to the lack of annotated clinical SRL corpora, which are time-consuming and costly to build. The goal of this study is to investigate domain adaptation techniques for clinical SRL that leverage resources built from newswire and biomedical literature to improve performance and save annotation costs.
Materials and Methods: Multisource Integrated Platform for Answering Clinical Questions (MiPACQ), a manually annotated clinical SRL corpus, was used as the target domain dataset. PropBank and NomBank from newswire and BioProp from biomedical literature were used as source domain datasets. Three state-of-the-art domain adaptation algorithms were employed: instance pruning, transfer self-training, and feature augmentation. SRL performance with the different domain adaptation algorithms was evaluated using 10-fold cross-validation on the MiPACQ corpus. Learning curves for the different methods were generated to assess the effect of sample size.
Results and Conclusion: When all three source domain corpora were used, the feature augmentation algorithm achieved a statistically significantly higher F-measure (83.18%) than the baseline trained on the MiPACQ dataset alone (F-measure, 81.53%), indicating that domain adaptation algorithms may improve SRL performance on clinical text. To achieve performance comparable to the baseline method that used 90% of the MiPACQ training samples, the feature augmentation algorithm required < 50% of the training samples in MiPACQ, demonstrating that the annotation costs of clinical SRL can be reduced significantly by leveraging existing SRL resources from other domains.
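The abstract names feature augmentation but does not spell out its mechanics. As a rough illustration only, the sketch below assumes one widely used formulation, in which every feature is copied into a shared "general" version and a domain-specific version, so that a linear classifier can learn which cues transfer across corpora and which are specific to one. The function name `augment_features`, the `general::` prefix, and the sample feature names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Corpora named in the abstract; treating each one as its own "domain"
# is an assumption made for this sketch.
DOMAINS: List[str] = ["PropBank", "NomBank", "BioProp", "MiPACQ"]


def augment_features(
    features: Dict[str, float], domain: str, domains: List[str] = DOMAINS
) -> Dict[str, float]:
    """Duplicate each sparse feature into a shared copy and a domain copy.

    Copies for the other domains are implicitly zero because the
    representation is sparse (absent keys count as 0).
    """
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain}")
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        augmented[f"general::{name}"] = value   # shared across all domains
        augmented[f"{domain}::{name}"] = value  # active only for this domain
    return augmented


# Hypothetical SRL-style features for one candidate argument.
example = {"predicate=prescribe": 1.0, "phrase_type=NP": 1.0}
clinical = augment_features(example, "MiPACQ")
newswire = augment_features(example, "PropBank")
```

Under this assumption, instances from the clinical and newswire corpora overlap only in their `general::` features, so weights learned on the source corpora can transfer through the shared copy while the `MiPACQ::` copy captures clinical-specific behaviour.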
Pages: 967-979
Number of pages: 13