Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning

Cited by: 8
Authors
Zeng, Kun [1]
Xu, Yibin [1]
Lin, Ge [2]
Liang, Likeng [3]
Hao, Tianyong [3]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Peoples R China
[2] Sun Yat Sen Univ, Natl Engn Res Ctr Digital Life, Guangzhou, Peoples R China
[3] South China Normal Univ, Sch Comp Sci, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Eligibility criteria classification; Metric learning; Focal loss; Ensemble learning; Clinical trial; INFORMATION;
DOI
10.1186/s12911-021-01492-z
Chinese Library Classification number
R-058
Abstract
Background: Eligibility criteria are the primary means of screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and thereby reduces the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and class imbalance of eligibility criteria text data.
Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that the learned features are more discriminative. Soft voting is applied to produce the final classification of the ensemble model. The dataset is from standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories.
Results: The ensemble method achieved an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 under a standard t-test, indicating that the improvement was statistically significant.
Conclusions: A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that the ensemble model significantly improved classification performance. In addition, metric learning improved the word embedding representation, and the focal loss reduced the impact of data imbalance on model performance.
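The abstract names focal loss and soft voting without giving formulas, so the following is only a minimal NumPy sketch of the textbook forms of both techniques, not the authors' implementation. The function names, the focusing parameter `gamma=2.0`, the absence of per-class weights, and the equal weighting of base models are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def focal_loss(logits, labels, gamma=2.0):
    """Mean multi-class focal loss: -(1 - p_t)^gamma * log(p_t).

    gamma is an assumed value; gamma = 0 reduces to plain cross-entropy.
    Well-classified examples (p_t close to 1) are down-weighted, so rare,
    hard classes contribute relatively more to the loss.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    labels = np.asarray(labels)
    p_t = probs[np.arange(len(labels)), labels]  # probability of the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t + 1e-12)))


def soft_vote(prob_list):
    """Average class probabilities over base models, then take the argmax.

    Equal model weights are an assumption; the abstract does not state them.
    """
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs.argmax(axis=-1)
```

In this sketch, each base model (for example BERT or RoBERTa) would contribute one array of shape `(n_samples, n_classes)` to `prob_list`. Averaging probabilities rather than counting hard votes lets a confident model outweigh an uncertain one.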
Pages: 10
Related papers
50 entries in total
  • [31] Classifying Eligibility Criteria in Clinical Trials Using Active Deep Learning
    Chuan, Ching-Hua
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 305 - 310
  • [32] Ensemble-Based Deep Metric Learning for Few-Shot Learning
    Zhou, Meng
    Li, Yaoyi
    Lu, Hongtao
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT I, 2020, 12396 : 406 - 418
  • [33] Automated Classification for Pathological Prostate Images using AdaBoost-based Ensemble Learning
    Huang, Chao-Hui
    Kalaw, Emarene Mationg
    PROCEEDINGS OF 2016 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2016,
  • [34] Metric learning for text documents
    Lebanon, G
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2006, 28 (04) : 497 - 508
  • [35] Active Learning Based on Transfer Learning Techniques for Text Classification
    Onita, Daniela
    IEEE ACCESS, 2023, 11 : 28751 - 28761
  • [36] METRIC BASED GAUSSIAN KERNEL LEARNING FOR CLASSIFICATION
    Guo, Zhenyu
    Wang, Z. Jane
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 3582 - 3586
  • [37] Artificial intelligence based Chinese clinical trials eligibility criteria classification
    Zong H.
    Zhang Z.
    Yang J.
    Lei J.
    Li Z.
    Hao T.
    Zhang X.
    Shengwu Yixue Gongchengxue Zazhi/Journal of Biomedical Engineering, 2021, 38 (01): : 105 - 110
  • [38] Knowledge Guided Metric Learning for Few-Shot Text Classification
    Sui, Dianbo
    Chen, Yubo
    Mao, Binjie
    Qiu, Delai
    Liu, Kang
    Zhao, Jun
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3266 - 3271
  • [39] A Proposal of Extended Cosine Measure for Distance Metric Learning in Text Classification
    Mikawa, Kenta
    Ishida, Takashi
    Goto, Masayuki
    2011 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2011, : 1741 - 1746
  • [40] Chinese question classification based on ensemble learning
    Jia, Keliang
    Chen, Kang
    Fan, Xiaozhong
    Zhang, Yu
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 3, PROCEEDINGS, 2007, : 342 - +