Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning

Cited by: 8
Authors
Zeng, Kun [1]
Xu, Yibin [1]
Lin, Ge [2]
Liang, Likeng [3]
Hao, Tianyong [3]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Peoples R China
[2] Sun Yat Sen Univ, Natl Engn Res Ctr Digital Life, Guangzhou, Peoples R China
[3] South China Normal Univ, Sch Comp Sci, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Eligibility criteria classification; Metric learning; Focal loss; Ensemble learning; Clinical trial; INFORMATION;
DOI
10.1186/s12911-021-01492-z
Chinese Library Classification number
R-058
Abstract
Background: Eligibility criteria are the primary means of screening target participants for a clinical trial. Automated classification of eligibility criteria text with machine learning can improve recruitment efficiency and reduce the cost of clinical research. However, existing methods classify poorly because eligibility criteria text is complex and its categories are imbalanced.
Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models: Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that features of different categories are easier to distinguish. Soft voting over the base models produces the final classification of the ensemble. The dataset comes from evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories.
Results: The ensemble method achieved an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. Its macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. The improvement had a p-value of 2.152e-07 under a standard t-test, indicating that it is statistically significant.
Conclusions: A model for classifying the eligibility criteria text of clinical trials, based on multi-model ensemble learning and metric learning, was proposed. The experiments demonstrated that the ensemble model significantly improved classification performance. In addition, metric learning improved the word embedding representation, and focal loss reduced the impact of data imbalance on model performance.
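The abstract names focal loss and soft voting without giving formulas, so the following is only a minimal sketch of the two ideas in plain Python, not the authors' implementation. It assumes the commonly used focal loss form -(1 - p_t)^gamma * log(p_t) without a class-weighting term, and soft voting as an unweighted average of per-class probabilities. The function names, the default gamma = 2.0, and the sample probabilities are illustrative assumptions and do not come from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Focal loss for a single example: -(1 - p_t)^gamma * log(p_t).

    gamma = 0 recovers ordinary cross-entropy. A larger gamma shrinks the
    loss of examples that are already classified confidently, so hard
    (often minority-class) examples dominate the gradient.
    gamma = 2.0 is an assumed default, not a value reported in the abstract.
    """
    p_t = max(probs[target], 1e-12)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def soft_vote(model_probs: Sequence[Sequence[float]]) -> int:
    """Average class probabilities over all base models and return the argmax.

    Assumes every model outputs a distribution over the same classes and
    that all models are weighted equally.
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])
```

With the hypothetical inputs `[[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]`, `soft_vote` returns class 0, because one very confident model outweighs two mildly confident ones. A majority (hard) vote would pick class 1, which is the practical difference between the two voting schemes.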
Pages: 10