Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning

Cited by: 8
Authors
Zeng, Kun [1]
Xu, Yibin [1]
Lin, Ge [2]
Liang, Likeng [3]
Hao, Tianyong [3]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Peoples R China
[2] Sun Yat Sen Univ, Natl Engn Res Ctr Digital Life, Guangzhou, Peoples R China
[3] South China Normal Univ, Sch Comp Sci, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Eligibility criteria classification; Metric learning; Focal loss; Ensemble learning; Clinical trial; INFORMATION
DOI
10.1186/s12911-021-01492-z
Chinese Library Classification number
R-058
Abstract
Background: Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods can improve recruitment efficiency and reduce the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data.
Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models: Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that features of different categories are easier to distinguish. Soft voting is applied to obtain the final classification of the ensemble model. The dataset comes from evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories.
Results: The ensemble method achieved an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 under a standard t-test, indicating that the improvement was statistically significant.
Conclusions: A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that the ensemble model significantly improved classification performance. In addition, metric learning improved the word embedding representation, and focal loss reduced the impact of data imbalance on model performance.
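The abstract names focal loss and soft voting but gives no formulas or hyperparameters. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes the standard focal loss form, -alpha * (1 - p_t)^gamma * log(p_t), and an unweighted average of class probabilities for soft voting. The function names, the gamma and alpha values, and the example probabilities are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)^gamma * log(p_t).

    gamma and alpha are assumed values; the abstract does not state them.
    With gamma = 0 and alpha = 1 this reduces to ordinary cross-entropy.
    """
    p_t = max(probs[target], 1e-12)  # guard against log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def soft_vote(prob_lists):
    """Average per-class probabilities over base models (unweighted, an assumption).

    Returns (predicted_class_index, averaged_probabilities).
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    predicted = max(range(n_classes), key=avg.__getitem__)
    return predicted, avg


if __name__ == "__main__":
    # Hypothetical outputs of three base models for one text and three classes.
    model_probs = [
        softmax([2.0, 1.0, 0.1]),
        softmax([0.5, 2.5, 0.2]),
        softmax([0.3, 2.0, 1.0]),
    ]
    label, averaged = soft_vote(model_probs)
    print("ensemble prediction:", label)
    print("focal loss if the true class is 1:", focal_loss(averaged, 1))
```

Because the factor (1 - p_t)^gamma shrinks the loss of examples the model already classifies confidently, training effort shifts toward hard examples, which in an imbalanced dataset tend to come from rare categories.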
Pages: 10
Source: BMC Medical Informatics and Decision Making, 21