Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning

Cited by: 8
Authors
Zeng, Kun [1]
Xu, Yibin [1]
Lin, Ge [2]
Liang, Likeng [3]
Hao, Tianyong [3]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Peoples R China
[2] Sun Yat Sen Univ, Natl Engn Res Ctr Digital Life, Guangzhou, Peoples R China
[3] South China Normal Univ, Sch Comp Sci, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Eligibility criteria classification; Metric learning; Focal loss; Ensemble learning; Clinical trial; INFORMATION;
DOI
10.1186/s12911-021-01492-z
Chinese Library Classification number
R-058
Abstract
Background: Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text with machine learning methods can improve recruitment efficiency and reduce the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data.

Methods: An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models: Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that features of different categories are easier to distinguish. Soft voting is applied to produce the final classification of the ensemble model. The dataset comes from standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories.

Results: Our ensemble method had an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 with a standard t-test, indicating that our model achieved a significant improvement.

Conclusions: A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that our ensemble model significantly improved classification performance. In addition, metric learning improved the word embedding representation, and the focal loss reduced the impact of data imbalance on model performance.
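The two generic techniques named in the abstract, focal loss and soft voting, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the values `gamma=2.0` and `alpha=1.0`, the equal weighting of base models, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)**gamma * log(p_t).

    probs  -- predicted class probabilities for the example
    target -- index of the true class
    gamma, alpha -- assumed hyperparameters (not the paper's settings);
                    gamma=0 and alpha=1 reduce this to cross-entropy.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def soft_vote(prob_lists):
    """Average class probabilities over base models and pick the argmax.

    Equal model weights are an assumption made for this sketch.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


# Toy probabilities from three hypothetical base models over three classes.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
]
label, avg = soft_vote(model_probs)

# A confidently correct example contributes far less loss than a hard one,
# which is how focal loss shifts training effort toward difficult examples.
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.1, 0.9], target=0)
```

In this toy case the first model prefers class 0, but the averaged probabilities favor class 1, so soft voting returns class 1. That is the sense in which soft voting uses each model's confidence rather than only its top prediction.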
Pages: 10
Related papers
50 in total
  • [41] Ensemble Learning Based Classification for BCI Applications
    Silva, Vitor F.
    Barbosa, Roberto M.
    Vieira, Pedro M.
    Lima, Carlos S.
    2017 IEEE 5TH PORTUGUESE MEETING ON BIOENGINEERING (ENBENG), 2017
  • [42] Chinese text deception detection based on ensemble learning
    Zhang, Hu
    Tan, Hongye
    Qian, Yuhua
    Li, Ru
    Chen, Qian
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2015, 52 (05) : 1005 - 1013
  • [43] Towards Phenotyping of Clinical Trial Eligibility Criteria
    Loebe, Matthias
    Staeubert, Sebastian
    Goldberg, Colleen
    Haffner, Ivonne
    Winter, Alfred
    HEALTH INFORMATICS MEETS EHEALTH: BIOMEDICAL MEETS EHEALTH - FROM SENSORS TO DECISIONS, 2018, 248 : 293 - 299
  • [44] EnML: Multi-label Ensemble Learning for Urdu Text Classification
    Mehmood, Faiza
    Shahzadi, Rehab
    Ghafoor, Hina
    Asim, Muhammad Nabeel
    Ghani, Muhammad Usman
    Mahmood, Waqar
    Dengel, Andreas
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (09)
  • [45] A knowledge base of clinical trial eligibility criteria
    Liu, Hao
    Chi, Yuan
    Butler, Alex
    Sun, Yingcheng
    Weng, Chunhua
    JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 117
  • [46] Effective learning model of user classification based on ensemble learning algorithms
    Ruan, Qunsheng
    Wu, Qingfeng
    Wang, Yingdong
    Liu, Xiling
    Miao, Fengyu
    COMPUTING, 2019, 101 (06) : 531 - 545
  • [47] Extraction and Prevalence of Structured Data Elements in Free-Text Clinical Trial Eligibility Criteria
    Gulden, Christian
    Landerer, Inge
    Nassirian, Azadeh
    Altun, Fatma Betuel
    Andrae, Johanna
    ICT FOR HEALTH SCIENCE RESEARCH, 2019, 258 : 226 - 230
  • [48] A Classification Method Based on Ensemble Learning of Deep Learning and Multidimensional Scaling
    Miyazawa, Kazuya
    Sato-Ilic, Mika
    INTELLIGENT DECISION TECHNOLOGIES, KES-IDT 2021, 2021, 238 : 379 - 390
  • [50] Automated Breast Mass Classification System Using Deep Learning and Ensemble Learning in Digital Mammogram
    Malebary, Sharaf J.
    Hashmi, Arshad
    IEEE ACCESS, 2021, 9 : 55312 - 55328