Explainable Machine Learning Model to Prediction EGFR Mutation in Lung Cancer

Cited by: 5
Authors
Yang, Ruiyuan [1 ]
Xiong, Xingyu [1 ]
Wang, Haoyu [1 ]
Li, Weimin [1 ,2 ,3 ,4 ]
Affiliations
[1] Sichuan Univ, West China Hosp, Dept Resp & Crit Care Med, Chengdu, Peoples R China
[2] Sichuan Univ, West China Hosp, Inst Resp Hlth Frontiers Sci Ctr Dis Related Mol N, Chengdu, Peoples R China
[3] Sichuan Univ, West China Hosp, Precis Med Ctr, Precis Med Key Lab Sichuan Prov, Chengdu, Peoples R China
[4] West China Hosp, Chinese Acad Med Sci, Res Units West China, Chengdu, Peoples R China
Source
FRONTIERS IN ONCOLOGY | 2022 / Vol. 12
Funding
National Natural Science Foundation of China;
Keywords
EGFR mutation; lung cancer; prediction; machine learning; SHAP value; MOLECULAR EPIDEMIOLOGY; SYSTEMIC THERAPY; ADENOCARCINOMA; RISK;
DOI
10.3389/fonc.2022.924144
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214;
Abstract
Objectives: This study aimed to determine whether clinical features, including blood markers, can be used to build an explainable machine learning model that predicts epidermal growth factor receptor (EGFR) mutation in lung cancer.
Methods: We retrospectively analyzed 7,413 patients with lung adenocarcinoma (LA) diagnosed by gene sequencing at West China Hospital of Sichuan University from April 2015 to June 2019. The machine learning algorithms (MLAs) included logistic regression (LR), random forest (RF), LightGBM, support vector machine (SVM), multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), and decision tree (DT). Demographic characteristics, personal history, and blood markers were taken into account. The area under the receiver operating characteristic curve (AUC) was used to evaluate the prediction models, and the SHapley Additive exPlanation (SHAP) value was used to explain them.
Results: Of the 7,413 patients with LA, 3,527 (47.6%) were identified with EGFR mutation. RF achieved the best performance in predicting EGFR mutation [AUC 0.771, 95% confidence interval (CI): 0.770, 0.772], similar to that of XGBoost (AUC 0.740, 95% CI: 0.739, 0.741). The five most influential features were smoking consumption, sex, cholesterol, age, and albumin-globulin ratio. SHAP summary and dependence plots were used to explain the effect of the 12 features on the model and how a single feature influences the output, respectively.
Conclusion: We established EGFR mutation prediction models with MLAs and found that RF was preferred (AUC 0.771, 95% CI: 0.770, 0.772), performing better than the traditional models. The artificial intelligence-based MLA prediction model may therefore become a practical tool to guide the diagnosis and therapy of LA.
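The two quantities named in the abstract, AUC and Shapley-based attributions, can be illustrated with a minimal sketch. This is not the authors' pipeline: the scoring function `toy_score`, its coefficients, and the two binary inputs are hypothetical stand-ins for a trained model, and the Shapley values are computed by brute-force enumeration of feature coalitions, which is only feasible for a handful of features. Practical SHAP tooling relies on faster, model-specific algorithms.

```python
from itertools import combinations
from math import factorial


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    """Hypothetical scorer with made-up coefficients (NOT the paper's random forest)."""
    smoker, female = z
    return 0.4 - 0.3 * smoker + 0.2 * female + 0.1 * smoker * female


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(shapley_values(toy_score, x=[1, 1], baseline=[0, 0]))
```

The attributions returned by `shapley_values` sum to the difference between the score for `x` and the score for `baseline`, which is the additivity property that SHAP summary and dependence plots build on.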
Pages: 8
Related papers
50 in total
  • [41] Lung Cancer Incidence Prediction Using Machine Learning Algorithms
    Tuncal, Kubra
    Sekeroglu, Boran
    Ozkan, Cagri
    [J]. JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2020, 11 (02) : 91 - 96
  • [42] Use of explainable machine learning models in blast load prediction
    Widanage, C.
    Mohotti, D.
    Lee, C. K.
    Wijesooriya, K.
    Meddage, D. P. P.
    [J]. ENGINEERING STRUCTURES, 2024, 312
  • [43] Explainable Machine Learning Prediction for the Academic Performance of Deaf Scholars
    Raji, N. R.
    Kumar, R. Mathusoothana S.
    Biji, C. L.
    [J]. IEEE ACCESS, 2024, 12 : 23595 - 23612
  • [44] Explainable prediction of loan default based on machine learning models
    Zhu, Xu
    Chu, Qingyong
    Song, Xinchang
    Hu, Ping
    Peng, Lu
    [J]. Data Science and Management, 2023, 6 (03): : 123 - 133
  • [45] Diabetes prediction using machine learning and explainable AI techniques
    Tasin, Isfafuzzaman
    Nabil, Tansin Ullah
    Islam, Sanjida
    Khan, Riasat
    [J]. HEALTHCARE TECHNOLOGY LETTERS, 2023, 10 (1-2) : 1 - 10
  • [46] iSPAN: Explainable prediction of outcomes post thrombectomy with Machine Learning
    Kelly, Brendan S.
    Mathur, Prateek
    Vaca, Silvia D.
    Duignan, John
    Power, Sarah
    Lee, Edward H.
    Huang, Yuhao
    Prolo, Laura M.
    Yeom, Kristen W.
    Lawlor, Aonghus
    Killeen, Ronan P.
    Thornton, John
    [J]. EUROPEAN JOURNAL OF RADIOLOGY, 2024, 173
  • [47] An Explainable Machine Learning Pipeline for Stroke Prediction on Imbalanced Data
    Kokkotis, Christos
    Giarmatzis, Georgios
    Giannakou, Erasmia
    Moustakidis, Serafeim
    Tsatalas, Themistoklis
    Tsiptsios, Dimitrios
    Vadikolias, Konstantinos
    Aggelousis, Nikolaos
    [J]. DIAGNOSTICS, 2022, 12 (10)
  • [48] Explainable Machine Learning for Drug Shortage Prediction in a Pandemic Setting
    Li, Jiye
    Almentero, Bruno Kinder
    Besse, Camille
    [J]. MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE, LOD 2022, PT I, 2023, 13810 : 141 - 155
  • [49] Explainable machine learning for the prediction and assessment of complex drought impacts
    Zhang, Beichen
    Abu Salem, Fatima K.
    Hayes, Michael J.
    Smith, Kelly Helm
    Tadesse, Tsegaye
    Wardlow, Brian D.
    [J]. SCIENCE OF THE TOTAL ENVIRONMENT, 2023, 898
  • [50] Advances in Machine Learning and Explainable Artificial Intelligence for Depression Prediction
    Byeon, Haewon
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (06) : 520 - 526