Predicting factors for survival of breast cancer patients using machine learning techniques

Cited by: 129
Authors
Ganggayah, Mogana Darshini [1]
Taib, Nur Aishah [2]
Har, Yip Cheng [2]
Lio, Pietro [3]
Dhillon, Sarinder Kaur [1]
Affiliations
[1] Univ Malaya, Inst Biol Sci, Fac Sci, Data Sci & Bioinformat Lab, Kuala Lumpur 50603, Malaysia
[2] Univ Malaya, Dept Surg, Fac Med, Kuala Lumpur 50603, Malaysia
[3] Univ Cambridge, Dept Comp Sci & Technol, 15 JJ Thomson Ave, Cambridge CB3 0FD, England
Keywords
Data science; Machine learning; Factors influencing survival of breast cancer; Random forest; Decision tree; LOGISTIC-REGRESSION; NODE DISSECTION; TUMOR SIZE; TREE; CLASSIFICATION; DISEASE; MODEL; NUMBER; TRENDS
DOI
10.1186/s12911-019-0801-4
CLC number
R-058
Abstract
Background: Breast cancer is one of the most common diseases in women worldwide. Many studies have been conducted to predict survival indicators; however, most of these analyses were performed predominantly with basic statistical methods. As an alternative, this study used machine learning techniques to build models for detecting and visualising significant prognostic indicators of breast cancer survival rate.
Methods: A large hospital-based breast cancer dataset retrieved from the University Malaya Medical Centre, Kuala Lumpur, Malaysia (n = 8066), containing diagnosis information from 1993 to 2016, was used in this study. The dataset contained 23 predictor variables and one dependent variable, the survival status of the patients (alive or dead). To determine the significant prognostic factors of breast cancer survival rate, prediction models were built using decision tree, random forest, neural networks, extreme gradient boosting, logistic regression, and support vector machine. Next, the dataset was clustered according to the receptor status of the patients, identified via immunohistochemistry, to perform advanced modelling with random forest. Subsequently, the important variables were ranked using variable selection methods in random forest. Finally, decision trees were built, and validation was performed using survival analysis.
Results: In terms of both model accuracy and calibration, all algorithms produced similar outcomes, with the lowest accuracy obtained from the decision tree (79.8%) and the highest from random forest (82.7%). The important variables identified in this study were cancer stage classification, tumour size, total number of axillary lymph nodes removed, number of positive lymph nodes, type of primary treatment, and method of diagnosis.
Conclusion: Interestingly, the machine learning algorithms used in this study yielded similar accuracies; hence, these methods could be used as alternative predictive tools in breast cancer survival studies, particularly in the Asian region. The important prognostic factors for breast cancer survival identified in this study, which were validated by survival curves, are useful and could be translated into decision support tools in the medical domain.
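The abstract states that the important prognostic factors were validated with survival curves but does not name the estimator. The following is a minimal sketch, assuming Kaplan-Meier curves stratified by one candidate factor (a common choice for this kind of validation, not confirmed by the abstract). The field names `time`, `dead` and `stage`, the helper names `kaplan_meier` and `km_by_group`, and the toy records are all hypothetical and are not taken from the UMMC dataset.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times:  follow-up time for each patient (any consistent unit, e.g. months)
    events: 1 if death was observed at that time, 0 if censored
            (alive at last follow-up)
    Returns a list of (t, S(t)) pairs, one per distinct time with >= 1 death.
    """
    deaths = defaultdict(int)   # deaths observed at each time
    leaving = defaultdict(int)  # patients leaving the risk set at each time
    for t, e in zip(times, events):
        leaving[t] += 1
        if e:
            deaths[t] += 1

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        # Patients censored at t still count as at risk at t, then leave.
        at_risk -= leaving[t]
    return curve


def km_by_group(records, factor):
    """Fit one Kaplan-Meier curve per level of a candidate prognostic factor.

    records: iterable of dicts with hypothetical keys "time", "dead" and `factor`.
    """
    groups = defaultdict(lambda: ([], []))
    for r in records:
        times, events = groups[r[factor]]
        times.append(r["time"])
        events.append(r["dead"])
    return {level: kaplan_meier(t, e) for level, (t, e) in groups.items()}


if __name__ == "__main__":
    # Made-up toy records for illustration only; not data from the study.
    toy = [
        {"stage": "I", "time": 60, "dead": 0},
        {"stage": "I", "time": 48, "dead": 1},
        {"stage": "I", "time": 72, "dead": 0},
        {"stage": "III", "time": 12, "dead": 1},
        {"stage": "III", "time": 30, "dead": 0},
        {"stage": "III", "time": 20, "dead": 1},
    ]
    for level, curve in sorted(km_by_group(toy, "stage").items()):
        print(level, curve)
```

Clearly separated curves across the levels of a factor (for example, cancer stage) are what validation by survival curves looks like in practice. Whether the separation is statistically significant is usually checked with a log-rank test, which this sketch does not include.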
Pages: 17