An effective up-sampling approach for breast cancer prediction with imbalanced data: A machine learning model-based comparative analysis

Times cited: 10
Authors
Tran, Tuan [1]
Le, Uyen [1]
Shi, Yihui [2]
Affiliations
[1] Calif Northstate Univ, Coll Pharm, Elk Grove, CA 95757 USA
[2] Calif Northstate Univ, Coll Med, Elk Grove, CA USA
Source
PLOS ONE | 2022 / Vol. 17 / Issue 05
Keywords
LUNG;
DOI
10.1371/journal.pone.0269135
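Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;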
Abstract
Early detection of breast cancer plays a critical role in successful treatment, which saves thousands of patients' lives every year. Although massive amounts of clinical data have been collected and stored by healthcare organizations, only a small portion of these data has been used to support treatment decision-making. In this study, we proposed an engineered up-sampling method (ENUS) for handling imbalanced data to improve the predictive performance of machine learning models. Our experimental results showed that when the ratio of the minority class to the majority class is less than 20%, training models with ENUS improved balanced accuracy by 3.74%, sensitivity by 8.36%, and F1 score by 3.83%. Our study also found that XGBoost Tree (XGBTree) trained with ENUS achieved the best performance on the validation dataset, with an average balanced accuracy of 97.47% (min = 93%, max = 100%), sensitivity of 97.88% (min = 89%, max = 100%), and F1 score of 96.20% (min = 89.5%, max = 100%). Furthermore, our ensemble algorithm identified Cell_Shape and Nuclei as the most important attributes for predicting breast cancer. This data-driven finding reaffirms prior knowledge of the relationship between Cell_Shape, Nuclei, and breast cancer grade. Finally, our experiments showed that the Random Forest and Neural Network models had the shortest training times. Our study provides a comprehensive comparison of a wide range of machine learning methods for predicting breast cancer risk and can serve as a tool to help healthcare practitioners detect and treat breast cancer effectively.
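The abstract reports its results in terms of the minority-to-majority class ratio, balanced accuracy, sensitivity, and F1 score. The sketch below shows how these standard quantities are usually computed for a binary task. For reference, it also includes plain random up-sampling (duplicating minority rows with replacement). This is NOT the authors' ENUS procedure, whose details are not given in this record. The function names are hypothetical, and the positive (e.g. malignant) class is assumed to be labeled 1.

```python
from collections import Counter
import random


def minority_majority_ratio(labels):
    """Count of the rarest class divided by the count of the most common class."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def random_upsample(rows, labels, seed=0):
    """Generic random over-sampling with replacement (NOT the paper's ENUS).

    Duplicates randomly chosen minority-class rows until the minority class
    matches the majority-class count. Assumes a binary label set.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    extra = [rng.choice(minority_rows) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority] * deficit


def _safe_div(num, den):
    # Return 0.0 instead of raising when a class is absent from the predictions.
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, precision, F1, and balanced accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = _safe_div(tp, tp + fn)  # recall on the positive class
    specificity = _safe_div(tn, tn + fp)  # recall on the negative class
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10  # toy data: 10 positives vs 90 negatives
    rows = list(range(100))
    print(minority_majority_ratio(labels))  # 10 / 90, below the 20% threshold
    up_rows, up_labels = random_upsample(rows, labels)
    print(Counter(up_labels))
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

Balanced accuracy averages sensitivity and specificity. Unlike plain accuracy, it is therefore not inflated by a model that mostly predicts the majority class, which makes it a natural headline metric for the imbalanced setting described above.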
Pages: 30