An effective up-sampling approach for breast cancer prediction with imbalanced data: A machine learning model-based comparative analysis

Cited by: 10
Authors
Tran, Tuan [1]
Le, Uyen [1]
Shi, Yihui [2]
Affiliations
[1] Calif Northstate Univ, Coll Pharm, Elk Grove, CA 95757 USA
[2] Calif Northstate Univ, Coll Med, Elk Grove, CA USA
Source
PLOS ONE | 2022 / Vol. 17 / Issue 05
Keywords
LUNG;
DOI
10.1371/journal.pone.0269135
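Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09
Abstract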
Early detection of breast cancer plays a critical role in successful treatment and saves thousands of patients' lives every year. Although healthcare organizations have collected and stored massive amounts of clinical data, only a small portion of the data has been used to support treatment decisions. In this study, we proposed an engineered up-sampling method (ENUS) for handling imbalanced data to improve the predictive performance of machine learning models. Our experimental results showed that when the ratio of the minority to the majority class is less than 20%, training models with ENUS improved balanced accuracy by 3.74%, sensitivity by 8.36%, and F1 score by 3.83%. Our study also identified that XGBoost Tree (XGBTree) using ENUS achieved the best performance, with an average balanced accuracy of 97.47% (min = 93%, max = 100%), sensitivity of 97.88% (min = 89%, max = 100%), and F1 score of 96.20% (min = 89.5%, max = 100%) in the validation dataset. Furthermore, our ensemble algorithm identified Cell_Shape and Nuclei as the most important attributes in predicting breast cancer. Using a data-driven approach, this finding reaffirms prior knowledge of the relationship between Cell_Shape, Nuclei, and the grades of breast cancer. Finally, our experiment showed that Random Forest and Neural Network models required the least training time. Our study provided a comprehensive comparison of a wide range of machine learning methods for predicting breast cancer risk. It can be used as a tool for healthcare practitioners to effectively detect and treat breast cancer.
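The abstract does not describe how ENUS works, so the sketch below is not ENUS. It is a minimal illustration, under stated assumptions, of the two generic ideas the abstract relies on: up-sampling a minority class (here by plain random duplication) and scoring a binary classifier with balanced accuracy, sensitivity, and F1. The function names `random_upsample` and `binary_metrics`, the toy labels, and the choice of 1 as the positive (malignant) class are all hypothetical.

```python
import random
from collections import Counter


def random_upsample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class. Plain random up-sampling,
    used here only as a stand-in; it is NOT the ENUS method."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Return balanced accuracy, sensitivity, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Toy data (hypothetical): 12 majority rows (0) and 2 minority rows (1),
# i.e. a minority-to-majority ratio below 20%.
toy_rows = [[i] for i in range(14)]
toy_labels = [0] * 12 + [1] * 2
up_rows, up_labels = random_upsample(toy_rows, toy_labels)
print(Counter(up_labels))

# Hypothetical predictions, scored with the three metrics from the abstract.
print(binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

Up-sampling should be applied to the training split only; duplicating rows before splitting would leak copies of the same sample into the validation data and inflate the reported scores.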
Pages: 30