An effective up-sampling approach for breast cancer prediction with imbalanced data: A machine learning model-based comparative analysis

Cited by: 10
Authors
Tran, Tuan [1]
Le, Uyen [1]
Shi, Yihui [2]
Affiliations
[1] Calif Northstate Univ, Coll Pharm, Elk Grove, CA 95757 USA
[2] Calif Northstate Univ, Coll Med, Elk Grove, CA USA
Source
PLOS ONE | 2022 / Vol. 17 / Issue 05
Keywords
LUNG;
DOI
10.1371/journal.pone.0269135
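Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code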
07 ; 0710 ; 09 ;
Abstract
Early detection of breast cancer plays a critical role in successful treatment and saves thousands of lives every year. Although healthcare organizations have collected and stored massive amounts of clinical data, only a small portion of that data has been used to support treatment decisions. In this study, we propose an engineered up-sampling method (ENUS) for handling imbalanced data to improve the predictive performance of machine learning models. Our experimental results show that when the ratio of the minority to the majority class is less than 20%, training models with ENUS improved balanced accuracy by 3.74%, sensitivity by 8.36%, and F1 score by 3.83%. Our study also found that XGBoost Tree (XGBTree) using ENUS achieved the best performance, with an average balanced accuracy of 97.47% (min = 93%, max = 100%), sensitivity of 97.88% (min = 89%, max = 100%), and F1 score of 96.20% (min = 89.5%, max = 100%) on the validation dataset. Furthermore, our ensemble algorithm identified Cell_Shape and Nuclei as the most important attributes for predicting breast cancer. Using a data-driven approach, this finding reaffirms existing knowledge of the relationship between Cell_Shape, Nuclei, and breast cancer grade. Finally, our experiments showed that the Random Forest and Neural Network models had the shortest training times. Our study provides a comprehensive comparison of a wide range of machine learning methods for predicting breast cancer risk. It can be used as a tool to help healthcare practitioners detect and treat breast cancer effectively.
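The abstract does not describe how ENUS works internally, so the sketch below is not ENUS. It is a minimal illustration, under that stated assumption, of plain random up-sampling of the minority class together with the three metrics the abstract reports (sensitivity, balanced accuracy, F1 score). The function names `upsample_minority` and `binary_metrics` are hypothetical and chosen only for this example.

```python
import random
from collections import Counter


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    matches the size of the largest class (generic up-sampling, NOT ENUS)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, balanced accuracy, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy data: 8 majority-class rows (label 0) and 2 minority-class rows (label 1).
    rows = [[i] for i in range(10)]
    labels = [0] * 8 + [1] * 2
    new_rows, new_labels = upsample_minority(rows, labels)
    print(Counter(new_labels))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a setup like this, up-sampling would normally be applied to the training split only, so that duplicated minority rows do not leak into the validation data used to compute the metrics.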
Pages: 30
Related papers
50 in total
  • [31] A comparative study of handling imbalanced data using generative adversarial networks for machine learning based software fault prediction
    Phuong, Ha Thi Minh
    Nguyet, Pham Vu Thu
    Minh, Nguyen Huu Nhat
    Hanh, Le Thi My
    Binh, Nguyen Thanh
    APPLIED INTELLIGENCE, 2025, 55 (04)
  • [33] Ensemble Machine Learning for Enhanced Breast Cancer Prediction: A Comparative Study
    Rahman, Mijanur
    Kobir, Khandoker Humayoun
    Akther, Sanjana
    Kallol, Abul Hasnat
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (07) : 932 - 941
  • [34] Breast Cancer Prediction: A Comparative Study Using Machine Learning Techniques
    Islam M.M.
    Haque M.R.
    Iqbal H.
    Hasan M.M.
    Hasan M.
    Kabir M.N.
    SN Computer Science, 2020, 1 (5)
  • [35] Comparative Study of Machine Learning Algorithms in Breast Cancer Prognosis and Prediction
    Ithawar, Majid
    Aslam, Naeem
    Mahboob, Rao Muhammad Mahtab
    Mirza, Mueed Ahmed
    Jahangir, Hassan
    Mughal, Muhammad Awais
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2020, 20 (08) : 125 - +
  • [36] Data Requirements for Model-Based Cancer Prognosis Prediction
    Dalton, Lori A.
    Yousefi, Mohammadmahdi R.
    CANCER INFORMATICS, 2015, 14 : 123 - 138
  • [37] A novel machine learning prediction model for metastasis in breast cancer
    Li, Huan
    Liu, Ren-Bin
    Long, Chen-meng
    Teng, Yuan
    Liu, Yu
    CANCER REPORTS, 2024, 7 (03)
  • [38] A hybrid machine learning model for timely prediction of breast cancer
    Dalal, Surjeet
    Onyema, Edeh Michael
    Kumar, Pawan
    Maryann, Didiugwu Chizoba
    Roselyn, Akindutire Opeyemi
    Obichili, Mercy Ifeyinwa
    INTERNATIONAL JOURNAL OF MODELING SIMULATION AND SCIENTIFIC COMPUTING, 2023, 14 (04)
  • [39] Investigation of Imbalanced Sentiment Analysis in Voice Data: A Comparative Study of Machine Learning Algorithms
    Shah, Viraj Nishchal
    Shah, Deep Rahul
    Shetty, Mayank Umesh
    Krishnan, Deepa
    Ravi, Vinnayakumar
    Singh, Swapnil
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2024, 11 (06) : 1 - 12
  • [40] A hybrid machine learning approach to cerebral stroke prediction based on imbalanced medical dataset
    Liu, Tianyu
    Fan, Wenhui
    Wu, Cheng
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2019, 101