Predicting Nurse Turnover for Highly Imbalanced Data Using the Synthetic Minority Over-Sampling Technique and Machine Learning Algorithms

Cited by: 3
Authors
Xu, Yuan [1]
Park, Yongshin [2]
Park, Ju Dong [3]
Sun, Bora [4]
Affiliations
[1] Dalian Maritime Univ, Collaborat Innovat Ctr Transport Studies, Sch Maritime Econ & Management, 1 Linghai Rd, Dalian 116026, Peoples R China
[2] St Edwards Univ, Bill Munday Sch Business, Dept Mkt Operat & Analyt, 3001 South Congress, Austin, TX 78704 USA
[3] Gyeongsang Natl Univ, Dept Maritime Police & Prod Syst, Tongyeong Si 53064, Gyeongsangnam D, South Korea
[4] Univ Texas Austin, Sch Nursing, 1710 Red River St, Austin, TX 78712 USA
Keywords
nurse turnover; machine learning; SMOTE; NSSRN; random forest; XGBoost; ASSOCIATION; BURNOUT
DOI
10.3390/healthcare11243173
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)]
Abstract
Predicting nurse turnover is a growing challenge within the healthcare sector, profoundly impacting healthcare quality and the nursing profession. This study employs the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance issues in the 2018 National Sample Survey of Registered Nurses dataset and predict nurse turnover using machine learning algorithms. Four machine learning algorithms, namely logistic regression, random forests, decision tree, and extreme gradient boosting, were applied to the SMOTE-enhanced dataset. The data were split into 80% training and 20% validation sets. Eighteen carefully selected variables from the database served as predictive features, and the machine learning model identified age, working hours, electronic health record/electronic medical record, individual income, and job type as important features concerning nurse turnover. The study includes a performance comparison based on accuracy, precision, recall (sensitivity), F1-score, and AUC. In summary, the results demonstrate that SMOTE-enhanced random forests exhibit the most robust predictive power in both the classical approach (with all 18 predictive variables) and the optimized approach (utilizing eight key predictive variables). Extreme gradient boosting, decision tree, and logistic regression follow in performance. Notably, age emerges as the most influential factor in nurse turnover, with working hours, electronic health record/electronic medical record usability, individual income, and region also playing significant roles. This research offers valuable insights for healthcare researchers and stakeholders, aiding in selecting suitable machine learning algorithms for nurse turnover prediction.
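The oversampling idea and three of the reported metrics can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `smote` and `precision_recall_f1`, the toy points, and the choice of `k` are all assumptions made for illustration, and the sketch picks base points at random rather than looping over every minority sample as in the original SMOTE description. It uses only the Python standard library.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Hypothetical helper: each synthetic point lies on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall (sensitivity) and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (made up): 4 minority samples against 10 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
n_majority = 10

# Generate enough synthetic minority points to balance the two classes.
new_points = smote(minority, n_majority - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies. In practice, oversampling is usually applied to the training split only, so that the validation set contains no synthetic records; the abstract does not say which order the study used.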
Pages: 22