Predicting the time to get back to work using statistical models and machine learning approaches

Authors
Bouliotis, George [1]
Underwood, M. [1]
Froud, R. [2]
Affiliations
[1] Univ Warwick, Warwick Clin Trials Unit, Coventry, England
[2] Hoyskolen Kristiania, Oslo, Norway
Keywords
Machine learning; Survival analysis; Statistical methods; Return to work; Socioeconomic deprivation; External validation; Regularization; Imputation
DOI
10.1186/s12874-024-02390-4
Chinese Library Classification
R19 [Health care organization and services (health services administration)]
Abstract
Background: Whether machine learning approaches are superior to classical statistical models for survival analysis, especially when the proportional hazards assumption does not hold, is unknown.

Objectives: To compare the model performance and predictive accuracy of classical regression models and machine learning approaches, using data from the Inspiring Families programme.

Methods: The Inspiring Families programme aims to support members of families with complex issues to return to work. We explored predictors of time to return to work with a proportional hazards model (semi-parametric Cox, in Stata) and a flexible parametric Royston-Parmar model (in Stata), and compared them against penalised survival regression with an elastic net penalty (scikit-survival), a conditional survival forest (pySurvival) and a kernel survival support vector machine (pySurvival).

Results: At baseline we obtained data on 61 binary variables from all 3161 participants. No model appeared superior, and predictive power was low (concordance index between 0.51 and 0.61). The median time to finding the first job was about 254 days. The five strongest contributing variables were 'family issues and additional barriers', 'restriction of hours', 'available CV', 'self-employment considered' and 'education'. Harrell's concordance index ranged from 0.60 (Cox model) to 0.71 (random survival forest), suggesting a better fit for the machine learning approaches. However, comparing predicted median times for a selected scenario showed only minor differences.

Conclusion: Implementing a series of survival models with and without the proportional hazards assumption provides useful insight, as well as a better interpretation of coefficients affected by non-linearities. However, the better fit of the machine learning approaches does not translate into substantially higher predictive power and accuracy. Further tuning of the machine learning algorithms may improve results.
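The abstract summarises its results with two quantities: a median time to first job and Harrell's concordance index. As an illustration only (this is not the authors' code, which used Stata, scikit-survival and pySurvival), the sketch below computes both in pure Python. It assumes right-censored data where an event means returning to work, and that a higher risk score means a faster predicted return. The function names are hypothetical, and pairs with tied times are skipped in the C-index for simplicity.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival steps as (time, S(t)) pairs.

    events[i] == 1 means subject i returned to work at times[i];
    events[i] == 0 means the subject was censored at times[i].
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect every subject sharing this time (events and censorings).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit update: multiply by the conditional "not yet returned" probability.
            surv *= 1.0 - n_events / n_at_risk
            steps.append((t, surv))
        # Events and censorings at t both leave the risk set afterwards.
        n_at_risk -= n_leaving
    return steps


def km_median(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Smallest event time at which S(t) <= 0.5, or None if the curve never reaches 0.5."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i has an event and times[i] < times[j].
    It is concordant when the earlier returner also has the higher risk score;
    tied risk scores count as half. Pairs with tied times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four participants, one censored at day 4.
times, events = [2, 4, 6, 8], [1, 0, 1, 1]
print(km_median(times, events))  # 6
print(harrell_c_index(times, events, [0.9, 0.1, 0.2, 0.5]))  # 0.75
```

Under this convention, a C-index of 0.5 corresponds to a random ordering of participants, so values between 0.51 and 0.61 indicate weak discrimination.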
Pages: 8