Generalized Estimating Equations Boosting (GEEB) machine for correlated data

Authors
Yuan-Wey Wang
Hsin-Chou Yang
Yi-Hau Chen
Chao-Yu Guo
Affiliations
[1] National Yang Ming Chiao Tung University, Division of Biostatistics and Data Science, Institute of Public Health, College of Medicine
[2] Academia Sinica, Institute of Statistical Science
Keywords
Correlated data; Hierarchical data; Generalized Estimating Equations; Machine learning; Gradient boosting
DOI
Not available
Abstract
Rapid developments in data science have made machine learning and artificial intelligence some of the most popular research tools across disciplines. While numerous articles have reported decent predictive ability, little research has examined the impact of complex correlated data. We aim to develop a more accurate model for repeated measures or hierarchical data structures. This study therefore proposes a novel algorithm, the Generalized Estimating Equations Boosting (GEEB) machine, which integrates gradient boosting into Generalized Estimating Equations (GEE), the benchmark statistical approach for correlated data. Unlike previous gradient boosting methods that use all input features, GEEB randomly selects a subset of input features when building the model to reduce predictive errors. The simulation study evaluates the predictive performance of GEEB, GEE, eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) across several hierarchical structures with different sample sizes. Results suggest that GEEB outperforms GEE and achieves higher predictive accuracy than SVM and XGBoost in most situations. An application to a real-world dataset, the Forest Fire Data, also showed that GEEB reduced mean squared errors by 4.5% to 25% compared with GEE, XGBoost, and SVM. This research also provides a freely available R function that makes it easy to apply the GEEB machine to longitudinal or hierarchical data.
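The core idea in the abstract, boosting on residuals with a randomly chosen feature subset while accounting for within-cluster correlation, can be illustrated with a minimal sketch. This is not the authors' GEEB algorithm or their R function. It is an assumption-laden toy: a Gaussian outcome, squared-error loss (so the negative gradient is the residual), a linear base learner fitted by generalized least squares, and an exchangeable working correlation with a fixed, user-supplied rho rather than an estimated one. The names boost_gls, fit_gls_step, and exchangeable_inverse are hypothetical.

```python
import numpy as np


def exchangeable_inverse(n, rho):
    """Inverse of an n x n exchangeable working correlation matrix."""
    R = (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))
    return np.linalg.inv(R)


def fit_gls_step(X, resid, groups, rho):
    """Solve sum_i X_i' R^-1 X_i beta = sum_i X_i' R^-1 r_i over clusters."""
    p = X.shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for g in np.unique(groups):
        idx = groups == g
        Xi, ri = X[idx], resid[idx]
        W = exchangeable_inverse(len(ri), rho)
        A += Xi.T @ W @ Xi
        b += Xi.T @ W @ ri
    # Tiny ridge term keeps the solve stable (an assumption of this sketch).
    return np.linalg.solve(A + 1e-8 * np.eye(p), b)


def boost_gls(X, y, groups, n_rounds=200, n_features=2, lr=0.1, rho=0.3, seed=0):
    """L2 boosting: each round fits residuals on a random feature subset."""
    rng = np.random.default_rng(seed)
    intercept = y.mean()
    coef = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        resid = y - intercept - X @ coef  # negative gradient of squared error
        cols = rng.choice(X.shape[1], size=n_features, replace=False)
        beta = fit_gls_step(X[:, cols], resid, groups, rho)
        coef[cols] += lr * beta  # shrunken update on the sampled features only
    return intercept, coef


# Simulated clustered data: 50 clusters of size 4 with a shared random effect.
rng = np.random.default_rng(1)
n_clusters, cluster_size = 50, 4
groups = np.repeat(np.arange(n_clusters), cluster_size)
X = rng.normal(size=(n_clusters * cluster_size, 4))
true_coef = np.array([2.0, -1.0, 0.0, 0.5])
cluster_effect = rng.normal(scale=0.5, size=n_clusters)[groups]
y = X @ true_coef + cluster_effect + rng.normal(scale=0.5, size=len(groups))

intercept, coef = boost_gls(X, y, groups)
train_mse = np.mean((y - intercept - X @ coef) ** 2)
```

In this toy setting the fitted coefficients approach the simulated ones because every feature is sampled many times across rounds. A faithful implementation would estimate the working correlation from the data and support non-Gaussian outcomes through a link function, as GEE does.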
Related papers
  • [41] Estimating equations for hazard ratio parameters based on correlated failure time data
    CAI, JW
    PRENTICE, RL
    BIOMETRIKA, 1995, 82 (01) : 151 - 164
  • [42] Generalized Estimating Equations for Zero-Inflated Spatial Count Data
    Monod, Anthea
    SPATIAL STATISTICS 2011: MAPPING GLOBAL CHANGE, 2011, 7 : 281 - 286
  • [43] Extended generalized estimating equations for binary familial data with incomplete families
    Fitzgerald, PEB
    BIOMETRICS, 2002, 58 (04) : 718 - 726
  • [44] Analysis of repeated categorical data using generalized estimating equations
    LIPSITZ, SR
    KIM, K
    ZHAO, LP
    STATISTICS IN MEDICINE, 1994, 13 (11) : 1149 - 1163
  • [45] Using Generalized Estimating Equations to Analyze Longitudinal Data in Nursing Research
    Liu, Shan
    Dixon, Jane
    Qiu, Guang
    Tian, Yu
    McCorkle, Ruth
    WESTERN JOURNAL OF NURSING RESEARCH, 2009, 31 (07) : 948 - 964
  • [46] Model selection of generalized estimating equations with multiply imputed longitudinal data
    Shen, Chung-Wei
    Chen, Yi-Hau
    BIOMETRICAL JOURNAL, 2013, 55 (06) : 899 - 911
  • [47] Power determination for geographically clustered data using generalized estimating equations
    Hendricks, SA
    Wassell, JT
    Collins, JW
    Sedlak, SL
    STATISTICS IN MEDICINE, 1996, 15 (17-18) : 1951 - 1960
  • [48] Model selection in the weighted generalized estimating equations for longitudinal data with dropout
    Gosho, Masahiko
    BIOMETRICAL JOURNAL, 2016, 58 (03) : 570 - 587
  • [49] The Robustness of Generalized Estimating Equations for Association Tests in Extended Family Data
    Suktitipat, Bhoom
    Mathias, Rasika A.
    Vaidya, Dhananjay
    Yanek, Lisa R.
    Young, J. Hunter
    Becker, Lewis C.
    Becker, Diane M.
    Wilson, Alexander F.
    Fallin, M. Danielle
    GENETIC EPIDEMIOLOGY, 2012, 36 (02) : 127 - 128
  • [50] Generalized estimating equations for ordinal data: A note on working correlation structures
    Lumley, T
    BIOMETRICS, 1996, 52 (01) : 354 - 361