Generalized Estimating Equations Boosting (GEEB) machine for correlated data

Authors
Wang, Yuan-Wey [1]
Yang, Hsin-Chou [2]
Chen, Yi-Hau [2]
Guo, Chao-Yu [1]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Inst Publ Hlth, Coll Med, Div Biostat & Data Sci, Taipei, Taiwan
[2] Acad Sinica, Inst Stat Sci, Taipei, Taiwan
Keywords
Correlated data; Hierarchical data; Generalized Estimating Equations; Machine learning; Gradient boosting
DOI
10.1186/s40537-023-00875-5
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline code
081202
Abstract
Rapid developments in data science have made machine learning and artificial intelligence some of the most popular research tools across many disciplines. While numerous studies have reported good predictive ability, little research has examined the impact of complex correlated data. We aim to develop a more accurate model for repeated-measures or hierarchical data structures. To this end, this study proposes a novel algorithm, the Generalized Estimating Equations Boosting (GEEB) machine, which integrates the gradient boosting technique into Generalized Estimating Equations (GEE), the benchmark statistical approach for correlated data. Unlike previous gradient boosting methods that use all input features, GEEB randomly selects a subset of input features when building the model to reduce prediction error. A simulation study evaluates the predictive performance of GEEB, GEE, eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) across several hierarchical structures and sample sizes. The results suggest that GEEB outperforms GEE and achieves higher predictive accuracy than SVM and XGBoost in most situations. An application to a real-world dataset, the Forest Fire Data, also showed that GEEB reduced the mean squared error by 4.5% to 25% compared with GEE, XGBoost, and SVM. This research also provides a freely available R function that implements the GEEB machine for longitudinal or hierarchical data.
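The abstract describes GEEB only at a high level. As a rough illustration of the ingredients it names (gradient boosting, a GEE fit for clustered data, and random feature selection), the following is a minimal NumPy sketch under stated assumptions: a continuous outcome with squared-error loss, a Gaussian GEE with identity link and an exchangeable working correlation as the base learner, and a fixed number of randomly chosen features per boosting round. The function names `fit_gee_linear`, `boost_gee`, and `predict`, the simulated data, and all tuning values are hypothetical; this is not the authors' GEEB algorithm or their R function.

```python
import numpy as np


def exchangeable_inverse(m, alpha):
    """Closed-form inverse of the m x m exchangeable matrix (1 - alpha) I + alpha J."""
    return (np.eye(m) - alpha / (1.0 + (m - 1) * alpha) * np.ones((m, m))) / (1.0 - alpha)


def fit_gee_linear(X, y, groups, n_iter=5):
    """Hypothetical helper: Gaussian GEE (identity link), exchangeable working correlation.

    Returns (beta, alpha); beta[0] is the intercept.
    """
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]  # start from the independence (OLS) fit
    clusters = [np.flatnonzero(groups == g) for g in np.unique(groups)]
    alpha = 0.0
    for _ in range(n_iter):
        resid = y - Xd @ beta
        phi = np.mean(resid ** 2)  # scale parameter (no degrees-of-freedom correction)
        # Moment estimate of the common within-cluster correlation from all pairs j < k.
        pair_sum = sum((resid[idx].sum() ** 2 - (resid[idx] ** 2).sum()) / 2.0 for idx in clusters)
        n_pairs = sum(len(idx) * (len(idx) - 1) // 2 for idx in clusters)
        # Simplification: negative correlation is truncated to 0 and alpha is capped at 0.95.
        alpha = float(np.clip(pair_sum / (n_pairs * phi), 0.0, 0.95)) if n_pairs else 0.0
        # Solve sum_i X_i' R_i^{-1} (y_i - X_i beta) = 0; phi cancels for the identity link.
        A = np.zeros((Xd.shape[1], Xd.shape[1]))
        b = np.zeros(Xd.shape[1])
        for idx in clusters:
            r_inv = exchangeable_inverse(len(idx), alpha)
            A += Xd[idx].T @ r_inv @ Xd[idx]
            b += Xd[idx].T @ r_inv @ y[idx]
        beta = np.linalg.solve(A, b)
    return beta, alpha


def boost_gee(X, y, groups, n_rounds=50, learning_rate=0.1, n_features=2, seed=0):
    """Hypothetical L2 boosting whose base learner is a GEE fit on a random feature subset."""
    rng = np.random.default_rng(seed)
    model = {"intercept": float(y.mean()), "learning_rate": learning_rate, "learners": []}
    pred = np.full(len(y), model["intercept"])
    for _ in range(n_rounds):
        resid = y - pred  # negative gradient of the squared-error loss
        cols = rng.choice(X.shape[1], size=n_features, replace=False)  # random feature subset
        beta, _ = fit_gee_linear(X[:, cols], resid, groups)
        pred = pred + learning_rate * (beta[0] + X[:, cols] @ beta[1:])
        model["learners"].append((cols, beta))
    return model


def predict(model, X):
    """Sum the shrunken contributions of every base learner."""
    pred = np.full(X.shape[0], model["intercept"])
    for cols, beta in model["learners"]:
        pred = pred + model["learning_rate"] * (beta[0] + X[:, cols] @ beta[1:])
    return pred


# Simulated clustered data: 60 clusters of 5 observations sharing a random cluster effect.
rng = np.random.default_rng(42)
n_clusters, m, p = 60, 5, 6
groups = np.repeat(np.arange(n_clusters), m)
X = rng.normal(size=(n_clusters * m, p))
cluster_effect = np.repeat(rng.normal(size=n_clusters), m)
y = (1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + cluster_effect
     + rng.normal(scale=0.5, size=n_clusters * m))

model = boost_gee(X, y, groups)
mse_boost = np.mean((y - predict(model, X)) ** 2)
mse_mean = np.mean((y - y.mean()) ** 2)
print(f"training MSE: boosted {mse_boost:.3f} vs. mean-only {mse_mean:.3f}")
```

The training MSE is printed only to show the mechanics. A comparison like the one in the abstract would need held-out data split by cluster, so that observations from the same cluster never appear in both the training and test sets.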
Pages: 19
Published in: Journal of Big Data, 11