RE-EM trees: a data mining approach for longitudinal and clustered data

Cited by: 0
Authors
Rebecca J. Sela
Jeffrey S. Simonoff
Affiliations
[1] J.P. Morgan Chase & Co., Statistics Group, Information, Operations, and Management Sciences Department, Leonard N. Stern School of Business
[2] New York University
Source
Machine Learning | 2012 / Vol. 86
Keywords
Clustered data; Longitudinal data; Panel data; Mixed effects model; Random effects; Regression tree; CART
DOI
Not available
Abstract
Longitudinal data refer to the situation where repeated observations are available for each sampled object. Clustered data, where observations are nested in a hierarchical structure within objects (without time necessarily being involved), represent a similar type of situation. Methodologies that take this structure into account allow for the possibility of systematic differences between objects that are not related to attributes, and of autocorrelation within objects across time periods. A standard methodology in the statistics literature for this type of data is the mixed effects model, where these differences between objects are represented by so-called “random effects” that are estimated from the data (population-level relationships are termed “fixed effects,” together resulting in a mixed effects model). This paper presents a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. We apply the resulting estimation method, called the RE-EM tree, to pricing in online transactions, showing that the RE-EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. We also apply it to a smaller data set examining accident fatalities, and show that the RE-EM tree strongly outperforms a tree without random effects while performing comparably to a linear model with random effects. Finally, we perform extensive simulation experiments to show that the estimator improves predictive performance relative to regression trees without random effects and is comparable or superior to linear models with random effects in more general situations.
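The idea of combining random effects with a tree can be illustrated with a toy example. The abstract does not spell out the estimation steps, so the following is only a minimal sketch under stated assumptions: a random intercept per object, a one-split regression tree (stump) on a single attribute, and a fixed shrinkage constant `shrink` standing in for the variance ratio that a full mixed effects fit would estimate. The names `fit_stump`, `re_em_sketch`, and `shrink` are hypothetical and do not come from the paper or from any library.

```python
from collections import defaultdict
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on one numeric attribute.
    Assumes xs contains at least two distinct values."""
    best = None
    for cut in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, cut, ml, mr)
    _, cut, ml, mr = best
    return lambda x: ml if x < cut else mr

def re_em_sketch(groups, xs, ys, shrink=1.0, n_iter=20):
    """Alternate between a tree fit and random-intercept estimates.
    Assumption: `shrink` is a fixed stand-in for the ratio of error
    variance to random-effect variance; it is not estimated here."""
    b = defaultdict(float)  # random intercepts, initialised at zero
    for _ in range(n_iter):
        # Step 1: fit the tree to the response with random effects removed.
        adjusted = [y - b[g] for g, y in zip(groups, ys)]
        tree = fit_stump(xs, adjusted)
        # Step 2: re-estimate each intercept as a shrunken mean residual.
        resid = defaultdict(list)
        for g, x, y in zip(groups, xs, ys):
            resid[g].append(y - tree(x))
        b = defaultdict(float, {g: sum(r) / (len(r) + shrink) for g, r in resid.items()})
    return tree, dict(b)

# Toy data: two objects whose responses differ by a constant offset.
groups = ["A"] * 4 + ["B"] * 4
xs = [0, 1, 0, 1] * 2
ys = [2, 12, 2, 12, -2, 8, -2, 8]
tree, b = re_em_sketch(groups, xs, ys)
```

In this toy setting the stump captures the population-level step in the attribute, while the intercepts absorb the per-object offset, pulled toward zero by the shrinkage term. A realistic implementation would grow a full tree and estimate the variance components from the data rather than fixing them.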
Pages: 169-207
Page count: 38
Related papers
50 in total
  • [1] RE-EM trees: a data mining approach for longitudinal and clustered data
    Sela, Rebecca J.
    Simonoff, Jeffrey S.
    [J]. MACHINE LEARNING, 2012, 86 (02) : 169 - 207
  • [2] Unbiased regression trees for longitudinal and clustered data
    Fu, Wei
    Simonoff, Jeffrey S.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2015, 88 : 53 - 74
  • [3] A simple approach to analyzing clustered longitudinal data
    Stephenson, Matthew
    Ali, R. Ayesha
    Darlington, Gerarda A.
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2017, 46 (05) : 3553 - 3562
  • [4] EM for Mixture of Linear Regression with Clustered Data
    Reisizadeh, Amirhossein
    Gatmiry, Khashayar
    Ozdaglar, Asuman
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [5] A data mining architecture for clustered environments
    Ashrafi, MZ
    Taniar, D
    Smith, KA
    [J]. APPLIED PARALLEL COMPUTING: ADVANCED SCIENTIFIC COMPUTING, 2002, 2367 : 89 - 98
  • [7] Learning decision trees from uncertain data with an evidential EM approach
    Sutton-Charani, Nicolas
    Destercke, Sebastien
    Denoeux, Thierry
    [J]. 2013 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2013), VOL 1, 2013, : 111 - 116
  • [8] Mixed effects regression trees for clustered data
    Hajjem, Ahlem
    Bellavance, Francois
    Larocque, Denis
    [J]. STATISTICS & PROBABILITY LETTERS, 2011, 81 (04) : 451 - 459
  • [9] Data mining methods with trees
    Zambochova, Marta
    [J]. E & M EKONOMIE A MANAGEMENT, 2008, 11 (01): : 126 - 131
  • [10] Data Mining for Longitudinal Data with Different Treatments
    Akacha, Mouna
    Fonseca, Thais C. O.
    Liverani, Silvia
    [J]. NEW PERSPECTIVES IN STATISTICAL MODELING AND DATA ANALYSIS, 2011, : 409 - 416