Estimating Multilevel Models on Data Streams

Cited by: 0
Authors
L. Ippel
M. C. Kaptein
J. K. Vermunt
Affiliations
[1] Maastricht University, Institute of Data Science
[2] Tilburg University
Source
Psychometrika | 2019 / Vol. 84
Keywords
Data streams; expectation maximization algorithm; multilevel models; machine (online) learning; SEMA; nested data
DOI
Not available
Abstract
Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and therefore consume a great deal of time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or “row-by-row”). In an extensive simulation study, we demonstrate the performance of SEMA compared to traditional methods of fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive with current state-of-the-art methods while being orders of magnitude faster.
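The "row-by-row" idea in the abstract can be illustrated with a minimal sketch. This is not the SEMA algorithm, whose details are not given in this record. It only shows the general principle of updating per-group summaries from a stream of nested observations without storing or revisiting earlier rows. The class name `StreamingGroupMeans`, the group labels, and the data values are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


class StreamingGroupMeans:
    """Running per-group and overall means, updated one row at a time.

    Illustrative only: a stand-in for the kind of summary statistics an
    online multilevel fitting procedure might maintain, not SEMA itself.
    """

    def __init__(self):
        self.n = defaultdict(int)       # observations seen per group
        self.mean = defaultdict(float)  # running mean per group
        self.total_n = 0                # observations seen overall
        self.grand_mean = 0.0           # running mean over all rows

    def update(self, group, y):
        # Incremental mean update: earlier rows are never stored or revisited.
        self.n[group] += 1
        self.mean[group] += (y - self.mean[group]) / self.n[group]
        self.total_n += 1
        self.grand_mean += (y - self.grand_mean) / self.total_n


# Hypothetical stream of (group, outcome) rows, e.g. pupils within schools.
stream = [("school_a", 6.0), ("school_b", 4.0), ("school_a", 8.0), ("school_b", 2.0)]

model = StreamingGroupMeans()
for group, y in stream:
    model.update(group, y)
```

Each call to `update` costs constant time and memory per group, which is the property that makes online estimation attractive when observations keep arriving.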
Pages: 41-64
Number of pages: 23