Estimating Multilevel Models on Data Streams

Authors
L. Ippel
M. C. Kaptein
J. K. Vermunt
Affiliations
[1] Maastricht University, Institute of Data Science
[2] Tilburg University
Source
Psychometrika | 2019 / Vol. 84
Keywords
Data streams; expectation maximization algorithm; multilevel models; machine (online) learning; SEMA; nested data
Abstract
Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: the current algorithms used to fit multilevel models repeatedly revisit all data points and consume considerable time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or “row-by-row”). In an extensive simulation study, we demonstrate the performance of SEMA compared to traditional methods of fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive with current state-of-the-art methods while being orders of magnitude faster.
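To make the "row-by-row" idea concrete, the following is a minimal sketch of online updating for nested data. It is not the SEMA algorithm from the paper: it only keeps running counts and means per group and shrinks each group mean toward the grand mean, which is the standard prediction for a random-intercept model when the ratio of residual to between-group variance is known. The class name `StreamingRandomIntercept` and the parameter `variance_ratio` are hypothetical, and treating the variance ratio as fixed is an assumption made for illustration (SEMA estimates the model parameters from the stream).

```python
from collections import defaultdict


class StreamingRandomIntercept:
    """Row-by-row summaries for a random-intercept model (illustrative, not SEMA)."""

    def __init__(self, variance_ratio=1.0):
        # variance_ratio = residual variance / between-group variance.
        # Assumed known here; this is a simplification for the sketch.
        self.variance_ratio = variance_ratio
        self.n_total = 0
        self.grand_mean = 0.0
        self.n_group = defaultdict(int)
        self.group_mean = defaultdict(float)

    def update(self, group, y):
        # Incremental mean updates: each row is used once and then discarded.
        self.n_total += 1
        self.grand_mean += (y - self.grand_mean) / self.n_total
        self.n_group[group] += 1
        self.group_mean[group] += (y - self.group_mean[group]) / self.n_group[group]

    def predict(self, group):
        # Unseen groups fall back to the grand mean.
        n = self.n_group.get(group, 0)
        if n == 0:
            return self.grand_mean
        # Groups with more observations are shrunk less toward the grand mean.
        weight = n / (n + self.variance_ratio)
        return self.grand_mean + weight * (self.group_mean[group] - self.grand_mean)


# Usage: feed observations one at a time, predicting whenever needed.
model = StreamingRandomIntercept(variance_ratio=1.0)
for group, y in [("a", 2.0), ("a", 4.0), ("b", 10.0)]:
    model.update(group, y)
```

Memory use grows with the number of groups rather than the number of rows, and each update costs constant time, which is the property that makes streaming estimation attractive when refitting on all data is too slow.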
Pages: 41-64
Number of pages: 23