Estimating Multilevel Models on Data Streams

Times cited: 0
Authors
L. Ippel
M. C. Kaptein
J. K. Vermunt
Affiliations
[1] Maastricht University, Institute of Data Science
[2] Tilburg University
Source
Psychometrika | 2019 / Vol. 84
Keywords
Data streams; expectation maximization algorithm; multilevel models; machine (online) learning; SEMA; nested data
DOI
Not available
Abstract
Social scientists are often faced with data that have a nested structure: pupils are nested within schools, employees are nested within companies, or repeated measurements are nested within individuals. Nested data are typically analyzed using multilevel models. However, when data sets are extremely large or when new data continuously augment the data set, estimating multilevel models can be challenging: current algorithms for fitting multilevel models repeatedly revisit all data points and consequently consume substantial time and computer memory. This is especially troublesome when predictions are needed in real time and observations keep streaming in. We address this problem by introducing the Streaming Expectation Maximization Approximation (SEMA) algorithm for fitting multilevel models online (or “row-by-row”). In an extensive simulation study, we compare the performance of SEMA with that of traditional methods for fitting multilevel models. Next, SEMA is used to analyze an empirical data stream. The accuracy of SEMA is competitive with current state-of-the-art methods, while SEMA is orders of magnitude faster.
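The abstract contrasts batch algorithms that revisit every data point with fitting “row-by-row”. The sketch below illustrates only that row-by-row bookkeeping idea; it is not SEMA and does not perform any expectation maximization step. It assumes a nested data stream of (group, outcome) pairs, for example pupils' scores tagged with a school identifier, and keeps per-group running summaries that are updated once per arriving observation and never revisited. The class names `GroupStats` and `StreamingGroupSummaries` are hypothetical, introduced for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class GroupStats:
    """Running summaries for one group (e.g., one school or one individual)."""
    n: int = 0         # number of observations seen so far in this group
    mean: float = 0.0  # running mean of the outcome
    m2: float = 0.0    # running sum of squared deviations from the mean


@dataclass
class StreamingGroupSummaries:
    """Hypothetical helper: updates per-group statistics one row at a time.

    Each row is processed once and then discarded, so memory grows with
    the number of groups, not with the number of observations.
    """
    groups: dict = field(default_factory=dict)

    def update(self, group_id, y):
        # Welford's online update of the group mean and sum of squares.
        g = self.groups.setdefault(group_id, GroupStats())
        g.n += 1
        delta = y - g.mean
        g.mean += delta / g.n
        g.m2 += delta * (y - g.mean)

    def pooled_within_variance(self):
        # Pooled within-group variance: sum of M2 over groups / (N - J).
        total_m2 = sum(g.m2 for g in self.groups.values())
        n_total = sum(g.n for g in self.groups.values())
        df = n_total - len(self.groups)
        return total_m2 / df if df > 0 else float("nan")


# Usage: feed a small illustrative stream one row at a time.
stream = [("school_a", 3.0), ("school_b", 5.0), ("school_a", 4.0),
          ("school_b", 7.0), ("school_a", 5.0)]
summaries = StreamingGroupSummaries()
for group_id, y in stream:
    summaries.update(group_id, y)

print(summaries.groups["school_a"].mean, summaries.groups["school_b"].mean)
print(summaries.pooled_within_variance())
```

Welford's update is used instead of accumulating raw sums of squares because it is numerically more stable when values are large relative to their spread. An online estimator for a full multilevel model would need more than this layer (for instance, updated estimates of the random effects and variance components), so treat the sketch as an illustration of the single-pass, constant-per-row cost that motivates streaming estimation.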
Pages: 41-64
Number of pages: 23