A framework for mediation analysis with massive data

Cited by: 2
Authors
Zhang, Haixiang [1]
Li, Xin [1]
Affiliations
[1] Tianjin Univ, Ctr Appl Math, Tianjin, Peoples R China
Keywords
Big data; Divide-and-conquer; Mediation effects; Structural equation modeling; Subsampled double bootstrap; CONFIDENCE-INTERVALS; MODELS; CAUSAL; BOOTSTRAP;
DOI
10.1007/s11222-023-10255-x
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
During the past few years, mediation analysis has gained increasing popularity across various research fields. The primary objective of mediation analysis is to examine the direct effect of an exposure on an outcome, as well as the indirect effects that occur along the pathways from exposure to outcome. A great number of articles have applied mediation analysis to data from hundreds or thousands of individuals. With the rapid development of technology, the volume of available data increases exponentially, which brings new challenges to researchers. Directly conducting statistical analysis on large datasets is often computationally infeasible. Nonetheless, there is a paucity of findings regarding mediation analysis in the context of big data. In this paper, we propose using subsampled double bootstrap and divide-and-conquer algorithms to conduct statistical mediation analysis on large-scale datasets. The proposed algorithms offer a significant improvement in computational efficiency over the traditional bootstrap confidence interval and the Sobel test, while maintaining desirable confidence interval coverage and power. We conducted extensive numerical simulations to evaluate the performance of our method. The practical applicability of our approach is demonstrated through two real-world data examples.
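The divide-and-conquer idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple linear mediation model with one mediator (M = alpha*X + error, Y = gamma*X + beta*M + error), splits the data into blocks, fits ordinary least squares on each block, and averages the block estimates. The function names, the symbols alpha, beta and gamma, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def ols(design, response):
    """Least-squares coefficients for response ~ design (intercept column included)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

def dc_indirect_effect(x, m, y, n_blocks=10):
    """Divide-and-conquer sketch: fit both regressions per block, then average.

    Returns the averaged path estimates (a_bar, b_bar) and their product,
    which estimates the indirect effect alpha * beta.
    """
    blocks = np.array_split(np.arange(len(x)), n_blocks)
    a_hats, b_hats = [], []
    for idx in blocks:
        ones = np.ones(len(idx))
        # Path X -> M: slope of x in the regression m ~ 1 + x
        a_hats.append(ols(np.column_stack([ones, x[idx]]), m[idx])[1])
        # Path M -> Y: slope of m in the regression y ~ 1 + x + m
        b_hats.append(ols(np.column_stack([ones, x[idx], m[idx]]), y[idx])[2])
    a_bar = float(np.mean(a_hats))
    b_bar = float(np.mean(b_hats))
    return a_bar, b_bar, a_bar * b_bar

# Simulated data under the assumed linear model (hypothetical parameter values).
rng = np.random.default_rng(0)
n = 200_000
alpha, beta, gamma = 0.5, 0.4, 0.3
x = rng.normal(size=n)
m = alpha * x + rng.normal(size=n)
y = gamma * x + beta * m + rng.normal(size=n)

a_bar, b_bar, indirect = dc_indirect_effect(x, m, y, n_blocks=20)
print(f"a_bar={a_bar:.3f}, b_bar={b_bar:.3f}, indirect={indirect:.3f}")
```

Each block needs only its own slice of the data in memory, which is the source of the computational saving. Interval estimation, for example by a subsampled double bootstrap, is deliberately left out of this sketch.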
Pages: 16
Related papers
50 in total
  • [21] INTEGRATING INCOMPLETE DATA FOR MEDIATION ANALYSIS
    Derkach, Andriy
    Sampson, Joshua N.
    Pfeiffer, Ruth M.
    [J]. STATISTICA SINICA, 2024, 34 (02) : 1045 - 1066
  • [22] A Framework for Clustering Massive Text and Categorical Data Streams
    Aggarwal, Charu C.
    Yu, Philip S.
    [J]. PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2006, : 479 - 483
  • [23] A Framework for Classification and Segmentation of Massive Audio Data Streams
    Aggarwal, Charu C.
    [J]. KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 1013 - 1017
  • [24] A Framework for Clustering Massive-Domain Data Streams
    Aggarwal, Charu C.
    [J]. ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, : 102 - 113
  • [25] An ontology-based data mediation framework for semantic environments
    Mocan, Adrian
    Cimpian, Emilia
    [J]. INTERNATIONAL JOURNAL ON SEMANTIC WEB AND INFORMATION SYSTEMS, 2007, 3 (02) : 69 - 98
  • [26] Discovery of Important Location from Massive Trajectory Data Based on Mediation Matrix
    Zhang, Xu
    Hu, Yongsen
    [J]. SOFTWARE ENGINEERING METHODS IN INTELLIGENT ALGORITHMS, VOL 1, 2019, 984 : 360 - 369
  • [27] Mediation Analysis in a Latent Growth Curve Modeling Framework
    von Soest, Tilmann
    Hagtvet, Knut A.
    [J]. STRUCTURAL EQUATION MODELING-A MULTIDISCIPLINARY JOURNAL, 2011, 18 (02) : 289 - 314
  • [28] A classical regression framework for mediation analysis: fitting one model to estimate mediation effects
    Saunders, Christina T.
    Blume, Jeffrey D.
    [J]. BIOSTATISTICS, 2018, 19 (04) : 514 - 528
  • [29] Storing massive Resource Description Framework (RDF) data: a survey
    Ma, Zongmin
    Capretz, Miriam A. M.
    Yan, Li
    [J]. KNOWLEDGE ENGINEERING REVIEW, 2016, 31 (04) : 391 - 413
  • [30] A Demonstration of Summit: a Scalable Data Management Framework for Massive Trajectory
    Alarabi, Louai
    Mokbel, Mohamed F.
    [J]. 2020 21ST IEEE INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM 2020), 2020, : 226 - 227