A framework for mediation analysis with massive data

Authors
Haixiang Zhang
Xin Li
Affiliation
[1] Tianjin University, Center for Applied Mathematics
Source
Statistics and Computing | 2023 / Vol. 33
Keywords
Big data; Divide-and-conquer; Mediation effects; Structural equation modeling; Subsampled double bootstrap
DOI
Not available
Abstract
During the past few years, mediation analysis has gained increasing popularity across various research fields. The primary objective of mediation analysis is to examine the direct effect of an exposure on an outcome, as well as the indirect effects that occur along the pathways from exposure to outcome. A great number of articles have applied mediation analysis to data from hundreds or thousands of individuals. With the rapid development of technology, the volume of available data increases exponentially, which brings new challenges to researchers. Directly conducting statistical analysis on large datasets is often computationally infeasible. Nonetheless, there is a paucity of findings regarding mediation analysis in the context of big data. In this paper, we propose utilizing subsampled double bootstrap and divide-and-conquer algorithms to conduct statistical mediation analysis on large-scale datasets. The proposed algorithms offer a significant enhancement in computational efficiency over the traditional bootstrap confidence interval and the Sobel test, while simultaneously ensuring desirable confidence interval coverage and power. We conducted extensive numerical simulations to evaluate the performance of our method. The practical applicability of our approach is demonstrated through two real-world data examples.
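To make the divide-and-conquer idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes a simple linear single-mediator model (M regressed on X, Y regressed on X and M), splits the data into equal-sized blocks, averages the block-wise path estimates, and forms a Sobel-type z statistic from the first-order delta method. The function names (`ols`, `block_mediation`, `dc_sobel`), the number of blocks, and the simulated coefficients are all hypothetical choices for illustration.

```python
import numpy as np


def ols(design, response):
    """Return OLS coefficients and their estimated covariance matrix."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    dof = design.shape[0] - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, cov


def block_mediation(x, m, y):
    """Estimate alpha (X -> M) and beta (M -> Y given X) on one block."""
    ones = np.ones_like(x)
    coef_m, cov_m = ols(np.column_stack([ones, x]), m)
    coef_y, cov_y = ols(np.column_stack([ones, x, m]), y)
    return coef_m[1], coef_y[2], cov_m[1, 1], cov_y[2, 2]


def dc_sobel(x, m, y, n_blocks=10):
    """Average block-wise estimates, then form a Sobel-type z statistic."""
    idx_blocks = np.array_split(np.arange(len(x)), n_blocks)
    stats = np.array([block_mediation(x[i], m[i], y[i]) for i in idx_blocks])
    alpha, beta = stats[:, 0].mean(), stats[:, 1].mean()
    # Variance of the average of n_blocks independent block estimates.
    var_alpha = stats[:, 2].mean() / n_blocks
    var_beta = stats[:, 3].mean() / n_blocks
    # First-order delta-method (Sobel) standard error of alpha * beta.
    se = np.sqrt(alpha**2 * var_beta + beta**2 * var_alpha)
    return alpha * beta, se, alpha * beta / se


# Simulated data (assumed model): true indirect effect = 0.5 * 0.4 = 0.2.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.4 * m + rng.normal(size=n)

effect, se, z = dc_sobel(x, m, y, n_blocks=10)
print(f"indirect effect = {effect:.4f}, SE = {se:.4f}, z = {z:.1f}")
```

Each block only needs its own small regression, so the blocks could be processed on separate workers and only four numbers per block need to be collected. A subsampled double bootstrap would replace the delta-method standard error with a resampling-based interval; that step is not shown here.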