Sampling and Sampling Frames in Big Data Epidemiology

Cited by: 13
Authors
Mooney, Stephen J. [1 ,2 ]
Garber, Michael D. [3 ]
Affiliations
[1] Univ Washington, Dept Epidemiol, 1959 NE Pacific St,Hlth Sci Bldg,F-262,Box 357236, Seattle, WA 98195 USA
[2] Univ Washington, Harborview Injury Prevent & Res Ctr, Seattle, WA 98195 USA
[3] Emory Univ, Dept Epidemiol, Rollins Sch Publ Hlth, Atlanta, GA 30322 USA
Keywords
Big data; Research methods; Sampling; Sampling frames; Secondary data; CAUSAL IDENTIFICATION; FOODBORNE ILLNESS; INFERENCE; DANGER; CHARGE; HEALTH;
DOI
10.1007/s40471-019-0179-y
CLC number
R1 [Preventive Medicine, Hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Purpose of Review: The 'big data' revolution affords the opportunity to reuse administrative datasets for public health research. While such datasets offer dramatically increased statistical power compared with conventional primary data collection, typically at much lower cost, their use also raises substantial inferential challenges. In particular, it can be difficult to make population inferences because the sampling frames for many administrative datasets are undefined. We reviewed options for accounting for sampling in big data epidemiology.
Recent Findings: We identified three common strategies for accounting for sampling when the data available were not collected from a deliberately constructed sample: (1) explicitly reconstruct the sampling frame, (2) test the potential impacts of sampling using sensitivity analyses, and (3) limit inference to sample.
Summary: Inference from big data can be challenging because the impacts of sampling are unclear. Attention to sampling frames can minimize risks of bias.
Pages: 14-22
Page count: 9