Sampling and Sampling Frames in Big Data Epidemiology

Times cited: 13
Authors
Mooney, Stephen J. [1, 2]
Garber, Michael D. [3]
Affiliations
[1] University of Washington, Department of Epidemiology, 1959 NE Pacific St, Health Sciences Building, F-262, Box 357236, Seattle, WA 98195, USA
[2] University of Washington, Harborview Injury Prevention and Research Center, Seattle, WA 98195, USA
[3] Emory University, Rollins School of Public Health, Department of Epidemiology, Atlanta, GA 30322, USA
Keywords
Big data; Research methods; Sampling; Sampling frames; Secondary data; Causal identification; Foodborne illness; Inference; Danger; Charge; Health
DOI
10.1007/s40471-019-0179-y
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification codes
1004; 120402
Abstract
Purpose of Review: The 'big data' revolution affords the opportunity to reuse administrative datasets for public health research. While such datasets offer dramatically increased statistical power compared with conventional primary data collection, typically at much lower cost, their use also raises substantial inferential challenges. In particular, it can be difficult to make population inferences because the sampling frames for many administrative datasets are undefined. We reviewed options for accounting for sampling in big data epidemiology.
Recent Findings: We identified three common strategies for accounting for sampling when the available data were not collected from a deliberately constructed sample: (1) explicitly reconstruct the sampling frame, (2) test the potential impacts of sampling using sensitivity analyses, and (3) limit inference to the sample.
Summary: Inference from big data can be challenging because the impacts of sampling are unclear. Attention to sampling frames can minimize risks of bias.
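As an illustration of the first two strategies, the Python sketch below uses entirely hypothetical data: a sample that over-represents younger people, as a dataset drawn from, say, an app's user base might. Assuming the target population's age distribution is known (the 50/50 split is invented and stands in for something like census counts), it contrasts the crude prevalence with a post-stratified estimate reweighted to that reconstructed frame. It then varies the assumed frame composition as a simple sensitivity analysis. Post-stratification is only one common way to use a reconstructed frame and is not necessarily the approach the review describes; all names and numbers are hypothetical.

```python
from collections import Counter

# Hypothetical records: (age_group, has_condition).
# Younger people are over-represented relative to the assumed population.
sample = (
    [("18-39", True)] * 30 + [("18-39", False)] * 570  # 600 younger, 5% prevalence
    + [("40+", True)] * 40 + [("40+", False)] * 160    # 200 older, 20% prevalence
)

# ASSUMPTION: population shares for the reconstructed sampling frame.
# Illustrative values only; in practice these would come from an external source.
population_share = {"18-39": 0.5, "40+": 0.5}


def crude_prevalence(records):
    """Unweighted proportion with the condition (treats the sample as the population)."""
    return sum(has for _, has in records) / len(records)


def poststratified_prevalence(records, shares):
    """Weight each stratum's prevalence by that stratum's share of the population."""
    counts = Counter(group for group, _ in records)
    cases = Counter(group for group, has in records if has)
    return sum(shares[g] * cases[g] / counts[g] for g in shares)


def sensitivity(records, older_shares):
    """Re-estimate prevalence under a range of assumed population shares for age 40+."""
    return {
        s: poststratified_prevalence(records, {"18-39": 1 - s, "40+": s})
        for s in older_shares
    }


if __name__ == "__main__":
    print(f"crude: {crude_prevalence(sample):.4f}")  # 0.0875
    print(f"post-stratified: {poststratified_prevalence(sample, population_share):.4f}")  # 0.1250
    for share, est in sensitivity(sample, [0.3, 0.5, 0.7]).items():
        print(f"assumed 40+ share {share:.1f}: {est:.3f}")
```

The crude figure remains a valid description of the sample itself, which corresponds to the third strategy (limiting inference to the sample); it becomes misleading only when read as an estimate for the wider population. The spread of the sensitivity results shows how much the population estimate depends on the assumed frame.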
Pages: 14-22
Number of pages: 9
Journal: Current Epidemiology Reports, 2019, 6: 14-22