Sampling Techniques for Big Data Analysis

Cited by: 33
Authors
Kim, Jae Kwang [1]
Wang, Zhonglei [2]
Affiliations
[1] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
[2] Xiamen Univ, Sch Econ, Wang Yanan Inst Studies Econ WISE, Xiamen 361005, Fujian, Peoples R China
Funding
National Science Foundation (USA)
Keywords
Data integration; inverse sampling; non-probability sample; selection bias; variance estimation; missing data; inference; nonresponse; imputation
DOI
10.1111/insr.12290
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
In analysing big data for finite population inference, it is critical to adjust for the selection bias in the big data. In this paper, we propose two methods of reducing the selection bias associated with the big data sample. The first method uses a version of inverse sampling by incorporating auxiliary information from external sources, and the second one borrows the idea of data integration by combining the big data sample with an independent probability sample. Two simulation studies show that the proposed methods are unbiased and have better coverage rates than their alternatives. In addition, the proposed methods are easy to implement in practice.
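The two bias-reduction ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. `inverse_subsample` and `integrated_mean` are hypothetical helpers, and the sketch assumes a single categorical auxiliary variable x whose population proportions are known from an external source, selection into the big data that depends on x only, a known population size N, and a probability sample that records both y and the big-data membership flag delta. The first function is a generic acceptance-rejection version of inverse sampling. The second is a simple Horvitz-Thompson-type combination estimator. The paper's actual estimators may differ, for example by calibrating the weights.

```python
import random
from collections import Counter


def inverse_subsample(big_data, pop_props, rng):
    """Acceptance-rejection subsample of a big-data sample (hypothetical helper).

    big_data:  list of (x, y) pairs, where x is a category label.
    pop_props: dict mapping each category to its population proportion,
               assumed known from an external source and to cover every
               category that appears in big_data.
    A unit in category h is kept with probability r_h / max_k r_k, where
    r_h = pop_props[h] / (share of h in big_data), so the kept units have,
    in expectation, the population distribution of x.
    """
    n = len(big_data)
    counts = Counter(x for x, _ in big_data)
    ratios = {h: pop_props[h] / (counts[h] / n) for h in counts}
    r_max = max(ratios.values())
    # rng.random() is in [0, 1), so the category with the largest ratio is always kept.
    return [(x, y) for x, y in big_data if rng.random() < ratios[x] / r_max]


def integrated_mean(big_y_total, pop_size, prob_sample):
    """Combine a big-data total with an independent probability sample (hypothetical helper).

    big_y_total: sum of y over every unit in the big-data sample B.
    pop_size:    population size N, assumed known.
    prob_sample: list of (d, delta, y) for units in the probability sample A,
                 where d is the design weight 1/pi, delta is True if the unit
                 also belongs to B, and y is observed for all units of A.
    Units in B contribute their exact total; the part of the population outside
    B is estimated by a Horvitz-Thompson-type sum over A restricted to delta == False.
    """
    outside_b = sum(d * y for d, delta, y in prob_sample if not delta)
    return (big_y_total + outside_b) / pop_size


if __name__ == "__main__":
    rng = random.Random(2024)
    # Synthetic population: x in {"a", "b"} with equal shares; y depends on x.
    N = 10_000
    pop = [("a", rng.gauss(1.0, 0.5)) if i % 2 == 0 else ("b", rng.gauss(3.0, 0.5))
           for i in range(N)]
    # Self-selection into the big data depends on x only: "b" units enter far more often.
    delta = [rng.random() < (0.8 if x == "b" else 0.2) for x, _ in pop]
    big = [unit for unit, d in zip(pop, delta) if d]

    print("true mean:       ", sum(y for _, y in pop) / N)
    print("naive big data:  ", sum(y for _, y in big) / len(big))

    sub = inverse_subsample(big, {"a": 0.5, "b": 0.5}, rng)
    print("inverse sampling:", sum(y for _, y in sub) / len(sub))

    # Independent simple random sample of size n, so every design weight is N / n.
    n = 500
    prob = [(N / n, delta[i], pop[i][1]) for i in rng.sample(range(N), n)]
    print("data integration:", integrated_mean(sum(y for _, y in big), N, prob))
```

Under these assumptions the inverse-sampling and data-integration estimates should sit near the true mean, while the naive big-data mean is pulled toward the over-represented category. If selection into the big data also depended on y given x, the inverse-sampling step alone would not remove the bias.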
Pages: S177-S191
Number of pages: 15
Related papers
  • [1] Sampling Techniques to Improve Big Data Exploration
    Rojas, Julian A. Ramos
    Kery, Mary Beth
    Rosenthal, Stephanie
    Dey, Anind
    [J]. 2017 IEEE 7TH SYMPOSIUM ON LARGE DATA ANALYSIS AND VISUALIZATION (LDAV), 2017, : 26 - 35
  • [2] Analysis on Big Data Techniques
    Harleen
    Garg, Naveen
    [J]. INTERNATIONAL PROCEEDINGS ON ADVANCES IN SOFT COMPUTING, INTELLIGENT SYSTEMS AND APPLICATIONS, ASISA 2016, 2018, 628 : 375 - 391
  • [3] Big data analysis techniques.
    St-Pierre, N.
    [J]. JOURNAL OF ANIMAL SCIENCE, 2016, 94 : 624 - 624
  • [4] A Survey of Clustering Techniques for Big Data Analysis
    Arora, Saurabh
    Chana, Inderveer
    [J]. 2014 5TH INTERNATIONAL CONFERENCE CONFLUENCE THE NEXT GENERATION INFORMATION TECHNOLOGY SUMMIT (CONFLUENCE), 2014, : 59 - 65
  • [5] Analysis of Dimensionality Reduction Techniques on Big Data
    Reddy, G. Thippa
    Reddy, M. Praveen Kumar
    Lakshmanna, Kuruva
    Kaluri, Rajesh
    Rajput, Dharmendra Singh
    Srivastava, Gautam
    Baker, Thar
    [J]. IEEE ACCESS, 2020, 8 : 54776 - 54788
  • [6] Comparative Analysis on Techniques for Big Data Testing
    Abidin, Adiba
    Lal, Divya
    Garg, Naveen
    Deep, Vikas
    [J]. 2016 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY (INCITE) - NEXT GENERATION IT SUMMIT ON THE THEME - INTERNET OF THINGS: CONNECT YOUR WORLDS, 2016,
  • [7] Big data analysis techniques for intelligent systems
    Farouk, Ahmed
    Zhen, Dou
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 37 (03) : 3067 - 3071
  • [8] Some techniques for the analysis of work sampling data
    Miller, ME
    James, MK
    Langefeld, CD
    Espeland, MA
    Freedman, JA
    Martin, DK
    Smith, DM
    [J]. STATISTICS IN MEDICINE, 1996, 15 (06) : 607 - 618
  • [9] A Survey of Data Partitioning and Sampling Methods to Support Big Data Analysis
    Mahmud, Mohammad Sultan
    Huang, Joshua Zhexue
    Salloum, Salman
    Emara, Tamer Z.
    Sadatdiynov, Kuanishbay
    [J]. BIG DATA MINING AND ANALYTICS, 2020, 3 (02) : 85 - 101