Sampling Operations on Big Data

Cited by: 0
Authors
Gadepally, Vijay [1]
Herr, Taylor [1]
Johnson, Luke [1]
Milechin, Lauren [1]
Milosavljevic, Maja [1]
Miller, Benjamin A. [1]
Affiliation
[1] MIT, Lincoln Lab, Lexington, MA 02420 USA
Keywords
LINK PREDICTION;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
The 3Vs - Volume, Velocity and Variety - of Big Data continue to pose a major challenge for systems and algorithms designed to store, process and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and filtering, which have been used for decades to compress signals, are often undefined for data characterized by heterogeneity, high dimensionality, and a lack of known structure. In this article, we describe and demonstrate an approach to sampling large datasets such as social media data. We evaluate the effect of sampling on a common predictive analytic: link prediction. Our results indicate that even a heavily sampled dataset can still yield meaningful link prediction results.
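The abstract does not say which sampling scheme or link predictor the authors use. As a rough illustration of the experiment it describes, the sketch below assumes uniform random edge sampling and a common-neighbors score; both choices, the function names, and the toy graph are hypothetical and are not taken from the paper.

```python
import random
from itertools import combinations


def sample_edges(edges, fraction, seed=0):
    """Keep each edge independently with probability `fraction`.

    Assumption: uniform edge sampling, chosen only for illustration.
    """
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < fraction]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbor_scores(adj):
    """Score each non-adjacent node pair by its number of shared neighbors."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            shared = len(adj[u] & adj[v])
            if shared > 0:
                scores[(u, v)] = shared
    return scores


# Hypothetical toy graph; the paper's social media data is not reproduced here.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]

# Link prediction scores on the full graph versus on a 50% edge sample.
full_scores = common_neighbor_scores(build_adjacency(edges))
sampled = sample_edges(edges, fraction=0.5, seed=1)
sampled_scores = common_neighbor_scores(build_adjacency(sampled))
```

Comparing the top-ranked pairs in `full_scores` and `sampled_scores` gives one simple way to see how much predictive signal survives sampling. The evaluation metric used in the paper may differ.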
Pages: 1515 - 1519
Page count: 5
Related papers
50 in total
  • [1] Big Data and Service Operations
    Cohen, Maxime C.
    [J]. PRODUCTION AND OPERATIONS MANAGEMENT, 2018, 27 (09) : 1709 - 1723
  • [2] Sampling and Sampling Frames in Big Data Epidemiology
    Mooney, Stephen J.
    Garber, Michael D.
    [J]. CURRENT EPIDEMIOLOGY REPORTS, 2019, 6 (01) : 14 - 22
  • [3] Sampling and Sampling Frames in Big Data Epidemiology
    Stephen J. Mooney
    Michael D. Garber
    [J]. Current Epidemiology Reports, 2019, 6 : 14 - 22
  • [4] Sampling for Big Data: A Tutorial
    Cormode, Graham
    Duffield, Nick
    [J]. PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014 : 1975 - 1975
  • [5] BIG DATA IN DAILY MANUFACTURING OPERATIONS
    Wilschut, Tim
    Adan, Ivo J. B. F.
    Stokkermans, Joep
    [J]. PROCEEDINGS OF THE 2014 WINTER SIMULATION CONFERENCE (WSC), 2014 : 2364 - 2375
  • [6] Big Data Analytics in Operations Management
    Choi, Tsan-Ming
    Wallace, Stein W.
    Wang, Yulan
    [J]. PRODUCTION AND OPERATIONS MANAGEMENT, 2018, 27 (10) : 1868 - 1883
  • [7] Sampling Techniques for Big Data Analysis
    Kim, Jae Kwang
    Wang, Zhonglei
    [J]. INTERNATIONAL STATISTICAL REVIEW, 2019, 87 : S177 - S191
  • [8] Big Streaming Data Sampling and Optimization
    Kancharala, Abhilash
    Park, Nohjin
    Kim, Jongyeop
    Park, Nohpill
    [J]. IT CONVERGENCE AND SECURITY 2017, VOL 1, 2018, 449 : 218 - 228
  • [9] Sampling for Big Data Profiling: A Survey
    Liu, Zhicheng
    Zhang, Aoqian
    [J]. IEEE ACCESS, 2020, 8 : 72713 - 72726
  • [10] Deep Learning and Data Sampling with Imbalanced Big Data
    Johnson, Justin M.
    Khoshgoftaar, Taghi M.
    [J]. 2019 IEEE 20TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2019), 2019 : 175 - 183