Characterization of a Big Data Storage Workload in the Cloud

Cited by: 6
Authors
Talluri, Sacheendra [1, 5]
Luszczak, Alicja [2]
Abad, Cristina L. [3]
Iosup, Alexandru [4]
Affiliations
[1] Delft Univ Technol, Delft, Netherlands
[2] Databricks BV, Amsterdam, Netherlands
[3] Escuela Super Politecn Litoral, Guayaquil, Ecuador
[4] Vrije Univ, Amsterdam, Netherlands
[5] Databricks, Amsterdam, Netherlands
Keywords
DOI
10.1145/3297663.3310302
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The proliferation of big data processing platforms has led to radically different system designs, such as MapReduce and the newer Spark. Understanding the workloads of such systems facilitates tuning and could foster new designs. However, whereas MapReduce workloads have been characterized extensively, relatively little public knowledge exists about the characteristics of Spark workloads in representative environments. To address this problem, in this work we collect and analyze a 6-month Spark workload from a major provider of big data processing services, Databricks. Our analysis focuses on a number of key features, such as the long-term trends of reads and modifications, the statistical properties of reads, and the popularity of clusters and of file formats. Overall, we present numerous findings that could form the basis of new systems studies and designs. Our quantitative evidence and its analysis suggest the existence of daily and weekly load imbalances, of heavy-tailed and bursty behaviour, of the relative rarity of modifications, and of proliferation of big data specific formats.
Pages: 33-44
Page count: 12
Related papers
50 in total
  • [1] Storage Consideration for Big Data in the Cloud
    Hsu, Yun-Ping
    2016 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), 2016
  • [2] Workload Characterization of Interactive Cloud Services on Big and Small Server Platforms
    Chen, Shuang
    GalOn, Shay
    Delimitrou, Christina
    Manne, Srilatha
    Martinez, Jose F.
    PROCEEDINGS OF THE 2017 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC), 2017, : 125 - 134
  • [3] Optimal Workload and Energy Storage Management for Cloud Data Centers
    Guo, Yuanxiong
    Fang, Yuguang
    Khargonekar, Pramod P.
    2013 IEEE MILITARY COMMUNICATIONS CONFERENCE (MILCOM 2013), 2013, : 1850 - 1855
  • [4] Cloud Workload Characterization
    Mulia, Wira D.
    Sehgal, Naresh
    Sohoni, Sohum
    Acken, John M.
    Stanberry, C. Lucas
    Fritz, David J.
    IETE TECHNICAL REVIEW, 2013, 30 (05) : 382 - 397
  • [5] A model to compare cloud and non-cloud storage of Big Data
    Chang, Victor
    Wills, Gary
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 57 : 56 - 76
  • [6] Big Data Storage in the Cloud for Smart Environment Monitoring
    Fazio, M.
    Celesti, A.
    Puliafito, A.
    Villari, M.
    6TH INTERNATIONAL CONFERENCE ON AMBIENT SYSTEMS, NETWORKS AND TECHNOLOGIES (ANT-2015), THE 5TH INTERNATIONAL CONFERENCE ON SUSTAINABLE ENERGY INFORMATION TECHNOLOGY (SEIT-2015), 2015, 52 : 500 - 506
  • [7] Research on the Cloud Storage Security in Big Data Era
    Chen Kai
    Lang Weimin
    Zheng Ke
    Ouyang Wenjing
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON APPLIED SCIENCE AND ENGINEERING INNOVATION, 2015, 12 : 659 - 664
  • [8] Big Data Storage Architecture Design in Cloud Computing
    Chen, Xuebin
    Wang, Shi
    Dong, Yanyan
    Wang, Xu
    BIG DATA TECHNOLOGY AND APPLICATIONS, 2016, 590 : 7 - 14
  • [9] Efficient and Secure Cloud Storage for Handling Big Data
    Kumar, Arjun
    Lee, HoonJae
    Singh, Rajeev Pratap
    2012 6TH INTERNATIONAL CONFERENCE ON NEW TRENDS IN INFORMATION SCIENCE, SERVICE SCIENCE AND DATA MINING (ISSDM2012), 2012, : 162 - 166
  • [10] Study on Cloud Storage based on the MapReduce for Big Data
    Huang Yi
    Ma Xinqiang
    Zhang Yongdan
    Liu Youyuan
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON MECHATRONICS, ELECTRONIC, INDUSTRIAL AND CONTROL ENGINEERING, 2015, 8 : 1601 - 1605