SparkBench: a Spark benchmarking suite characterizing large-scale in-memory data analytics

Cited by: 0
Authors
Min Li
Jian Tan
Yandong Wang
Li Zhang
Valentina Salapura
Affiliations
[1] IBM Almaden Research Center
[2] Ohio State University
Source
Cluster Computing | 2017 / Vol. 20
Keywords
Benchmark; Spark; Workload characterization; Big data analytics
DOI
Not available
Abstract
Spark has been increasingly adopted by industry for big data analytics, owing to its resilience, scalability, and efficient in-memory distributed programming model. Meanwhile, its rapidly growing community is actively incubating a rich ecosystem around Spark to tackle various big data challenges. Current benchmarks fall short in providing guidance for the development, optimization, configuration, and deployment of Spark. To this end, we introduce SparkBench, a Spark-specific benchmarking suite. It selectively includes a set of representative applications to identify various performance bottlenecks and reveal resource consumption behavior across execution phases. Overall, SparkBench covers four critical usage patterns of Spark: machine learning, graph processing, stream computation, and SQL query processing. We present a comprehensive characterization of resource consumption, data flows, and timing information under different execution patterns, and demonstrate that SparkBench can effectively guide the optimization of data analytics platforms to better suit various workloads.
Pages: 2575 - 2589
Page count: 14
Related papers
50 in total
  • [1] SPARKBENCH: a spark benchmarking suite characterizing large-scale in-memory data analytics
    Li, Min
    Tan, Jian
    Wang, Yandong
    Zhang, Li
    Salapura, Valentina
    [J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2017, 20 (03): 2575 - 2589
  • [2] A Performance Study on Large-Scale Data Analytics Using Disk-Based and In-Memory Database Systems
    Chao, Pingfu
    He, Dan
    Sadiq, Shazia
    Zheng, Kai
    Zhou, Xiaofang
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2017: 247 - 254
  • [3] In-Memory Distributed Indexing for Large-Scale Media Data Retrieval
    Ma, Yinmiao
    Liu, Danlu
    Scott, Grant
    Uhlmann, Jeffrey
    Shyu, Chi-Ren
    [J]. 2017 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2017: 232 - 239
  • [4] On the Implications of Heterogeneous Memory Tiering on Spark In-Memory Analytics
    Katsaragakis, Manolis
    Masouros, Dimosthenis
    Papadopoulos, Lazaros
    Catthoor, Francky
    Soudris, Dimitrios
    [J]. 2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW, 2023: 945 - 952
  • [5] YinMem: a distributed parallel indexed in-memory computation system for large scale data analytics
    Huang, Yin
    Yesha, Yelena
    Halem, Milton
    Yesha, Yaacov
    Zhou, Shujia
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 214 - 222
  • [6] SparkRDF: In-Memory Distributed RDF Management Framework for Large-Scale Social Data
    Xu, Zhichao
    Chen, Wei
    Gai, Lei
    Wang, Tengjiao
    [J]. WEB-AGE INFORMATION MANAGEMENT (WAIM 2015), 2015, 9098 : 337 - 349
  • [7] BioSEAL: In-Memory Biological Sequence Alignment Accelerator for Large-Scale Genomic Data
    Kaplan, Roman
    Yavits, Leonid
    Ginosar, Ran
    [J]. PROCEEDINGS OF THE 13TH ACM INTERNATIONAL SYSTEMS AND STORAGE CONFERENCE (SYSTOR 2020), 2020: 36 - 48
  • [8] Characterizing large-scale quantum computers via cycle benchmarking
    Alexander Erhard
    Joel J. Wallman
    Lukas Postler
    Michael Meth
    Roman Stricker
    Esteban A. Martinez
    Philipp Schindler
    Thomas Monz
    Joseph Emerson
    Rainer Blatt
    [J]. Nature Communications, 10
  • [9] Characterizing large-scale quantum computers via cycle benchmarking
    Erhard, Alexander
    Wallman, Joel J.
    Postler, Lukas
    Meth, Michael
    Stricker, Roman
    Martinez, Esteban A.
    Schindler, Philipp
    Monz, Thomas
    Emerson, Joseph
    Blatt, Rainer
    [J]. NATURE COMMUNICATIONS, 2019, 10 (1)
  • [10] Eager Memory Management for In-Memory Data Analytics
    Jang, Hakbeom
    Bae, Jonghyun
    Ham, Tae Jun
    Lee, Jae W.
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (03): 632 - 636