DSPBench: A Suite of Benchmark Applications for Distributed Data Stream Processing Systems

Cited by: 19
Authors
Bordin, Maycon Viana [1]
Griebler, Dalvan [2, 3]
Mencagli, Gabriele [4]
Geyer, Claudio F. R. [1]
Fernandes, Luiz Gustavo L. [2]
Affiliations
[1] Fed Univ Rio Grande Sul UFRGS, Inst Informat, BR-91509900 Porto Alegre, RS, Brazil
[2] Pontifical Catholic Univ Rio Grande Sul PUCRS, Sch Technol, BR-90619900 Porto Alegre, RS, Brazil
[3] SETREM, Tres De Maio Fac, Lab Adv Res Cloud Comp LARCC, BR-98910000 Tres De Maio, Brazil
[4] Univ Pisa, Dept Comp Sci, I-56127 Pisa, Italy
Funding
EU Horizon 2020;
Keywords
Benchmark testing; Storms; Sparks; Task analysis; Throughput; Distributed databases; Tools; Data stream processing; big data; benchmarking; apache storm; spark streaming; LANGUAGE; MODEL;
DOI
10.1109/ACCESS.2020.3043948
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Systems enabling the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. Data Stream Processing Systems (DSPSs) are complex and powerful frameworks that ease the development of streaming applications in distributed computing environments such as clusters and clouds. Several systems of this kind have been released and are currently maintained as open source projects, such as Apache Storm and Spark Streaming. Benchmark applications have often been used by the scientific community to test and evaluate new techniques for improving the performance and usability of DSPSs. However, the existing benchmark suites lack representative workloads from the wide set of application domains that can leverage the near real-time performance offered by the stream processing paradigm. The goal of this article is to present a new benchmark suite composed of 15 applications from areas such as Finance, Telecommunications, Sensor Networks, Social Networks and others. This article describes in detail the nature of these applications and their full workload characterization in terms of selectivity, processing cost, input size and overall memory occupation. In addition, it exemplifies the usefulness of the benchmark suite for comparing real DSPSs by selecting Apache Storm and Spark Streaming for this analysis.
Pages: 222900-222917
Number of pages: 18