DSPBench: A Suite of Benchmark Applications for Distributed Data Stream Processing Systems

Cited by: 19
Authors
Bordin, Maycon Viana [1]
Griebler, Dalvan [2, 3]
Mencagli, Gabriele [4]
Geyer, Claudio F. R. [1]
Fernandes, Luiz Gustavo L. [2]
Affiliations
[1] Fed Univ Rio Grande Sul UFRGS, Inst Informat, BR-91509900 Porto Alegre, RS, Brazil
[2] Pontifical Catholic Univ Rio Grande Sul PUCRS, Sch Technol, BR-90619900 Porto Alegre, RS, Brazil
[3] SETREM, Tres De Maio Fac, Lab Adv Res Cloud Comp LARCC, BR-98910000 Tres De Maio, Brazil
[4] Univ Pisa, Dept Comp Sci, I-56127 Pisa, Italy
Funding
European Union Horizon 2020
Keywords
Benchmark testing; Storms; Sparks; Task analysis; Throughput; Distributed databases; Tools; Data stream processing; big data; benchmarking; apache storm; spark streaming; LANGUAGE; MODEL;
DOI
10.1109/ACCESS.2020.3043948
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Systems enabling the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. Data Stream Processing Systems (DSPSs) are complex and powerful frameworks that ease the development of streaming applications in distributed computing environments such as clusters and clouds. Several systems of this kind have been released and are currently maintained as open source projects, like Apache Storm and Spark Streaming. Benchmark applications have often been used by the scientific community to test and evaluate new techniques to improve the performance and usability of DSPSs. However, the existing benchmark suites lack representative workloads from the wide set of application domains that can leverage the benefits of the stream processing paradigm in terms of near real-time performance. The goal of this article is to present a new benchmark suite composed of 15 applications from areas like Finance, Telecommunications, Sensor Networks, Social Networks and others. This article describes in detail the nature of these applications and their full workload characterization in terms of selectivity, processing cost, input size and overall memory occupation. In addition, it exemplifies the usefulness of the benchmark suite for comparing real DSPSs by selecting Apache Storm and Spark Streaming for this analysis.
Pages: 222900-222917
Page count: 18