DSPBench: A Suite of Benchmark Applications for Distributed Data Stream Processing Systems

Cited by: 19
Authors
Bordin, Maycon Viana [1 ]
Griebler, Dalvan [2 ,3 ]
Mencagli, Gabriele [4 ]
Geyer, Claudio F. R. [1 ]
Fernandes, Luiz Gustavo L. [2 ]
Affiliations
[1] Fed Univ Rio Grande Sul UFRGS, Inst Informat, BR-91509900 Porto Alegre, RS, Brazil
[2] Pontifical Catholic Univ Rio Grande Sul PUCRS, Sch Technol, BR-90619900 Porto Alegre, RS, Brazil
[3] SETREM, Tres De Maio Fac, Lab Adv Res Cloud Comp LARCC, BR-98910000 Tres De Maio, Brazil
[4] Univ Pisa, Dept Comp Sci, I-56127 Pisa, Italy
Funding
European Union Horizon 2020;
Keywords
Benchmark testing; Storms; Sparks; Task analysis; Throughput; Distributed databases; Tools; Data stream processing; big data; benchmarking; apache storm; spark streaming; LANGUAGE; MODEL;
DOI
10.1109/ACCESS.2020.3043948
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Systems enabling the continuous processing of large data streams have recently attracted the attention of the scientific community and industrial stakeholders. Data Stream Processing Systems (DSPSs) are complex and powerful frameworks able to ease the development of streaming applications in distributed computing environments like clusters and clouds. Several systems of this kind have been released and are currently maintained as open source projects, like Apache Storm and Spark Streaming. Some benchmark applications have often been used by the scientific community to test and evaluate new techniques to improve the performance and usability of DSPSs. However, the existing benchmark suites lack representative workloads from the wide set of application domains that can leverage the benefits offered by the stream processing paradigm in terms of near real-time performance. The goal of this article is to present a new benchmark suite composed of 15 applications from areas like Finance, Telecommunications, Sensor Networks, Social Networks and others. This article describes in detail the nature of these applications and their full workload characterization in terms of selectivity, processing cost, input size and overall memory occupation. In addition, it exemplifies the usefulness of our benchmark suite for comparing real DSPSs by selecting Apache Storm and Spark Streaming for this analysis.
Pages: 222900-222917
Page count: 18