Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming

Cited by: 187
Authors
Chintapalli, Sanket [1]
Dagit, Derek [1]
Evans, Bobby [1]
Farivar, Reza [1]
Graves, Thomas [1]
Holderbaugh, Mark [1]
Liu, Zhuo [1]
Nusbaum, Kyle [1]
Patil, Kishorkumar [1]
Peng, Boyang Jerry [1]
Poulosky, Paul [1]
Affiliations
[1] Yahoo Inc, Sunnyvale, CA 94089 USA
Keywords
Streaming processing; Benchmark; Storm; Spark; Flink; Low Latency
DOI
10.1109/IPDPSW.2016.138
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Streaming data processing has been gaining attention due to its application to a wide range of scenarios. To serve the booming demand for streaming data processing, many computation engines have been developed. However, there is still a lack of real-world benchmarks that would be helpful when choosing the most appropriate platform for serving real-time streaming needs. To address this problem, we developed a streaming benchmark for three representative computation engines: Flink, Storm and Spark Streaming. Instead of testing speed-of-light event processing, we construct a full data pipeline using Kafka and Redis in order to more closely mimic real-world production scenarios. Based on our experiments, we provide a performance comparison of the three data engines in terms of 99th percentile latency and throughput for various configurations.
Pages: 1789-1792
Page count: 4
Related papers
50 in total
  • [41] StreamAligner: a streaming based sequence aligner on Apache Spark
    Rathee S.
    Kashyap A.
    Journal of Big Data, 5 (1)
  • [42] Trending Pattern Analysis of Twitter Using Spark Streaming
    Garg, Prachi
    Johari, Rahul
    Kumar, Hemang
    Bhatia, Riya
    APPLICATIONS OF COMPUTING AND COMMUNICATION TECHNOLOGIES, ICACCT 2018, 2018, 899 : 3 - 13
  • [43] Data Driven Priority Scheduling on a Spark Streaming System
    Ajila, Tobi
    Majumdar, Shikharesh
    2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2019, : 561 - 568
  • [44] Performance-sensitive components exploration in Spark Streaming
    Hou, Ying
    Liang, Yi
    Su, Chao
    PROCEEDINGS OF THE ADVANCES IN MATERIALS, MACHINERY, ELECTRICAL ENGINEERING (AMMEE 2017), 2017, 114 : 390 - 394
  • [45] A configurable and executable model of Spark Streaming on Apache YARN
    Lin, Jia-Chun
    Lee, Ming-Chang
    Yu, Ingrid Chieh
    Johnsen, Einar Broch
    INTERNATIONAL JOURNAL OF GRID AND UTILITY COMPUTING, 2020, 11 (02) : 185 - 195
  • [46] Sensor data collection and analytics with ThingsBoard and Spark Streaming
    De Paolis, Lucio Tommaso
    De Luca, Valerio
    Paiano, Roberto
    2018 IEEE WORKSHOP ON ENVIRONMENTAL, ENERGY, AND STRUCTURAL MONITORING SYSTEMS (EESMS), 2018, : 59 - 64
  • [48] Dynamically Scaling Apache Storm for the Analysis of Streaming Data
    van der Veen, Jan Sipke
    van der Waaij, Bram
    Lazovik, Elena
    Wijbrandi, Wilco
    Meijer, Robert J.
    2015 IEEE FIRST INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2015), 2015, : 154 - 161
  • [49] Spark-Tuner: An Elastic Auto-Tuner for Apache Spark Streaming
    HoseinyFarahabady, M. Reza
    Taheri, Javid
    Zomaya, Albert Y.
    Tari, Zahir
    2020 IEEE 13TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD 2020), 2020, : 544 - 548
  • [50] Benchmarking Scalable Methods for Streaming Cross Document Entity Coreference
    Logan, Robert L.
    McCallum, Andrew
    Singh, Sameer
    Bikel, Daniel
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4717 - 4731