Real-time stream processing for Big Data

Cited by: 32
Authors
Wingerath, Wolfram [1]
Gessert, Felix [1]
Friedrich, Steffen [1]
Ritter, Norbert [1]
Affiliations
[1] Univ Hamburg, CS Dept, D-22527 Hamburg, Germany
Source
IT-INFORMATION TECHNOLOGY | 2016 / Vol. 58 / Issue 04
Keywords
Distributed real-time stream processing; Big Data analytics;
DOI
10.1515/itit-2016-0002
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the rise of Web 2.0 and the Internet of Things, it has become feasible to track all kinds of information over time, in particular fine-grained user activities and sensor data on their environment and even their biometrics. However, while efficiency remains mandatory for any application trying to cope with huge amounts of data, only part of the potential of today's Big Data repositories can be exploited using traditional batch-oriented approaches, as the value of data often decays quickly and high latency becomes unacceptable in some applications. In the last couple of years, several distributed data processing systems have emerged that deviate from the batch-oriented approach and tackle data items as they arrive, thus acknowledging the growing importance of timeliness and velocity in Big Data analytics. In this article, we give an overview of the state of the art of stream processors for low-latency Big Data analytics and conduct a qualitative comparison of the most popular contenders, namely Storm and its abstraction layer Trident, Samza and Spark Streaming. We describe their respective underlying rationales and the guarantees they provide, and discuss the trade-offs that come with selecting one of them for a particular task.
Pages: 186 - 194
Page count: 9
Related papers
50 in total
  • [1] A survey on data stream, big data and real-time
    Gomes E.H.A.
    Plentz P.D.M.
    De Rolt C.R.
    Dantas M.A.R.
    [J]. International Journal of Networking and Virtual Organisations, 2019, 20 (02) : 143 - 167
  • [2] A review on big data real-time stream processing and its scheduling techniques
    Tantalaki, Nicoleta
    Souravlas, Stavros
    Roumeliotis, Manos
    [J]. INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2020, 35 (05) : 571 - 601
  • [3] Near Real-Time Big Data Stream Processing Platform Using Cassandra
    Pal, Gautam
    Li, Gangmin
    Atkinson, Katie
    [J]. 2018 4TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2018,
  • [4] Research on Real-time Processing and Stream Analysis of Unstructured Data Based on Big Data Platforms
    Liang, Huichao
    Wang, Di
    Liu, Yuan
    Mei, Lin
    Zhou, Mengxue
    Zhao, Haibin
    [J]. PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND DIGITAL APPLICATIONS, MIDA2024, 2024, : 96 - 101
  • [5] Real-time processing of streaming big data
    Safaei, Ali A.
    [J]. REAL-TIME SYSTEMS, 2017, 53 (01) : 1 - 44
  • [6] Real-time processing of streaming big data
    Ali A. Safaei
    [J]. Real-Time Systems, 2017, 53 : 1 - 44
  • [7] Challenges and Solutions for Processing Real-Time Big Data Stream: A Systematic Literature Review
    Mehmood, Erum
    Anees, Tayyaba
    [J]. IEEE ACCESS, 2020, 8 (08) : 119123 - 119143
  • [8] Real-Time Data Stream Partitioning over a Sliding Window in Real-Time Spatial Big Data
    Hamdi, Sana
    Bouazizi, Emna
    Faiz, Sami
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2018, PT I, 2018, 11334 : 75 - 88
  • [9] Big Data Stream Computing in Healthcare Real-Time Analytics
    Ta, Van-Dai
    Liu, Chuan-Ming
    Nkabinde, Goodwill Wandile
    [J]. PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2016), 2016, : 37 - 42
  • [10] Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem
    M. Mazhar Rathore
    Hojae Son
    Awais Ahmad
    Anand Paul
    Gwanggil Jeon
    [J]. International Journal of Parallel Programming, 2018, 46 : 630 - 646