Data Parallelism for Distributed Streaming Applications

Cited by: 0
Authors
Shinde, Bhagyashali [1]
Singh, S. T. [1]
Affiliation
[1] Savitribai Phule Pune Univ, Dept Comp Engn, PK Tech Campus, Pune, Maharashtra, India
Keywords
Data Processing; Distributed Computing; Parallel Programming;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Streaming applications analyze vast data streams and require both high throughput and low latency. They consist of operator graphs that produce and consume data tuples, where operators may be stateful, selective, and user-defined. The streaming programming model logically exposes task and pipeline parallelism, which makes it easier to develop parallel systems. It does not, however, naturally expose data parallelism, which must instead be extracted from streaming applications. This paper presents a compiler and runtime system that automatically extract data parallelism for distributed stream processing. Our approach guarantees safety even in the presence of stateful, selective, and user-defined operators. Data parallelization is safe if it preserves the sequential semantics of the application; the compiler ensures safety by considering each operator's selectivity, state, and partitioning, as well as its dependencies on other operators in the graph. The distributed runtime system ensures that tuples always exit parallel regions in the same order they would without data parallelism, using the most efficient strategy identified by the compiler.
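A minimal sketch of the ordering guarantee described in the abstract, assuming a toy model that is not taken from the paper: payloads are `(key, value)` pairs, the operator keeps state partitioned by key, and the parallel region is simulated in a single process. It shows one common strategy (sequence numbers plus a reorder buffer), not necessarily the one the authors' compiler would select. The splitter numbers each tuple and hash-partitions on the key so each key's state lives on exactly one replica; a selective replica forwards a `(seq, None)` marker for a dropped tuple so the merger can advance; the merger releases results strictly in sequence order. All names (`split_by_key`, `run_channel`, `ordered_merge`, `running_sum_positive`, `parallel`, `sequential`) are hypothetical illustrations, not the paper's compiler or runtime API.

```python
import random
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Illustrative assumptions: a payload is (key, value); a sequenced tuple is (seq, payload).
Payload = Tuple[int, int]
Result = Optional[Tuple[int, int]]
Operator = Callable[[Dict[int, int], Payload], Result]


def running_sum_positive(state: Dict[int, int], payload: Payload) -> Result:
    """Hypothetical stateful, selective operator: per-key running sum, drops negatives."""
    key, value = payload
    if value < 0:
        return None  # selective: this input produces no output tuple
    state[key] = state.get(key, 0) + value
    return (key, state[key])


def sequential(stream: List[Payload], op: Operator) -> List[Tuple[int, int]]:
    """Reference semantics: a single operator instance processes the whole stream."""
    state: Dict[int, int] = {}
    return [r for r in (op(state, p) for p in stream) if r is not None]


def split_by_key(stream: List[Payload], n: int) -> List[List[Tuple[int, Payload]]]:
    """Splitter: assign sequence numbers and hash-partition on the key, so all
    tuples with one key (and therefore that key's state) reach the same replica."""
    channels: List[List[Tuple[int, Payload]]] = [[] for _ in range(n)]
    for seq, payload in enumerate(stream):
        channels[hash(payload[0]) % n].append((seq, payload))
    return channels


def run_channel(tuples: List[Tuple[int, Payload]], op: Operator) -> List[Tuple[int, Result]]:
    """One parallel replica with private state; dropped tuples become (seq, None)."""
    state: Dict[int, int] = {}
    return [(seq, op(state, payload)) for seq, payload in tuples]


def ordered_merge(outputs: List[List[Tuple[int, Result]]],
                  rng: random.Random) -> Iterator[Tuple[int, int]]:
    """Merger: accept results in arbitrary arrival order, release them by sequence number."""
    queues = [list(reversed(out)) for out in outputs]  # pop() yields each channel in FIFO order
    pending: Dict[int, Result] = {}  # reorder buffer for results that arrived early
    next_seq = 0
    while any(queues):
        queue = rng.choice([q for q in queues if q])  # simulate nondeterministic arrival
        seq, result = queue.pop()
        pending[seq] = result
        while next_seq in pending:  # release everything whose predecessors have arrived
            released = pending.pop(next_seq)
            next_seq += 1
            if released is not None:
                yield released


def parallel(stream: List[Payload], op: Operator, n: int, seed: int = 0) -> List[Tuple[int, int]]:
    outputs = [run_channel(ch, op) for ch in split_by_key(stream, n)]
    return list(ordered_merge(outputs, random.Random(seed)))


if __name__ == "__main__":
    stream = [(i % 3, (i * 7) % 11 - 3) for i in range(30)]
    # prints True: the parallel region reproduces the sequential output order
    print(parallel(stream, running_sum_positive, n=4) == sequential(stream, running_sum_positive))
```

Changing `seed` only changes the simulated arrival order at the merger; the released list should stay identical to the sequential one. The key-based split is what makes replicating a stateful operator safe here; an operator whose state is not partitioned by key could not be replicated this way. The sketch also runs replicas one after another in memory, whereas a distributed runtime would run them concurrently on different hosts.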
Pages: 4