A two-level formal model for Big Data processing programs

Times cited: 2

Authors:

de Souza Neto, Joao Batista [1,2]

Moreira, Anamaria Martins [3]

Vargas-Solar, Genoveva [4]

Musicante, Martin A. [1]

Affiliations:

[1] Univ Fed Rio Grande do Norte, Dept Informat & Appl Math DIMAp, Natal, RN, Brazil

[2] Fed Ctr Technol Educ Minas Gerais, Dept Informat Management & Design DIGD DV, Divinopolis, Brazil

[3] Univ Fed Rio de Janeiro, Inst Comp IC, Rio de Janeiro, Brazil

[4] LIRIS, French Council Sci Res CNRS, Lyon, France

Source:

SCIENCE OF COMPUTER PROGRAMMING | 2022 / Vol. 215

Keywords:

Big Data processing; Data flow programming models; Petri nets; Monoid algebra

DOI:

10.1016/j.scico.2021.102764

CLC number:

TP31 [Computer software]

Discipline codes:

081202; 0835

Abstract:

This paper proposes a model for specifying data flow-based parallel data processing programs independently of the target Big Data processing framework. The paper focuses on the formal abstract specification of non-iterative and iterative programs, generalizing the strategies adopted by data flow Big Data processing frameworks. The proposed model relies on Monoid Algebra and Petri Nets to abstract Big Data processing programs at two levels: a higher level representing the program's data flow and a lower level representing data transformation operations (e.g., filtering, aggregation, join). We extend the model for data processing programs proposed in [1] to model iterative data processing programs. A general specification of such programs, as implemented by data flow-based parallel programming models, is essential given the democratization of iterative and greedy Big Data analytics algorithms. Indeed, these algorithms call for revisiting parallel programming models to express iterations. The paper gives a comparative analysis of the iteration strategies proposed by Apache Spark, DryadLINQ, Apache Beam, and Apache Flink, and discusses how the model generalizes these strategies. (c) 2021 Elsevier B.V. All rights reserved.
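The abstract describes the two-level structure only in prose. As a rough illustration, the Python sketch below pairs a higher level, a Petri-net-like data flow in which places hold datasets and transitions fire when their input places are marked, with a lower level of transformations built from a monoid (an identity element plus an associative combine). This is a toy under stated assumptions, not the authors' formalization: datasets are plain Python lists standing in for bags, each place holds at most one dataset token, and every name (`Monoid`, `DataflowNet`, `Transition`, `iterate`, and so on) is hypothetical. In particular, `iterate` collapses a loop into a single transition, a simplification that is not necessarily how the paper represents iteration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# ---- Lower level: transformations built from a monoid (hypothetical names) ----

@dataclass(frozen=True)
class Monoid:
    identity: Any                       # neutral element
    combine: Callable[[Any, Any], Any]  # associative binary operation

SUM = Monoid(0, lambda a, b: a + b)

def fold(m: Monoid, xs: List[Any]) -> Any:
    """Reduce a collection with a monoid. Associativity is what would let a
    real engine split this fold across partitions; here it runs sequentially."""
    acc = m.identity
    for x in xs:
        acc = m.combine(acc, x)
    return acc

def map_op(f: Callable[[Any], Any]) -> Callable[[list], list]:
    return lambda bag: [f(x) for x in bag]

def flat_map_op(f: Callable[[Any], list]) -> Callable[[list], list]:
    return lambda bag: [y for x in bag for y in f(x)]

def filter_op(pred: Callable[[Any], bool]) -> Callable[[list], list]:
    return lambda bag: [x for x in bag if pred(x)]

def reduce_by_key(m: Monoid) -> Callable[[list], list]:
    """Group (key, value) pairs by key and fold each group with the monoid."""
    def op(bag: list) -> list:
        groups: Dict[Any, List[Any]] = {}
        for k, v in bag:
            groups.setdefault(k, []).append(v)
        return [(k, fold(m, vs)) for k, vs in groups.items()]
    return op

def iterate(step: Callable[[list], list], max_iters: int) -> Callable[[list], list]:
    """Hypothetical loop operator: reapply `step` until a fixpoint or the bound."""
    def op(bag: list) -> list:
        for _ in range(max_iters):
            nxt = step(bag)
            if nxt == bag:
                break
            bag = nxt
        return bag
    return op

# ---- Higher level: a Petri-net-like data flow ----

@dataclass
class Transition:
    name: str
    inputs: List[str]        # places consumed when the transition fires (at least one)
    output: str              # place that receives the result token
    op: Callable[..., list]  # lower-level transformation applied to the inputs

@dataclass
class DataflowNet:
    transitions: List[Transition]
    marking: Dict[str, list] = field(default_factory=dict)  # place -> dataset token

    def run(self) -> Dict[str, list]:
        """Fire enabled transitions until none is enabled."""
        fired = True
        while fired:
            fired = False
            for t in self.transitions:
                if all(p in self.marking for p in t.inputs):  # enabled?
                    args = [self.marking.pop(p) for p in t.inputs]
                    self.marking[t.output] = t.op(*args)
                    fired = True
        return self.marking

# Non-iterative program: a word count with a filtering step.
wordcount = DataflowNet(
    transitions=[
        Transition("tokenize", ["lines"], "words", flat_map_op(str.split)),
        Transition("drop_the", ["words"], "kept", filter_op(lambda w: w != "the")),
        Transition("pair", ["kept"], "pairs", map_op(lambda w: (w, 1))),
        Transition("count", ["pairs"], "counts", reduce_by_key(SUM)),
    ],
    marking={"lines": ["the cat", "the dog the cat"]},
)
counts = sorted(wordcount.run()["counts"])
print(counts)  # [('cat', 2), ('dog', 1)]

# Iterative program: decrement every element until nothing changes.
countdown = DataflowNet(
    transitions=[Transition("loop", ["xs"], "done",
                            iterate(map_op(lambda x: max(x - 1, 0)), max_iters=10))],
    marking={"xs": [3, 1]},
)
fixpoint = countdown.run()["done"]
print(fixpoint)  # [0, 0]
```

Keeping the transformations as plain functions over bags means the data flow level never inspects what an operation does, only which places it reads and writes; this mirrors, in a very loose way, the separation between the two levels described above.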

Citations

Pages: 20

50 entries in total

[1] A Two-Level Architecture for Data Warehousing and OLAP Over Big Data
Dhaouadi, Asma
Gammoudi, Mohamed Mohsen
Hammoudi, Slimane
[J]. VISION 2025: EDUCATION EXCELLENCE AND MANAGEMENT OF INNOVATIONS THROUGH SUSTAINABLE ECONOMIC COMPETITIVE ADVANTAGE, 2019: 7182-7194
[2] A Two-Level Statistical Model for Big Mart Sales Prediction
Punam, Kumari
Pamula, Rajendra
Jain, Praphula Kumar
[J]. 2018 INTERNATIONAL CONFERENCE ON COMPUTING, POWER AND COMMUNICATION TECHNOLOGIES (GUCON), 2018: 617-620
[3] Two-Level Supply Chain Network Construction in Big Data Environment
Lei, Yu
Ma, Ruiyuan
Ye, Hongshu
Chen, Deng
[J]. MOBILE INFORMATION SYSTEMS, 2022, 2022
[4] STLIS: A Scalable Two-Level Index Scheme for Big Data in IoT
Leng, Yonglin
Chen, Zhikui
Hu, Yueming
[J]. MOBILE INFORMATION SYSTEMS, 2016, 2016
[5] A two-level directional model for dependence in circular data
Holmquist, Bjorn
Gustafsson, Peter
[J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2017, 45(04): 461-478
[6] Two-level model based design of solids processing systems
Lakatos, BG
Auer, R
[J]. COMPUTERS & CHEMICAL ENGINEERING, 1998, 22: S785-S788
[7] Providing and evaluating a comprehensive model for detecting fraudulent electronic payment card transactions with a two-level filter based on flow processing in big data
Banirostam H.
Banirostam T.
Pedram M.M.
Rahmani A.M.
[J]. International Journal of Information Technology, 2023, 15(8): 4161-4166
[8] A Two-level Moderated Latent Variable Model with Single Level Data
Liu, Hongyun
Yuan, Ke-Hai
Liu, Fang
[J]. MULTIVARIATE BEHAVIORAL RESEARCH, 2020, 55(06): 873-893
[9] Two-level Data Prefetching
Gao, Fei
Cui, Hanyu
Sair, Suleyman
[J]. 2007 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, VOLS 1 AND 2, 2007: 238-244
[10] Accelerating big data analytics on HPC clusters using two-level storage
Xuan, Pengfei
Ligon, Walter B.
Srimani, Pradip K.
Ge, Rong
Luo, Feng
[J]. PARALLEL COMPUTING, 2017, 61: 18-34
