A two-level formal model for Big Data processing programs

Cited by: 2
Authors
de Souza Neto, Joao Batista [1, 2]
Moreira, Anamaria Martins [3]
Vargas-Solar, Genoveva [4 ]
Musicante, Martin A. [1 ]
Affiliations
[1] Univ Fed Rio Grande do Norte, Dept Informat & Appl Math DIMAp, Natal, RN, Brazil
[2] Fed Ctr Technol Educ Minas Gerais, Dept Informat Management & Design DIGD DV, Divinopolis, Brazil
[3] Univ Fed Rio de Janeiro, Inst Comp IC, Rio De Janeiro, Brazil
[4] LIRIS, French Council Sci Res CNRS, Lyon, France
Keywords
Big Data processing; Data flow programming models; Petri nets; Monoid algebra
DOI
10.1016/j.scico.2021.102764
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
This paper proposes a model for specifying data flow-based parallel data processing programs that is agnostic of the target Big Data processing framework. The paper focuses on the formal abstract specification of non-iterative and iterative programs, generalizing the strategies adopted by data flow Big Data processing frameworks. The proposed model relies on Monoid Algebra and Petri Nets to abstract Big Data processing programs in two levels: a higher level representing the program data flow and a lower level representing data transformation operations (e.g., filtering, aggregation, join). We extend the model for data processing programs proposed in [1] to cover iterative data processing programs. A general specification of these programs, as implemented by data flow-based parallel programming models, is essential given the democratization of iterative and greedy Big Data analytics algorithms. Indeed, these algorithms call for revisiting parallel programming models to express iterations. The paper gives a comparative analysis of the iteration strategies proposed by Apache Spark, DryadLINQ, Apache Beam, and Apache Flink, and discusses how the model generalizes these strategies. (c) 2021 Elsevier B.V. All rights reserved.
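The two-level idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's formalization: the class `DataFlow` and the helpers `filter_op` and `reduce_by_key` are hypothetical names chosen for illustration. The upper level is a simplified Petri-net-like graph whose places hold datasets, and the lower level is a pair of bag transformations in the spirit of monoid-based operations. Only the non-iterative case is sketched.

```python
# Lower level: data transformation operations over bags (plain lists here).
# Illustrative stand-ins for monoid-style operations, not the paper's definitions.
def filter_op(pred):
    return lambda bag: [x for x in bag if pred(x)]

def reduce_by_key(combine):
    # `combine` is assumed to be associative, like a monoid operation.
    def op(bag):
        acc = {}
        for key, value in bag:
            acc[key] = combine(acc[key], value) if key in acc else value
        return sorted(acc.items())
    return op

# Upper level: a simplified Petri-net-like data flow. Places hold datasets;
# a transition fires once all of its input places are marked.
# Simplification: input datasets are not consumed when a transition fires.
class DataFlow:
    def __init__(self):
        self.transitions = []  # (input places, output place, operation)

    def add(self, inputs, output, op):
        self.transitions.append((tuple(inputs), output, op))

    def run(self, marking):
        marking = dict(marking)
        pending = list(self.transitions)
        progress = True
        while pending and progress:
            progress = False
            for transition in list(pending):
                inputs, output, op = transition
                if all(place in marking for place in inputs):
                    marking[output] = op(*(marking[p] for p in inputs))
                    pending.remove(transition)
                    progress = True
        return marking

flow = DataFlow()
flow.add(["words"], "long_words", filter_op(lambda kv: len(kv[0]) > 3))
flow.add(["long_words"], "counts", reduce_by_key(lambda a, b: a + b))
result = flow.run({"words": [("data", 1), ("big", 1), ("data", 1), ("flow", 1)]})
print(result["counts"])  # [('data', 2), ('flow', 1)]
```

Keeping the graph (which dataset feeds which operation) separate from the operations themselves mirrors the separation the abstract describes, and it is what lets the same program description stay independent of any particular framework.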
Pages: 20
Related papers
50 in total
  • [1] A Two-Level Architecture for Data Warehousing and OLAP Over Big Data
    Dhaouadi, Asma
    Gammoudi, Mohamed Mohsen
    Hammoudi, Slimane
    [J]. VISION 2025: EDUCATION EXCELLENCE AND MANAGEMENT OF INNOVATIONS THROUGH SUSTAINABLE ECONOMIC COMPETITIVE ADVANTAGE, 2019, : 7182 - 7194
  • [2] A Two-Level Statistical Model for Big Mart Sales Prediction
    Punam, Kumari
    Pamula, Rajendra
    Jain, Praphula Kumar
    [J]. 2018 INTERNATIONAL CONFERENCE ON COMPUTING, POWER AND COMMUNICATION TECHNOLOGIES (GUCON), 2018, : 617 - 620
  • [3] Two-Level Supply Chain Network Construction in Big Data Environment
    Lei, Yu
    Ma, Ruiyuan
    Ye, Hongshu
    Chen, Deng
    [J]. MOBILE INFORMATION SYSTEMS, 2022, 2022
  • [4] STLIS: A Scalable Two-Level Index Scheme for Big Data in IoT
    Leng, Yonglin
    Chen, Zhikui
    Hu, Yueming
    [J]. MOBILE INFORMATION SYSTEMS, 2016, 2016
  • [5] A two-level directional model for dependence in circular data
    Holmquist, Bjorn
    Gustafsson, Peter
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2017, 45 (04): : 461 - 478
  • [6] Two-level model based design of solids processing systems
    Lakatos, BG
    Auer, R
    [J]. COMPUTERS & CHEMICAL ENGINEERING, 1998, 22 : S785 - S788
  • [7] Providing and evaluating a comprehensive model for detecting fraudulent electronic payment card transactions with a two-level filter based on flow processing in big data
    Banirostam H.
    Banirostam T.
    Pedram M.M.
    Rahmani A.M.
    [J]. International Journal of Information Technology, 2023, 15 (8) : 4161 - 4166
  • [8] A Two-level Moderated Latent Variable Model with Single Level Data
    Liu, Hongyun
    Yuan, Ke-Hai
    Liu, Fang
    [J]. MULTIVARIATE BEHAVIORAL RESEARCH, 2020, 55 (06) : 873 - 893
  • [9] Two-level Data Prefetching
    Gao, Fei
    Cui, Hanyu
    Sair, Suleyman
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, VOLS, 1 AND 2, 2007, : 238 - 244
  • [10] Accelerating big data analytics on HPC clusters using two-level storage
    Xuan, Pengfei
    Ligon, Walter B.
    Srimani, Pradip K.
    Ge, Rong
    Luo, Feng
    [J]. PARALLEL COMPUTING, 2017, 61 : 18 - 34