Composable and Efficient Functional Big Data Processing Framework

Times cited: 0
Authors
Wu, Dongyao [1, 2]
Sakr, Sherif [1, 3]
Zhu, Liming [1, 2]
Lu, Qinghua [1, 4]
Affiliations
[1] NICTA, Software Syst Res Grp, Sydney, NSW, Australia
[2] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
[3] King Saud bin Abdulaziz Univ Hlth Sci, Riyadh, Saudi Arabia
[4] China Univ Petr, Coll Comp & Commun Engn, Qingdao, Peoples R China
Keywords
big data processing; parallel programming; functional programming; distributed systems; system architecture
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Over the past few years, frameworks such as MapReduce and Spark have been introduced to ease the development of big data programs and applications. However, jobs in these frameworks are only coarsely defined and are packaged as executable JARs that neither expose nor describe their functionality. As a result, deployed jobs are not natively composable or reusable in subsequent development. This packaging also hampers the ability to apply optimizations to the data flow of job sequences and pipelines. In this paper, we present the Hierarchically Distributed Data Matrix (HDM), a functional, strongly typed data representation for writing composable big data applications. Along with HDM, a runtime framework is provided to support the execution of HDM applications on distributed infrastructures. Based on the functional data dependency graph of HDM, multiple optimizations are applied to improve the execution performance of HDM jobs. The experimental results show that, compared with the current state of the art, Apache Spark, our optimizations improve the job completion time by between 10% and 60% for different types of operation sequences.
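The abstract's central idea is that a functional description of a job, unlike an opaque executable JAR, exposes its data flow so that jobs can be composed and the runtime can optimize them. The sketch below illustrates that idea under stated assumptions: it is a hypothetical Python model, not HDM's actual API, and the names `Pipeline`, `map`, `filter`, `run_naive`, `run_fused`, and `fuse` are invented for illustration. Its loose type hints do not model the strong typing the abstract emphasizes. It shows two properties: pipelines are immutable values that can be extended into new pipelines, and because the steps are recorded as data, consecutive element-wise steps can be fused into a single pass. Function fusion is used here only as one representative data-flow optimization; the abstract does not list which optimizations HDM applies.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Sentinel marking an element that a filter step has dropped.
_DROPPED = object()


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical composable pipeline (illustrative only, not HDM's API).

    A pipeline is an immutable description: a source plus an ordered tuple of
    ("map" | "filter", function) steps. Composing returns a new pipeline, so an
    existing pipeline can be reused as the starting point of another one.
    """
    source: Tuple[Any, ...]
    steps: Tuple[Tuple[str, Callable[[Any], Any]], ...] = ()

    def map(self, f: Callable[[Any], Any]) -> "Pipeline":
        return Pipeline(self.source, self.steps + (("map", f),))

    def filter(self, p: Callable[[Any], bool]) -> "Pipeline":
        return Pipeline(self.source, self.steps + (("filter", p),))

    def run_naive(self) -> Tuple[List[Any], int]:
        """Execute one step at a time, materializing every intermediate list."""
        data, passes = list(self.source), 0
        for kind, fn in self.steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
            passes += 1
        return data, passes

    def fuse(self) -> Callable[[Any], Any]:
        """Compose all element-wise steps into a single per-element function."""
        steps = self.steps

        def fused(x: Any) -> Any:
            for kind, fn in steps:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):  # a filter step rejected this element
                    return _DROPPED
            return x

        return fused

    def run_fused(self) -> Tuple[List[Any], int]:
        """Execute the fused function in a single pass over the source."""
        f = self.fuse()
        out = [y for y in (f(x) for x in self.source) if y is not _DROPPED]
        return out, 1


if __name__ == "__main__":
    base = Pipeline(tuple(range(10)))
    evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
    # Composition: extend an existing pipeline instead of rewriting it.
    shifted = evens_squared.map(lambda x: x + 1)
    print(shifted.run_naive())  # ([1, 5, 17, 37, 65], 3)
    print(shifted.run_fused())  # ([1, 5, 17, 37, 65], 1)
```

Both strategies return the same result; the fused version avoids materializing intermediate lists. A runtime can make this kind of rewrite only when the job's logic is visible as a data-flow graph rather than hidden inside a packaged executable.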
Pages: 279-286
Page count: 8
Related papers (50 in total)
  • [31] Efficient query processing platform for uncertain big data
    Huang, Zhenhua
    Zhang, Jiawen
    Fang, Qiang
    International Journal of Database Theory and Application, 2015, 8 (05): 149 - 160
  • [32] Fast and Efficient In-Memory Big Data Processing
    Malik, Babur Hayat
    Maryam, Maliha
    Khalid, Myda
    Khlaid, Javaria
    Rehman, Naj Am Ur
    Sajjad, Syeda Iqra
    Islam, Tanveer
    Butt, Umair Ahmed
    Raza, Ali
    Nasr, M. Saad
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (05): 517 - 524
  • [33] Reducing IoT Big Data for Efficient Storage and Processing
    Katsarou, Eleftheria
    Hadjiefthymiades, Stathes
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON INTERNET OF THINGS, BIG DATA AND SECURITY, IOTBDS 2023, 2023: 226 - 230
  • [34] Biscuit: A Framework for Near-Data Processing of Big Data Workloads
    Gu, Boncheol
    Yoon, Andre S.
    Bae, Duck-Ho
    Jo, Insoon
    Lee, Jinyoung
    Yoon, Jonghyun
    Kang, Jeong-Uk
    Kwon, Moonsang
    Yoon, Chanho
    Cho, Sangyeun
    Jeong, Jaeheon
    Chang, Duckhyun
    2016 ACM/IEEE 43RD ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2016: 153 - 165
  • [35] Scalable processing and autocovariance computation of big functional data
    Brisaboa, Nieves R.
    Cao, Ricardo
    Parama, Jose R.
    Silva-Coira, Fernando
    SOFTWARE-PRACTICE & EXPERIENCE, 2018, 48 (01): 123 - 140
  • [36] Spatial-Crowd: A Big Data Framework for Efficient Data Visualization
    Atta, Shahbaz
    Sadiq, Bilal
    Ahmad, Akhlaq
    Saeed, Sheikh Nasir
    Felemban, Emad
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 2130 - 2138
  • [37] An Enhanced Pre-Processing Model for Big Data Processing: A Quality Framework
    Lincy, Blessy Trencia S. S.
    Kumar, N. Suresh
    2017 IEEE INTERNATIONAL CONFERENCE ON INNOVATIONS IN GREEN ENERGY AND HEALTHCARE TECHNOLOGIES (IGEHT), 2017
  • [38] AROM: Processing Big Data With Data Flow Graphs and Functional Programming
    Tran, Nam-Luc
    Skhiri, Sabri
    Lesuisse, Arthur
    Zimanyi, Esteban
    2012 IEEE 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), 2012
  • [39] Secure big data collection and processing: Framework, means and opportunities
    Zhang, Li-Chun
    Haraldsen, Gustav
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2022, 185 (04): 1541 - 1559
  • [40] Framework for Modeling Security Policies of Big Data Processing Systems
    Poltavtseva, M. A.
    Ivanov, D. V.
    Zavadskii, E. V.
    Automatic Control and Computer Sciences, 2023, 57: 1063 - 1070