epiC: an extensible and scalable system for processing Big Data

Cited by: 0
Authors
Dawei Jiang
Sai Wu
Gang Chen
Beng Chin Ooi
Kian-Lee Tan
Jun Xu
Affiliations
[1] National University of Singapore, School of Computing
[2] Zhejiang University, College of Computer Science and Technology
[3] Harbin Institute of Technology, School of Computer Science and Technology
Source
The VLDB Journal | 2016 / Vol. 25
Keywords
Parallel processing; MapReduce; Pregel; Hadoop
DOI
Not available
Abstract
The Big Data problem is characterized by the so-called 3V features: volume (a huge amount of data), velocity (a high data ingestion rate), and variety (a mix of structured, semi-structured, and unstructured data). State-of-the-art solutions to the Big Data problem are largely based on the MapReduce framework and its open-source implementation, Hadoop. Although Hadoop handles the data volume challenge successfully, it does not deal well with data variety, since its programming interfaces and associated data processing model are inconvenient and inefficient for handling structured data and graph data. This paper presents epiC, an extensible system that tackles the data variety challenge of Big Data. epiC introduces a general Actor-like concurrent programming model, independent of any particular data processing model, for specifying parallel computations. Users process multi-structured datasets with appropriate epiC extensions, each consisting of an implementation of the data processing model best suited to the data type and auxiliary code that maps that data processing model onto epiC's concurrent programming model. As in Hadoop, programs written this way are automatically parallelized, and the runtime system takes care of fault tolerance and inter-machine communication. We present the design and implementation of epiC's concurrent programming model. We also present two customized data processing models built on top of epiC: an optimized MapReduce extension and a relational model. We show how users can leverage epiC to process heterogeneous data by linking different types of operators together. To improve the performance of complex analytic jobs, epiC supports a partition-based optimization technique in which data are streamed between operators to avoid high I/O overhead. Experiments demonstrate the effectiveness and efficiency of epiC.
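The abstract describes epiC's central idea only at a high level: a general Actor-like concurrent programming model that knows nothing about any particular data processing model, plus extensions that map a specific model onto it. The Python sketch below illustrates that separation under stated assumptions. The names Unit, Runtime, MapUnit, ReduceUnit, SinkUnit, and word_count are hypothetical and are not epiC's actual API. The single-process message loop stands in for a distributed runtime and omits parallelism, fault tolerance, and networking. The MapReduce classes play the role of "auxiliary code" that expresses MapReduce as message-driven units.

```python
import zlib
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

@dataclass
class Message:
    target: str      # name of the receiving unit
    payload: object  # data delivered to that unit

class Unit:
    """Hypothetical actor-like unit: private state, reacts to incoming
    messages, and communicates only by returning new messages."""
    def __init__(self, name: str):
        self.name = name
    def on_message(self, payload) -> List[Message]:
        raise NotImplementedError

class Runtime:
    """Toy single-process message loop (assumption): a real runtime would
    place units on many machines and handle failures; this one does not."""
    def __init__(self, units: Iterable[Unit]):
        self.units: Dict[str, Unit] = {u.name: u for u in units}
    def run(self, initial: List[Message]) -> None:
        queue = deque(initial)
        while queue:
            msg = queue.popleft()
            queue.extend(self.units[msg.target].on_message(msg.payload))

# --- Hypothetical MapReduce "extension": MapReduce expressed as units ---
MapFn = Callable[[str], Iterable[Tuple[str, int]]]
ReduceFn = Callable[[str, List[int]], int]

class MapUnit(Unit):
    def __init__(self, name: str, map_fn: MapFn, reducers: List[str]):
        super().__init__(name)
        self.map_fn, self.reducers = map_fn, reducers
    def on_message(self, records):
        # Hash-partition map output by key; send one (possibly empty) batch to
        # every reducer so each reducer can tell when all mappers have reported.
        buckets: Dict[str, list] = {r: [] for r in self.reducers}
        for record in records:
            for key, value in self.map_fn(record):
                idx = zlib.crc32(key.encode()) % len(self.reducers)
                buckets[self.reducers[idx]].append((key, value))
        return [Message(r, pairs) for r, pairs in buckets.items()]

class ReduceUnit(Unit):
    def __init__(self, name: str, reduce_fn: ReduceFn, n_mappers: int, sink: str):
        super().__init__(name)
        self.reduce_fn, self.expected, self.sink = reduce_fn, n_mappers, sink
        self.groups: Dict[str, List[int]] = defaultdict(list)
        self.received = 0
    def on_message(self, pairs):
        for key, value in pairs:
            self.groups[key].append(value)
        self.received += 1
        if self.received < self.expected:
            return []  # keep waiting for the remaining mappers
        result = {k: self.reduce_fn(k, vs) for k, vs in self.groups.items()}
        return [Message(self.sink, result)]

class SinkUnit(Unit):
    def __init__(self, name: str):
        super().__init__(name)
        self.output: Dict[str, int] = {}
    def on_message(self, partial):
        self.output.update(partial)  # reducers own disjoint key sets
        return []

def word_count(splits: List[List[str]], n_reducers: int = 2) -> Dict[str, int]:
    """Wire one MapUnit per input split, n_reducers ReduceUnits, and a sink."""
    reducer_names = [f"reduce-{i}" for i in range(n_reducers)]
    sink = SinkUnit("sink")
    reducers = [ReduceUnit(r, lambda k, vs: sum(vs), len(splits), sink.name)
                for r in reducer_names]
    mappers = [MapUnit(f"map-{i}", lambda line: ((w, 1) for w in line.split()),
                       reducer_names) for i in range(len(splits))]
    Runtime([sink, *reducers, *mappers]).run(
        [Message(m.name, split) for m, split in zip(mappers, splits)])
    return sink.output

print(sorted(word_count([["a b a"], ["b c"]]).items()))
# -> [('a', 2), ('b', 2), ('c', 1)]
```

In this sketch all MapReduce-specific logic lives in the unit classes, so a different extension (for example, relational operators) could reuse Runtime unchanged. Mappers send empty batches deliberately: reducers then detect completion purely by counting messages, without any shared state.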
Pages: 3-26
Number of pages: 23
Related papers (50 in total)
  • [1] epiC: an Extensible and Scalable System for Processing Big Data
    Jiang, Dawei
    Chen, Gang
    Ooi, Beng Chin
    Tan, Kian-Lee
    Wu, Sai
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 7 (07): 541 - 552
  • [2] epiC: an extensible and scalable system for processing Big Data
    Jiang, Dawei
    Wu, Sai
    Chen, Gang
    Ooi, Beng Chin
    Tan, Kian-Lee
    Xu, Jun
    [J]. VLDB JOURNAL, 2016, 25 (01): 3 - 26
  • [3] Clouds for scalable Big Data processing
    Trunfio, Paolo
    Vlassov, Vladimir
    [J]. INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2019, 34 (06) : 629 - 631
  • [4] Runtime Composition for Extensible Big Data Processing Platforms
    Kimura, Kosaku
    Nomura, Yoshihide
    Tanaka, Yuka
    Kurihara, Hidetoshi
    Yamamoto, Rieko
    [J]. 2015 IEEE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, 2015, : 1053 - 1057
  • [5] Design of a Scalable Data Stream Channel for Big Data Processing
    Lee, Yong-Ju
    Lee, Myungcheol
    Lee, Mi-Young
    Hur, Sung Jin
    Min, Okgee
    [J]. 2015 17TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT), 2015, : 556 - 559
  • [6] FENCE: Fast, ExteNsible, and ConsolidatEd Framework for Intelligent Big Data Processing
    Ramneek
    Cha, Seung-Jun
    Pack, Sangheon
    Jeon, Seung Hyub
    Jeong, Yeon Jeong
    Kim, Jin Mee
    Jung, Sungin
    [J]. IEEE ACCESS, 2020, 8 : 125423 - 125437
  • [7] Scalable processing and autocovariance computation of big functional data
    Brisaboa, Nieves R.
    Cao, Ricardo
    Parama, Jose R.
    Silva-Coira, Fernando
    [J]. SOFTWARE-PRACTICE & EXPERIENCE, 2018, 48 (01): 123 - 140
  • [8] A scalable and real-time system for disease prediction using big data processing
    Ed-daoudy, Abderrahmane
    Maalmi, Khalil
    El Ouaazizi, Aziza
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 : 30405 - 30434
  • [9] A scalable and real-time system for disease prediction using big data processing
    Ed-daoudy, Abderrahmane
    Maalmi, Khalil
    El Ouaazizi, Aziza
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (20) : 30405 - 30434
  • [10] Scalable system scheduling for HPC and big data
    Reuther, Albert
    Byun, Chansup
    Arcand, William
    Bestor, David
    Bergeron, Bill
    Hubbell, Matthew
    Jones, Michael
    Michaleas, Peter
    Prout, Andrew
    Rosa, Antonio
    Kepner, Jeremy
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2018, 111 : 76 - 92