epiC: an extensible and scalable system for processing Big Data

Authors
Dawei Jiang
Sai Wu
Gang Chen
Beng Chin Ooi
Kian-Lee Tan
Jun Xu
Affiliations
[1] National University of Singapore, School of Computing
[2] Zhejiang University, College of Computer Science and Technology
[3] Harbin Institute of Technology, School of Computer Science and Technology
Source
The VLDB Journal | 2016 / Vol. 25
Keywords
Parallel processing; MapReduce; Pregel; Hadoop
DOI
Not available
Abstract
The Big Data problem is characterized by the so-called 3V features: volume (a huge amount of data), velocity (a high data ingestion rate), and variety (a mix of structured, semi-structured, and unstructured data). State-of-the-art solutions to the Big Data problem are largely based on the MapReduce framework and its open-source implementation, Hadoop. Although Hadoop handles the data volume challenge successfully, it does not deal well with data variety, because its programming interfaces and associated data processing model are inconvenient and inefficient for handling structured data and graph data. This paper presents epiC, an extensible system that tackles the data variety challenge of Big Data. epiC introduces a general Actor-like concurrent programming model, independent of the data processing models, for specifying parallel computations. Users process multi-structured datasets with appropriate epiC extensions: each extension consists of an implementation of the data processing model best suited to the data type, plus auxiliary code for mapping that data processing model into epiC's concurrent programming model. As in Hadoop, programs written in this way can be automatically parallelized, and the runtime system takes care of fault tolerance and inter-machine communication. We present the design and implementation of epiC's concurrent programming model. We also present two customized data processing models built on top of epiC: an optimized MapReduce extension and a relational model. We show how users can leverage epiC to process heterogeneous data by linking different types of operators together. To improve the performance of complex analytic jobs, epiC supports a partition-based optimization technique in which data are streamed between operators to avoid high I/O overheads. Experiments demonstrate the effectiveness and efficiency of epiC.
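The idea of mapping a data processing model onto an Actor-like message-passing layer can be illustrated with a minimal sketch. This is not epiC's API: the names `Unit`, `MapUnit`, `ReduceUnit`, `run`, and `word_count` are hypothetical, and the sketch assumes a single process that delivers messages one at a time, with no parallelism, fault tolerance, or inter-machine communication. It only shows how a MapReduce-style word count might be expressed as independent units that react to messages.

```python
from collections import Counter, deque
import zlib


class Unit:
    """Hypothetical actor-like unit: it only reacts to messages it receives."""

    def __init__(self, name):
        self.name = name

    def receive(self, message, send):
        raise NotImplementedError


class MapUnit(Unit):
    """Splits a line into words and routes each word to a reduce unit."""

    def __init__(self, name, reducer_names):
        super().__init__(name)
        self.reducer_names = reducer_names

    def receive(self, message, send):
        for word in message.split():
            # Stable hash partitioning: the same word always reaches the same reducer.
            index = zlib.crc32(word.encode("utf-8")) % len(self.reducer_names)
            send(self.reducer_names[index], word)


class ReduceUnit(Unit):
    """Counts the words it has been sent."""

    def __init__(self, name):
        super().__init__(name)
        self.counts = Counter()

    def receive(self, message, send):
        self.counts[message] += 1


def run(units, initial_messages):
    """Deliver queued (target, message) pairs in order until none remain."""
    registry = {unit.name: unit for unit in units}
    queue = deque(initial_messages)

    def send(target, message):
        queue.append((target, message))

    while queue:
        target, message = queue.popleft()
        registry[target].receive(message, send)


def word_count(lines, n_mappers=2, n_reducers=2):
    """Auxiliary code that maps a MapReduce-style job onto the units above."""
    reducer_names = [f"reduce-{i}" for i in range(n_reducers)]
    mappers = [MapUnit(f"map-{i}", reducer_names) for i in range(n_mappers)]
    reducers = [ReduceUnit(name) for name in reducer_names]
    # Assign input lines to map units round-robin.
    initial = [(mappers[i % n_mappers].name, line) for i, line in enumerate(lines)]
    run(mappers + reducers, initial)
    total = Counter()
    for reducer in reducers:
        total.update(reducer.counts)
    return dict(total)
```

Calling `word_count(["a b a", "b c"])` returns the per-word counts merged across the reduce units. In this sketch the units share nothing except messages, which is the property that would let a real runtime place them on different machines.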
Pages: 3-26
Page count: 23