epiC: an extensible and scalable system for processing Big Data

Cited by: 0
Authors
Dawei Jiang
Sai Wu
Gang Chen
Beng Chin Ooi
Kian-Lee Tan
Jun Xu
Affiliations
[1] National University of Singapore, School of Computing
[2] Zhejiang University, College of Computer Science and Technology
[3] Harbin Institute of Technology, School of Computer Science and Technology
Source
The VLDB Journal | 2016 / Vol. 25
Keywords
Parallel processing; MapReduce; Pregel; Hadoop
DOI
Not available
Abstract
The Big Data problem is characterized by the so-called 3V features: volume (a huge amount of data), velocity (a high data ingestion rate), and variety (a mix of structured, semi-structured, and unstructured data). State-of-the-art solutions to the Big Data problem are largely based on the MapReduce framework and its open-source implementation, Hadoop. Although Hadoop handles the data volume challenge successfully, it does not deal well with data variety, because its programming interfaces and associated data processing model are inconvenient and inefficient for handling structured data and graph data. This paper presents epiC, an extensible system that tackles the data variety challenge of Big Data. epiC introduces a general Actor-like concurrent programming model, independent of any particular data processing model, for specifying parallel computations. Users process multi-structured datasets with appropriate epiC extensions; each extension consists of an implementation of the data processing model best suited to the data type, together with auxiliary code that maps that data processing model onto epiC's concurrent programming model. As with Hadoop, programs written in this way are automatically parallelized, and the runtime system takes care of fault tolerance and inter-machine communication. We present the design and implementation of epiC's concurrent programming model. We also present two customized data processing models built on top of epiC: an optimized MapReduce extension and a relational model. We show how users can leverage epiC to process heterogeneous data by linking different types of operators together. To improve the performance of complex analytic jobs, epiC supports a partition-based optimization technique in which data are streamed between operators to avoid high I/O overheads. Experiments demonstrate the effectiveness and efficiency of epiC.
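The abstract's central idea is that computations are written as actor-like units that keep private state and communicate only through messages, and that a data processing model such as MapReduce is supported by mapping it onto those units. The Python sketch below illustrates that idea under loudly stated assumptions: `Runtime`, `Unit`, `MapUnit`, `ReduceUnit`, and `word_count` are hypothetical names, not epiC's API, and the single-process FIFO scheduler merely stands in for epiC's distributed runtime; it does not model parallelism, fault tolerance, or inter-machine communication.

```python
# Hypothetical sketch only: an actor-like programming model with a
# MapReduce-style "extension" mapped onto it. Not epiC's actual API.
from collections import defaultdict, deque
import zlib


class Runtime:
    """Hypothetical single-process runtime that delivers messages between named units."""

    def __init__(self):
        self.units = {}
        self.queue = deque()  # pending (target_name, message) pairs, FIFO order

    def register(self, name, unit):
        self.units[name] = unit
        unit.runtime = self

    def send(self, target, message):
        self.queue.append((target, message))

    def run(self):
        # Deliver messages one at a time until no unit has pending input.
        while self.queue:
            target, message = self.queue.popleft()
            self.units[target].receive(message)


class Unit:
    """Actor-like unit: owns private state and reacts only to incoming messages."""

    runtime = None

    def receive(self, message):
        raise NotImplementedError


class MapUnit(Unit):
    """Map side of the hypothetical MapReduce extension (word count)."""

    def __init__(self, reducers):
        self.reducers = reducers

    def receive(self, message):
        kind, payload = message
        if kind == "record":
            for word in payload.split():
                # Hash-partition keys so every count for a word reaches one reducer.
                idx = zlib.crc32(word.encode()) % len(self.reducers)
                self.runtime.send(self.reducers[idx], ("kv", (word, 1)))
        elif kind == "eof":
            # Tell every reducer this mapper has no more output.
            for reducer in self.reducers:
                self.runtime.send(reducer, ("eof", None))


class ReduceUnit(Unit):
    """Reduce side: sums counts and emits them once all mappers are done."""

    def __init__(self, num_mappers, sink):
        self.pending = num_mappers
        self.sink = sink
        self.counts = defaultdict(int)

    def receive(self, message):
        kind, payload = message
        if kind == "kv":
            word, n = payload
            self.counts[word] += n
        elif kind == "eof":
            self.pending -= 1
            if self.pending == 0:
                # Every mapper has finished: publish this reducer's partition.
                self.sink.update(self.counts)


def word_count(lines, num_mappers=2, num_reducers=2):
    """Wire mapper and reducer units together and run a word count."""
    rt = Runtime()
    result = {}
    reducers = [f"reduce-{i}" for i in range(num_reducers)]
    for name in reducers:
        rt.register(name, ReduceUnit(num_mappers, result))
    mappers = [f"map-{i}" for i in range(num_mappers)]
    for name in mappers:
        rt.register(name, MapUnit(reducers))
    # Round-robin the input records over the mappers, then signal end of input.
    for i, line in enumerate(lines):
        rt.send(mappers[i % num_mappers], ("record", line))
    for name in mappers:
        rt.send(name, ("eof", None))
    rt.run()
    return result
```

For example, `word_count(["a b a", "b c", "a"])` returns a dict equal to `{"a": 3, "b": 2, "c": 1}`. Keeping the MapReduce logic inside unit classes, separate from the generic `Runtime`, mirrors the abstract's separation between the concurrent programming model and the data processing models layered on it; a relational extension would, under the same assumption, be another set of unit classes on the same runtime.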
Pages: 3 to 26
Page count: 23
Related papers
50 in total
  • [41] An Extensible Parsing Pipeline for Unstructured Data Processing
    Jain, Shubham
    de Buitleir, Amy
    Fallon, Enda
    2022 24TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT): ARTIFICIAL INTELLIGENCE TECHNOLOGIES TOWARD CYBERSECURITY, 2022, : 312 - +
  • [42] DISTIL: Design and Implementation of a Scalable Synchrophasor Data Processing System
    Andersen, Michael P.
    Kumar, Sam
    Brooks, Connor
    von Meier, Alexandra
    Culler, David L.
    2015 IEEE INTERNATIONAL CONFERENCE ON SMART GRID COMMUNICATIONS (SMARTGRIDCOMM), 2015, : 271 - 277
  • [43] Scalable Euclidean Embedding for Big Data
    Alavi, Zohreh
    Sharma, Sagar
    Zhou, Lu
    Chen, Keke
    2015 IEEE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, 2015, : 773 - 780
  • [44] A Scalable Big Data Test Framework
    Li, Nan
    Escalona, Anthony
    Guo, Yun
    Offutt, Jeff
    2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), 2015,
  • [45] Clouds for Scalable Big Data Analytics
    Talia, Domenico
    COMPUTER, 2013, 46 (05) : 98 - 101
  • [46] DynaDojo: An Extensible Benchmarking Platform for Scalable Dynamical System Identification
    Bhamidipaty, Logan Mondal
    Bruzzese, Tommy
    Tran, Caryn
    Mrad, Rami
    Kanwal, Max
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [47] BGPmon: A real-time, scalable, extensible monitoring system
    Yan, He
    Oliveira, Ricardo
    Burnett, Kevin
    Matthews, Dave
    Zhang, Lixia
    Massey, Dan
    CATCH 2009: CYBERSECURITY APPLICATIONS AND TECHNOLOGY CONFERENCE FOR HOMELAND SECURITY, PROCEEDINGS, 2009, : 212 - +
  • [48] Designing Distributed, Scalable and Extensible System using Reactive Architectures
    Tovarnitchi, Vasile M.
    2019 22ND INTERNATIONAL CONFERENCE ON CONTROL SYSTEMS AND COMPUTER SCIENCE (CSCS), 2019, : 484 - 488
  • [49] Scalable Transformation of Big Geospatial Data into Linked Data
    Mandilaras, George
    Koubarakis, Manolis
    SEMANTIC WEB - ISWC 2021, 2021, 12922 : 480 - 495
  • [50] SOVAS: a scalable online visual analytic system for big climate data analysis
    Li, Zhenlong
    Huang, Qunying
    Jiang, Yuqin
    Hu, Fei
    INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE, 2020, 34 (06) : 1188 - 1209