epiC: an extensible and scalable system for processing Big Data

Authors
Dawei Jiang
Sai Wu
Gang Chen
Beng Chin Ooi
Kian-Lee Tan
Jun Xu
Affiliations
[1] National University of Singapore, School of Computing
[2] Zhejiang University, College of Computer Science and Technology
[3] Harbin Institute of Technology, School of Computer Science and Technology
Source
The VLDB Journal | 2016 / Volume 25
Keywords
Parallel processing; MapReduce; Pregel; Hadoop
DOI
Not available
Abstract
The Big Data problem is characterized by the so-called 3V features: volume (a huge amount of data), velocity (a high data ingestion rate), and variety (a mix of structured, semi-structured, and unstructured data). State-of-the-art solutions to the Big Data problem are largely based on the MapReduce framework and its open-source implementation, Hadoop. Although Hadoop handles the data volume challenge successfully, it does not deal well with data variety, since its programming interfaces and associated data processing model are inconvenient and inefficient for handling structured data and graph data. This paper presents epiC, an extensible system that tackles the data variety challenge of Big Data. epiC introduces a general Actor-like concurrent programming model, independent of any data processing model, for specifying parallel computations. Users process multi-structured datasets with appropriate epiC extensions: an implementation of the data processing model best suited to the data type, plus auxiliary code that maps that data processing model onto epiC's concurrent programming model. As in Hadoop, programs written in this way can be automatically parallelized, and the runtime system takes care of fault tolerance and inter-machine communication. We present the design and implementation of epiC's concurrent programming model. We also present two customized data processing models built on top of epiC: an optimized MapReduce extension and a relational model. We show how users can leverage epiC to process heterogeneous data by linking different types of operators together. To improve the performance of complex analytic jobs, epiC supports a partition-based optimization technique in which data are streamed between operators to avoid high I/O overheads. Experiments demonstrate the effectiveness and efficiency of epiC.
Pages: 3-26
Number of pages: 23