A Study of Resilient Distributed Datasets for Big Data System

Cited by: 0
Authors
Kim, Da-yeon [1]
Shin, Dong-ryeol [1]
Affiliations
[1] Sungkyunkwan Univ, Coll Informat & Commun Engn, Suwon, South Korea
Keywords
Big data software platform; Hadoop ecosystem; Bigdata service;
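DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract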
In this paper, we present the Resilient Distributed Dataset (RDD) abstraction, on which the rest of the dissertation builds a general-purpose cluster computing stack. RDDs extend the data flow programming model introduced by MapReduce, which is the most widely used model for large-scale data analysis today. The abstraction gives users direct control of data sharing. RDDs are fault-tolerant, parallel data structures that let users explicitly store data on disk or in memory, control its partitioning, and manipulate it using a rich set of operators. They offer a simple and efficient programming interface that can capture both current specialized models and new applications.
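The properties named in the abstract (partitioned data, a set of operators, explicit persistence, and fault tolerance) can be illustrated with a small single-process sketch. This is not Spark's API and not the authors' implementation: the class `ToyRDD` and its methods `from_list`, `persist`, and `drop_partition` are hypothetical names invented for this example. It assumes fault tolerance is achieved by remembering how each partition was derived and recomputing it on loss.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Toy, single-process stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, compute: Callable[[int], Iterable], num_partitions: int):
        self._compute = compute              # lineage: how to rebuild partition i
        self.num_partitions = num_partitions
        self._cache: Optional[dict] = None   # used only after persist()

    @staticmethod
    def from_list(data: List, num_partitions: int) -> "ToyRDD":
        # Round-robin split of the input into partitions.
        return ToyRDD(lambda i: data[i::num_partitions], num_partitions)

    def map(self, f: Callable) -> "ToyRDD":
        # Lazy: only records how to derive each partition from the parent.
        return ToyRDD(lambda i: (f(x) for x in self.partition(i)),
                      self.num_partitions)

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda i: (x for x in self.partition(i) if pred(x)),
                      self.num_partitions)

    def persist(self) -> "ToyRDD":
        # Ask for computed partitions to be kept in memory.
        self._cache = {}
        return self

    def partition(self, i: int) -> List:
        if self._cache is None:
            return list(self._compute(i))
        if i not in self._cache:
            # Cache miss or lost partition: recompute from lineage.
            self._cache[i] = list(self._compute(i))
        return self._cache[i]

    def drop_partition(self, i: int) -> None:
        # Simulate losing one cached partition, e.g. after a node failure.
        if self._cache is not None:
            self._cache.pop(i, None)

    def collect(self) -> List:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


even_squares = (ToyRDD.from_list(list(range(10)), num_partitions=3)
                .map(lambda x: x * x)
                .filter(lambda x: x % 2 == 0)
                .persist())

before = even_squares.collect()
even_squares.drop_partition(1)   # lose one partition
after = even_squares.collect()   # rebuilt from lineage, same result
print(before == after)
```

In this sketch only the lost partition is recomputed; the other cached partitions are reused, which is the intuition behind recovering data without replicating it.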
Pages: 290-293
Page count: 4
Related papers
50 in total
  • [1] DBSCAN on Resilient Distributed Datasets
    Cordova, Irving
    Moh, Teng-Sheng
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2015), 2015, : 531 - 540
  • [2] Resilient Distributed Computing Platforms for Big Data Analysis Using Spark and Hadoop
    Chang, Bao Rong
    Tsai, Hsiu-Fen
    Wang, Yo-Ai
    Huang, Chien-Feng
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON APPLIED SYSTEM INNOVATION (ICASI), 2016,
  • [3] An Optimized Distributed OLAP System for Big Data
    Chen, Wenhao
    Wang, Haoxiang
    Zhang, Xingming
    Lin, Qidi
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (ICCIA), 2017, : 36 - 40
  • [4] A Density-Grid Based Clustering Algorithm on Data Stream Using Resilient Distributed Datasets
    Zhang, Yuan
    Zhang, Jiongmin
    ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2016, 2016, 9673 : 316 - 322
  • [5] Hadoop Distributed File System for Big data analysis
    Almansouri, Hatim Talal
    Masmoudi, Youssef
    PROCEEDINGS OF 2019 IEEE 4TH WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS' 19), 2019, : 257 - 261
  • [6] Learning and Data Selection in Big Datasets
    Ghadikolaei, Hossein S.
    Ghauch, Hadi
    Fischione, Carlo
    Skoglund, Mikael
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [7] A Mobility and Congestion Resilient Data Management System for Distributed Mobile Networks
    Li, Ze
    Shen, Haiying
    2009 IEEE 6TH INTERNATIONAL CONFERENCE ON MOBILE ADHOC AND SENSOR SYSTEMS (MASS 2009), 2009, : 856 - 865
  • [8] Predictive Spatio-Temporal Query Processor on Resilient Distributed Datasets
    Akkineni, Vijay
    Aydin, Berkay
    Naduvil-Vadukootu, Sajitha
    Angryk, Rafal
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCES ON BIG DATA AND CLOUD COMPUTING (BDCLOUD 2016) SOCIAL COMPUTING AND NETWORKING (SOCIALCOM 2016) SUSTAINABLE COMPUTING AND COMMUNICATIONS (SUSTAINCOM 2016) (BDCLOUD-SOCIALCOM-SUSTAINCOM 2016), 2016, : 50 - 58
  • [9] A Parallel Version of Differential Evolution Based on Resilient Distributed Datasets Model
    Deng, Changshou
    Tan, Xujie
    Dong, Xiaogang
    Tan, Yucheng
    BIO-INSPIRED COMPUTING - THEORIES AND APPLICATIONS, BIC-TA 2015, 2015, 562 : 84 - 93
  • [10] Resilient Blocks for Summarising Distributed Data
    Audrito, Giorgio
    Bergamini, Sergio
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2018, (264): : 23 - 26