A NoSQL Data Model For Scalable Big Data Workflow Execution

Cited by: 9
Authors
Mohan, Aravind [1]
Ebrahimi, Mahdi [1]
Lu, Shiyong [1]
Kotov, Alexander [1]
Affiliation
[1] Wayne State Univ, Detroit, MI 48202 USA
Keywords
Big Data Workflows; NoSQL; Clouds
DOI
10.1109/BigDataCongress.2016.15
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
While big data workflows have been proposed recently as the next-generation data-centric workflow paradigm to process and analyze data of ever-increasing scale, complexity, and rate of acquisition, a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. Meanwhile, although NoSQL data models have emerged as a new category, they are optimized for storing and querying large datasets, not for ad-hoc data analysis where data placement and data movement are necessary for optimized workflow execution. In this paper, we propose a NoSQL data model that: 1) supports high-performance MapReduce-style workflows that automate data partitioning and data-parallel execution. In contrast to the traditional MapReduce framework, our MapReduce-style workflows are fully composable with other workflows, enabling dataflow applications with a richer structure; 2) automates virtual machine provisioning and deprovisioning on demand according to the sizes of input datasets; 3) enables a flexible framework for workflow executors that take advantage of the proposed NoSQL data model to improve the performance of workflow execution. Our case studies and experiments show the competitive advantages of our proposed data model. The proposed NoSQL data model is implemented in a new release of DATAVIEW, one of the most usable big data workflow systems in the community.
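The abstract's first two points (automatic data partitioning with data-parallel execution, and sizing the worker pool from the input dataset) can be illustrated with a minimal Python sketch. This is not DATAVIEW's API or the paper's data model: the function names, the `records_per_worker` sizing rule, and the use of local threads in place of virtual machines are all assumptions made for illustration.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def plan_workers(num_records, records_per_worker=1000, max_workers=8):
    """Hypothetical sizing rule: one worker per `records_per_worker`
    records, at least 1 and at most `max_workers`."""
    return max(1, min(max_workers, math.ceil(num_records / records_per_worker)))


def partition(records, num_parts):
    """Split `records` into contiguous chunks of near-equal size."""
    if not records:
        return []
    size = math.ceil(len(records) / num_parts)
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_word_count(chunk):
    """Map step: count the words in one partition."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def run_mapreduce_task(records, records_per_worker=1000, max_workers=8):
    """Partition the input, map the partitions in parallel, then reduce.

    Threads stand in for provisioned virtual machines (an assumption);
    the pool is released when the `with` block exits.
    """
    workers = plan_workers(len(records), records_per_worker, max_workers)
    chunks = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_word_count, chunks))
    # Reduce step: merge the per-partition counts into one result.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    lines = ["big data workflow", "nosql data model", "big data"]
    print(run_mapreduce_task(lines, records_per_worker=2))
```

Because `run_mapreduce_task` is an ordinary function that takes a list and returns a `Counter`, its output can be passed to another step, which loosely mirrors the composability the abstract claims for its MapReduce-style workflows.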
Pages: 52-59
Number of pages: 8