A NoSQL Data Model For Scalable Big Data Workflow Execution

Cited by: 9
Authors
Mohan, Aravind [1]
Ebrahimi, Mahdi [1]
Lu, Shiyong [1]
Kotov, Alexander [1]
Affiliation
[1] Wayne State Univ, Detroit, MI 48202 USA
Keywords
Big Data Workflows; NoSQL; Clouds
DOI
10.1109/BigDataCongress.2016.15
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
While big data workflows have recently been proposed as the next-generation data-centric workflow paradigm for processing and analyzing data of ever-increasing scale, complexity, and rate of acquisition, a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. Meanwhile, although NoSQL has emerged as a new category of data models, NoSQL data models are optimized for storing and querying large datasets, not for ad-hoc data analysis, where data placement and data movement are necessary for optimized workflow execution. In this paper, we propose a NoSQL data model that: 1) supports high-performance MapReduce-style workflows that automate data partitioning and data-parallel execution; in contrast to the traditional MapReduce framework, our MapReduce-style workflows are fully composable with other workflows, enabling dataflow applications with a richer structure; 2) automates on-demand virtual machine provisioning and deprovisioning according to the sizes of input datasets; and 3) enables a flexible framework for workflow executors that take advantage of the proposed NoSQL data model to improve the performance of workflow execution. Our case studies and experiments show the competitive advantages of the proposed data model. The proposed NoSQL data model is implemented in a new release of DATAVIEW, one of the most usable big data workflow systems in the community.
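The abstract describes three mechanisms: automatic data partitioning with data-parallel map/reduce execution, composition of such steps with other workflows, and sizing the number of virtual machines to the input dataset. The Python sketch below illustrates how these ideas fit together. It is hypothetical and is not DATAVIEW's implementation or API. Threads stand in for virtual machines. The names `vms_needed`, `partition`, `MapReduceStep`, `compose`, `records_per_vm`, and `max_vms` are illustrative assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List, Sequence


def vms_needed(n_records: int, records_per_vm: int, max_vms: int) -> int:
    """Hypothetical sizing rule: one VM per `records_per_vm` records, at least 1, at most `max_vms`."""
    return max(1, min(max_vms, math.ceil(n_records / records_per_vm)))


def partition(data: Sequence, k: int) -> List[Sequence]:
    """Split `data` into k contiguous chunks whose sizes differ by at most one."""
    size, rem = divmod(len(data), k)
    chunks, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


@dataclass
class MapReduceStep:
    """Hypothetical MapReduce-style workflow step: map each partition in parallel, then reduce."""
    mapper: Callable[[Sequence], object]
    reducer: Callable[[object, object], object]
    records_per_vm: int = 1000
    max_vms: int = 8

    def __call__(self, data: Sequence):
        # Size the worker pool from the input size (stand-in for VM provisioning).
        k = vms_needed(len(data), self.records_per_vm, self.max_vms)
        with ThreadPoolExecutor(max_workers=k) as pool:
            # pool.map preserves partition order, so order-sensitive reducers stay correct.
            partials = list(pool.map(self.mapper, partition(data, k)))
        # Leaving the `with` block shuts the pool down (stand-in for deprovisioning).
        return reduce(self.reducer, partials)


def compose(*steps: Callable) -> Callable:
    """Chain steps so that each step's output becomes the next step's input."""
    def run(data):
        for step in steps:
            data = step(data)
        return data
    return run


# Usage: a two-step workflow that squares every number and then sums the squares.
square_all = MapReduceStep(mapper=lambda chunk: [x * x for x in chunk],
                           reducer=lambda a, b: a + b, records_per_vm=3)
total = MapReduceStep(mapper=sum, reducer=lambda a, b: a + b, records_per_vm=3)
pipeline = compose(square_all, total)
print(pipeline(list(range(10))))  # 285
```

Each step is an ordinary callable, so its output can feed another step directly. This sketch uses that property to model the composability the abstract contrasts with a traditional, standalone MapReduce job.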
Pages: 52-59
Page count: 8
Related papers (50 in total)
  • [41] Scalable Transformation of Big Geospatial Data into Linked Data
    Mandilaras, George
    Koubarakis, Manolis
    [J]. SEMANTIC WEB - ISWC 2021, 2021, 12922 : 480 - 495
  • [42] Formal Specification of the NoSQL Data Model
    Dmytro, Bui
    Sergey, Polyakov
    Hryshko, Iuliia
    [J]. INFORMATICS 2013: PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON INFORMATICS, 2013, : 284 - 288
  • [43] Building Data Warehouses in the Era of Big Data: An Approach for Scalable and Flexible Big Data Warehouses
    Costa, Carlos
    Santos, Maribel Yasmina
    [J]. ADVANCED INFORMATION SYSTEMS ENGINEERING (CAISE 2019), 2019, 11483 : 693 - 695
  • [44] Scalable Euclidean Embedding for Big Data
    Alavi, Zohreh
    Sharma, Sagar
    Zhou, Lu
    Chen, Keke
    [J]. 2015 IEEE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, 2015, : 773 - 780
  • [45] A Scalable Big Data Test Framework
    Li, Nan
    Escalona, Anthony
    Guo, Yun
    Offutt, Jeff
    [J]. 2015 IEEE 8TH INTERNATIONAL CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION (ICST), 2015
  • [46] Clouds for scalable Big Data processing
    Trunfio, Paolo
    Vlassov, Vladimir
    [J]. INTERNATIONAL JOURNAL OF PARALLEL EMERGENT AND DISTRIBUTED SYSTEMS, 2019, 34 (06) : 629 - 631
  • [47] Clustered Workflow Execution of Retargeted Data Analysis Scripts
    Wang, Daniel L.
    Zender, Charles S.
    Jenks, Stephen F.
    [J]. CCGRID 2008: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, PROCEEDINGS, 2008, : 449 - 458
  • [48] Clouds for Scalable Big Data Analytics
    Talia, Domenico
    [J]. COMPUTER, 2013, 46 (05) : 98 - 101
  • [49] Raw data queries during data-intensive parallel workflow execution
    Silva, Vitor
    Leite, Jose
    Camata, Jose J.
    de Oliveira, Daniel
    Coutinho, Alvaro L. G. A.
    Valduriez, Patrick
    Mattoso, Marta
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2017, 75 : 402 - 422
  • [50] Analysis of Big Data Sized NoSQL Database with Secondary Index
    Chang, Bao Rong
    Tsai, Hsiu-Fen
    Chen, Chia-Yen
    Hsu, Hung-Ta
    [J]. INTELLIGENT SYSTEMS AND APPLICATIONS (ICS 2014), 2015, 274 : 553 - 558