Data Analytics in the Cloud with Flexible MapReduce Workflows

Citations: 0
Authors
Goncalves, Carlos [1 ,2 ]
Assuncao, Luis [1 ,2 ]
Cunha, Jose C. [2 ]
Affiliations
[1] Univ Nova Lisboa, Inst Super Engn Lisboa, P-1200 Lisbon, Portugal
[2] Univ Nova Lisboa, Fac Ciencias Tecnol, Dept Informat, CITI, P-1200 Lisbon, Portugal
Keywords
MapReduce; Workflow; Text Mining; Cloud; MAP-REDUCE;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially, but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. Developing an application using MapReduce requires installing, configuring and accessing specific frameworks such as Apache Hadoop or Elastic MapReduce in the Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore, the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lack flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. We present experimental results from using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
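To make the division of labor described in the abstract concrete, the following is a minimal sketch of a MapReduce computation in which the user supplies only a map function and a reduce function, and a small driver handles the map, grouping and reduce phases. It does not use the AWARD framework or Hadoop; the names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical and chosen for illustration, and a local thread pool stands in for distributed workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(document):
    """User-defined map function (hypothetical): emit (word, 1) per word."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_counts(word, counts):
    """User-defined reduce function (hypothetical): sum the counts of a word."""
    return word, sum(counts)


def run_mapreduce(documents, map_fn, reduce_fn, workers=2):
    """Illustrative driver: map phase, group by key, then reduce phase."""
    # Map phase: each input split is processed independently, so the
    # calls can run concurrently (threads here, cluster nodes in practice).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, documents))

    # Grouping ("shuffle") phase: collect all values emitted for each key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: apply the user-defined reduce function to each key.
    return dict(reduce_fn(key, values) for key, values in groups.items())


result = run_mapreduce(["cloud data cloud", "data analytics"],
                       map_words, reduce_counts)
print(result)
```

In a workflow setting, each of the three phases in `run_mapreduce` could plausibly be a separate workflow node, which is the kind of composition the abstract refers to; how AWARD actually structures these nodes is not specified in this record.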
Pages: 8