Accelerating Big Data Applications Using Lightweight Virtualization Framework on Enterprise Cloud

Cited by: 0
Authors
Bhimani, Janki [1]
Yang, Zhengyu [1]
Leeser, Miriam [1]
Mi, Ningfang [1]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, 360 Huntington Ave, Boston, MA 02115 USA
Funding
National Science Foundation (USA)
Keywords
Virtual Machine (VM); Container; Docker; Apache Spark; Big Data; Cloud Computing; Resource Management; Task Assignment; Workload Evaluation & Estimation; MapReduce
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Hypervisor-based virtualization technology has been successfully used to deploy high-performance and scalable infrastructure for Hadoop, and now Spark, applications. Container-based virtualization techniques are becoming an important option and are increasingly used due to their lightweight operation and better scaling compared to virtual machines (VMs). With containerization techniques such as Docker becoming mature and promising better performance, we can use Docker to speed up big data applications. However, as applications have different behaviors and resource requirements, before replacing traditional hypervisor-based virtual machines with Docker, it is important to analyze and compare the performance of applications running in the cloud with VMs and Docker containers. VMs provide distributed resource management, with each virtual machine running on its own allocated resources, while Docker relies on a shared pool of resources among all containers. Here, we investigate the performance of different Apache Spark applications using both VMs and Docker containers. While others have looked at Docker's performance, this is the first study that compares these different virtualization frameworks for a big data enterprise cloud environment using Apache Spark. In addition to makespan and execution time, we also analyze the utilization of different resources (CPU, disk, memory, etc.) by Spark applications. Our results show that Spark using Docker can obtain a speed-up of over 10 times when compared to using VMs. However, we observe that this may not apply to all applications due to different workload patterns and different resource management schemes performed by virtual machines and containers. Our work can guide application developers, system administrators, and researchers to better design and deploy big data applications on their platforms to improve overall performance.
Pages: 7