Accelerating Big Data Applications Using Lightweight Virtualization Framework on Enterprise Cloud

Cited by: 0
Authors
Bhimani, Janki [1]
Yang, Zhengyu [1]
Leeser, Miriam [1]
Mi, Ningfang [1]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, 360 Huntington Ave, Boston, MA 02115 USA
Funding
National Science Foundation (USA)
Keywords
Virtual Machine (VM); Container; Docker; Apache Spark; Big Data; Cloud Computing; Resource Management; Task Assignment; Workload Evaluation & Estimation; MapReduce
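DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract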
Hypervisor-based virtualization has been used successfully to deploy high-performance, scalable infrastructure for Hadoop and, more recently, Spark applications. Container-based virtualization is becoming an important alternative, increasingly adopted for its lightweight operation and better scaling compared with virtual machines (VMs). As containerization technologies such as Docker mature and promise better performance, Docker can be used to speed up big data applications. However, because applications differ in behavior and resource requirements, it is important to analyze and compare the performance of applications running in the cloud on VMs and on Docker containers before replacing traditional hypervisor-based VMs with Docker. A VM-based deployment provides distributed resource management, with each virtual machine running on its own allocated resources, whereas Docker relies on a pool of resources shared among all containers. Here, we investigate the performance of different Apache Spark applications using both VMs and Docker containers. While others have examined Docker's performance, this is the first study to compare these virtualization frameworks for a big data enterprise cloud environment using Apache Spark. In addition to makespan and execution time, we analyze the utilization of different resources (CPU, disk, memory, etc.) by Spark applications. Our results show that Spark on Docker can achieve a speed-up of over 10 times compared with Spark on VMs. However, we observe that this may not hold for all applications, owing to different workload patterns and the different resource management schemes used by virtual machines and containers. Our work can guide application developers, system administrators, and researchers in designing and deploying big data applications on their platforms to improve overall performance.
Pages: 7