Albatross: An Efficient Cloud-Enabled Task Scheduling and Execution Framework Using Distributed Message Queues

Cited by: 0
Authors
Sadooghi, Iman [1]
Kumar, Geet [1]
Wang, Ke [1]
Zhao, Dongfang [1]
Li, Tonglin [1]
Raicu, Ioan [1]
Affiliations
[1] IIT, Dept Comp Sci, Chicago, IL 60616 USA
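Funding
US National Science Foundation
Keywords
Data Analytics; Task Scheduling; Distributed Systems; Spark; Hadoop; Distributed Task Execution; Distributed Message Queue
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract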
Data analytics on large datasets has become very popular across organizations. At larger scales, data analytics and other types of data processing inevitably rely on distributed resources such as clouds. Utilizing all system resources effectively requires an efficient scheduler, but traditional resource managers and job schedulers are centralized and designed for a small number of large batch jobs. Frameworks such as Hadoop and Spark, which are designed mainly for Big Data analytics, allow for more diversity in job types to some extent. However, even these systems have centralized architectures and cannot perform well at large scales and under heavy task loads. Modern applications generate tasks at very high rates, which can cause significant slowdowns on these frameworks. Additionally, over-decomposition has been shown to be very useful in increasing system utilization. To achieve high efficiency, scalability, and better system utilization, a modern scheduler must be able to handle over-decomposition and run highly granular tasks. We propose Albatross, a task-level scheduling and execution framework that uses a Distributed Message Queue (DMQ) to distribute tasks among its workers. To achieve high performance, Albatross is written in C/C++, which imposes minimal overhead on the workload process compared to languages like Java or Python. Unlike most scheduling systems, Albatross uses a pull approach rather than the common push approach, which lets it achieve good load balancing and scalability. The framework also has built-in support for task execution dependencies in workflows, so it can run various types of workloads, including data analytics and HPC applications. Finally, Albatross provides data locality support, which allows the framework to achieve higher performance by minimizing unnecessary data movement on the network. Our evaluations show that Albatross outperforms Spark and Hadoop at larger scales and when running higher-granularity workloads.
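The pull approach described in the abstract can be illustrated with a minimal single-process sketch. This is not Albatross's implementation (which the abstract says is written in C/C++ and built on a distributed message queue); here Python's in-process `queue.Queue` merely stands in for the DMQ, and the names `run_pull_workers`, `results`, and `pulled_by` are hypothetical. The point it shows is that no central component assigns tasks: each idle worker fetches its next task itself, so faster workers naturally take on more work.

```python
import queue
import threading


def run_pull_workers(tasks, num_workers=4):
    """Run callables in `tasks` on workers that pull from a shared queue.

    Assumption: queue.Queue is an in-process stand-in for a distributed
    message queue; all names here are illustrative only.
    """
    task_queue = queue.Queue()
    for task_id, task in enumerate(tasks):
        task_queue.put((task_id, task))

    results = {}    # task_id -> return value of the task
    pulled_by = {}  # task_id -> id of the worker that pulled it
    lock = threading.Lock()

    def worker(worker_id):
        # Each worker pulls its own next task; nothing is pushed to it.
        while True:
            try:
                task_id, task = task_queue.get_nowait()
            except queue.Empty:
                return  # every task was enqueued up front, so empty means done
            value = task()
            with lock:
                results[task_id] = value
                pulled_by[task_id] = worker_id

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, pulled_by


if __name__ == "__main__":
    # Twenty fine-grained tasks, each squaring one number.
    squares = [lambda n=n: n * n for n in range(20)]
    results, pulled_by = run_pull_workers(squares, num_workers=4)
    print(sorted(results.items())[:3])
```

Because the set of tasks pulled by each worker depends on thread timing, `pulled_by` differs between runs, while `results` does not.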
Pages: 11-20
Page count: 10