Albatross: An Efficient Cloud-Enabled Task Scheduling and Execution Framework Using Distributed Message Queues

Cited by: 0
Authors
Sadooghi, Iman [1]
Kumar, Geet [1]
Wang, Ke [1]
Zhao, Dongfang [1]
Li, Tonglin [1]
Raicu, Ioan [1]
Affiliations
[1] IIT, Dept Comp Sci, Chicago, IL 60616 USA
Funding
National Science Foundation (USA);
Keywords
Data Analytics; Task Scheduling; Distributed Systems; Spark; Hadoop; Distributed Task Execution; Distributed Message Queue;
DOI
Not available
Chinese Library Classification number
TP39 [Computer Applications];
Discipline classification code
081203; 0835;
Abstract
Data analytics on large datasets has become common across organizations, and at larger scales it inevitably relies on distributed resources such as clouds, as do other types of data processing. Utilizing all system resources effectively requires an efficient scheduler, but traditional resource managers and job schedulers are centralized and designed for a small number of large batch jobs. Frameworks such as Hadoop and Spark, which are designed mainly for Big Data analytics, allow for somewhat more diversity in job types. However, even these systems have centralized architectures and do not perform well at large scales or under heavy task loads. Modern applications generate tasks at very high rates, which can cause significant slowdowns on these frameworks. Additionally, over-decomposition has been shown to be very useful in increasing system utilization. To achieve high efficiency, scalability, and better system utilization, a modern scheduler must be able to handle over-decomposition and run highly granular tasks. We propose Albatross, a task-level scheduling and execution framework that uses a Distributed Message Queue (DMQ) to distribute tasks among its workers. To achieve high performance, Albatross is written in C/C++, which imposes minimal overhead on the workload process compared to languages like Java or Python. Unlike most scheduling systems, which push tasks to workers, Albatross uses a pull approach in which workers fetch tasks from the queue; this lets Albatross achieve good load balancing and scalability. Furthermore, the framework has built-in support for task execution dependencies in workflows, so Albatross can run various types of workloads, including Data Analytics and HPC applications. Finally, Albatross provides data locality support, which allows the framework to achieve higher performance by minimizing unnecessary data movement over the network. Our evaluations show that Albatross outperforms Spark and Hadoop at larger scales and when running finer-grained workloads.
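The pull-based distribution described in the abstract can be illustrated with a minimal Python sketch. This is not Albatross code (which the abstract says is written in C/C++): the in-process `queue.Queue` merely stands in for a distributed message queue, and the function name `run_pull_workers` and its parameters are hypothetical, chosen only for illustration.

```python
import queue
import threading


def run_pull_workers(tasks, num_workers):
    """Run callables from `tasks` on workers that pull from a shared queue.

    Returns (results, per_worker_counts). The in-process queue.Queue is only
    a stand-in for a distributed message queue (assumption for illustration).
    """
    task_queue = queue.Queue()
    for task_id, func in enumerate(tasks):
        task_queue.put((task_id, func))

    results = {}
    counts = [0] * num_workers
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                # Pull: a worker asks for work only when it is idle, so no
                # central scheduler has to decide where each task goes.
                task_id, func = task_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, nothing left to pull
            value = func()
            with lock:
                results[task_id] = value
                counts[worker_id] += 1

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, counts


if __name__ == "__main__":
    # Twenty small tasks, each squaring its own index.
    demo_tasks = [lambda n=n: n * n for n in range(20)]
    demo_results, demo_counts = run_pull_workers(demo_tasks, num_workers=4)
    print(sorted(demo_results.items())[:5], demo_counts)
```

Because each worker takes a new task only after finishing its previous one, faster or less loaded workers naturally end up processing more tasks, which is the load-balancing property the abstract attributes to the pull approach. The per-worker counts will vary from run to run.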
Pages: 11-20
Number of pages: 10