An Approach for Modeling and Ranking Node-level Stragglers in Cloud Datacenters

Cited by: 7
Authors
Ouyang, Xue [1, 2]
Garraghan, Peter [1 ]
Wang, Changjian [2 ]
Townend, Paul [1 ]
Xu, Jie [1 ]
Affiliations
[1] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
[2] Natl Univ Def Technol, Parallel & Distributed Lab, Changsha, Hunan, Peoples R China
Keywords
Stragglers; Node Performance; Clusters; Tracelog Data Analysis; Modeling; Ranking; MAPREDUCE;
DOI
10.1109/SCC.2016.93
Chinese Library Classification
TP39 [Computer applications];
Subject classification code
081203; 0835;
Abstract
The ability of servers to effectively execute tasks within Cloud datacenters varies due to heterogeneous CPU and memory capacities, resource contention situations, network configurations and operational age. Unexpectedly slow server nodes (node-level stragglers) result in assigned tasks becoming task-level stragglers, which dramatically impede parallel job execution. However, it is currently unknown how slow nodes directly correlate to task straggler manifestation. To address this knowledge gap, we propose a method for node performance modeling and ranking in Cloud datacenters based on analyzing parallel job execution tracelog data. By using a production Cloud system as a case study, we demonstrate how node execution performance is driven by temporal changes in node operation as opposed to node hardware capacity. Different sample sets have been filtered in order to evaluate the generality of our framework, and the analytic results demonstrate that node abilities of executing parallel tasks tend to follow a 3-parameter-loglogistic distribution. Further statistical attribute values such as confidence interval, quantile value, extreme case possibility, etc. can also be used for ranking and identifying potential straggler nodes within the cluster. We exploit a graph-based algorithm for partitioning server nodes into five levels, with 0.83% of node-level stragglers identified. Our work lays the foundation towards enhancing scheduling algorithms by avoiding slow nodes, reducing task straggler occurrence, and improving parallel job performance.
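As an illustration of the ranking idea in the abstract, the following minimal sketch assumes that a three-parameter log-logistic distribution (shape, scale, location) has already been fitted to some per-node slowness metric, and ranks nodes by a high quantile of that fit. The class name, the function `rank_nodes`, the node identifiers, and every parameter value are hypothetical and chosen only for illustration; the paper's actual fitting procedure and its graph-based five-level partitioning are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogLogistic3:
    """Three-parameter log-logistic distribution (illustrative helper)."""
    shape: float     # alpha > 0, controls tail heaviness
    scale: float     # beta > 0, median offset above the location
    location: float  # gamma, lower bound of the support

    def cdf(self, x: float) -> float:
        # F(x) = 1 / (1 + ((x - gamma) / beta) ** (-alpha)) for x > gamma
        if x <= self.location:
            return 0.0
        z = (x - self.location) / self.scale
        return 1.0 / (1.0 + z ** (-self.shape))

    def quantile(self, p: float) -> float:
        # Q(p) = gamma + beta * (p / (1 - p)) ** (1 / alpha)
        if not 0.0 < p < 1.0:
            raise ValueError("p must be in (0, 1)")
        return self.location + self.scale * (p / (1.0 - p)) ** (1.0 / self.shape)

    def tail_probability(self, threshold: float) -> float:
        # Probability that the metric exceeds the threshold (an "extreme case")
        return 1.0 - self.cdf(threshold)


def rank_nodes(fits, p=0.95):
    """Return node ids ordered from slowest to fastest by the p-quantile."""
    return sorted(fits, key=lambda node: fits[node].quantile(p), reverse=True)


# Hypothetical fitted parameters; not taken from the paper.
fits = {
    "node-a": LogLogistic3(shape=4.0, scale=1.0, location=0.0),
    "node-b": LogLogistic3(shape=4.0, scale=1.5, location=0.2),
    "node-c": LogLogistic3(shape=8.0, scale=1.0, location=0.0),
}
ranking = rank_nodes(fits)
```

Ranking by a high quantile rather than by the mean is one possible design choice: it emphasizes the slow tail of each node's behavior, which is where stragglers appear. The `tail_probability` method gives an alternative ranking key based on the chance of exceeding a fixed threshold.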
Pages: 673-680
Page count: 8