An Approach for Modeling and Ranking Node-level Stragglers in Cloud Datacenters

Cited by: 7
Authors
Ouyang, Xue [1, 2]
Garraghan, Peter [1]
Wang, Changjian [2]
Townend, Paul [1]
Xu, Jie [1]
Affiliations
[1] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
[2] Natl Univ Def Technol, Parallel & Distributed Lab, Changsha, Hunan, Peoples R China
Keywords
Stragglers; Node Performance; Clusters; Tracelog Data Analysis; Modeling; Ranking; MapReduce
DOI
10.1109/SCC.2016.93
Chinese Library Classification (CLC) number
TP39 [Computer Applications]
Subject classification codes
081203; 0835
Abstract
The ability of servers to execute tasks effectively within Cloud datacenters varies due to heterogeneous CPU and memory capacities, resource contention, network configurations and operational age. Unexpectedly slow server nodes (node-level stragglers) cause their assigned tasks to become task-level stragglers, which dramatically impede parallel job execution. However, it is currently unknown how slow nodes directly correlate with task straggler manifestation. To address this knowledge gap, we propose a method for node performance modeling and ranking in Cloud datacenters based on analyzing parallel job execution tracelog data. Using a production Cloud system as a case study, we demonstrate how node execution performance is driven by temporal changes in node operation as opposed to node hardware capacity. Different sample sets have been filtered in order to evaluate the generality of our framework, and the analytic results demonstrate that the ability of nodes to execute parallel tasks tends to follow a three-parameter log-logistic distribution. Further statistical attribute values such as confidence interval, quantile value, extreme case possibility, etc. can also be used for ranking and identifying potential straggler nodes within the cluster. We exploit a graph-based algorithm for partitioning server nodes into five levels, with 0.83% of node-level stragglers identified. Our work lays the foundation for enhancing scheduling algorithms by avoiding slow nodes, reducing task straggler occurrence, and improving parallel job performance.
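The quantile-based ranking mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each node already has fitted three-parameter log-logistic parameters (scale `alpha`, shape `beta`, location `gamma`) for a slowdown-style metric where larger values mean slower execution. The class `LogLogistic3`, the helper `rank_nodes`, the node names and all parameter values are hypothetical. The sketch relies only on the standard closed forms: CDF F(x) = 1 / (1 + ((x - gamma) / alpha)^(-beta)) for x > gamma, and quantile Q(p) = gamma + alpha * (p / (1 - p))^(1 / beta).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogLogistic3:
    """Three-parameter log-logistic distribution (hypothetical helper class)."""
    alpha: float  # scale, must be > 0
    beta: float   # shape, must be > 0
    gamma: float  # location: lower bound of the support

    def cdf(self, x: float) -> float:
        # Probability that the metric is at most x.
        if x <= self.gamma:
            return 0.0
        z = (x - self.gamma) / self.alpha
        return 1.0 / (1.0 + z ** (-self.beta))

    def quantile(self, p: float) -> float:
        # Inverse of the CDF for 0 < p < 1.
        if not 0.0 < p < 1.0:
            raise ValueError("p must be in (0, 1)")
        return self.gamma + self.alpha * (p / (1.0 - p)) ** (1.0 / self.beta)


def rank_nodes(fits, p=0.95):
    """Sort node ids from slowest to fastest by the p-quantile of the metric.

    Assumes larger metric values indicate slower execution.
    """
    return sorted(fits, key=lambda node: fits[node].quantile(p), reverse=True)


# Made-up fitted parameters for three illustrative nodes.
fits = {
    "node-a": LogLogistic3(alpha=1.0, beta=4.0, gamma=0.0),
    "node-b": LogLogistic3(alpha=1.5, beta=4.0, gamma=0.0),
    "node-c": LogLogistic3(alpha=1.0, beta=2.0, gamma=0.2),
}
ranking = rank_nodes(fits, p=0.95)
print(ranking)
```

With these made-up parameters, `node-c` has the heaviest upper tail and ranks first as the most likely straggler candidate. Ranking by an upper quantile rather than the median is one possible design choice; it emphasizes extreme slowdowns, which is the behavior that matters for stragglers.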
Pages: 673-680
Page count: 8