Research on the Health Diagnosis Module of Large-scale Clusters

Cited by: 0
Authors
Yang, Cong [1]
Du, Wen-long [1]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Cloud Comp Res Ctr, Shenzhen, Peoples R China
Keywords
cloud computing; decision tree; health diagnosis module; MONITORING-SYSTEM;
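DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract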
A large number of low-level performance metrics, including process, virtual machine, and physical machine metrics, can be measured to identify the health status of a node or even a whole cluster. Traditionally, the nodes in a cluster are monitored, and managers must analyze each metric and alarm message from the monitoring tools to determine the health status of the cluster. However, this process spends too much time on insignificant metrics and is inefficient, because most clusters have hundreds of nodes or more and one manager cannot check so many metrics on every node. In this work, we demonstrate that time can be saved by simplifying the metric set, scoring each node, and diagnosing node health status with a decision tree. Specifically, this work first experimentally verifies and ranks the degree of relation between node health and different metrics. Second, we collect and score the training set through load-increase testing. Third, we construct a decision tree from the training set. Finally, a health diagnosis module is composed from the preceding process, algorithm, and decision tree. We evaluate the Health Diagnosis Module (HDM) on a normal PC cluster. Experiments show that HDM can precisely diagnose the health status of nodes and clusters with an accuracy of more than 89%.
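The decision-tree step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not state which tree algorithm, metrics, or health labels the paper uses, so the Gini-based threshold splits, the two metrics (CPU and memory utilization), the labels, and all sample values below are assumptions chosen only for illustration. The metric-ranking and load-increase scoring steps are not shown.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (metric_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between neighbouring values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Build a tree of (metric_index, threshold, left, right) tuples.

    A leaf is simply the majority label (a string).
    """
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (
        j,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def diagnose(tree, row):
    """Walk the tree from the root to a leaf and return the leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Hypothetical training set: (cpu_utilization, memory_utilization) -> label.
# These values are invented for the example and do not come from the paper.
TRAIN = [
    ((0.20, 0.30), "healthy"),
    ((0.35, 0.40), "healthy"),
    ((0.50, 0.45), "healthy"),
    ((0.60, 0.85), "unhealthy"),
    ((0.90, 0.50), "unhealthy"),
    ((0.95, 0.90), "unhealthy"),
]

rows = [r for r, _ in TRAIN]
labels = [y for _, y in TRAIN]
tree = build_tree(rows, labels)

print(diagnose(tree, (0.30, 0.35)))
print(diagnose(tree, (0.92, 0.60)))
```

In practice a library implementation would likely replace the hand-written tree; the pure-Python version is used here only to keep the sketch self-contained.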
Pages: 589-593
Number of pages: 5