Autonomic failure prediction based on manifold learning for large-scale distributed systems

Cited by: 6
Authors
Lu X. [1]
Wang H.-Q. [1]
Zhou R.-J. [1]
Ge B.-Y. [1]
Affiliation
[1] College of Computer Science and Technology, Harbin Engineering University
Funding
National Natural Science Foundation of China
Keywords
autonomic computing; failure prediction; locally linear embedding; manifold learning
DOI
10.1016/S1005-8885(09)60497-0
Abstract
This article investigates autonomic failure prediction in large-scale distributed systems, using nonlinear dimensionality reduction to extract failure features automatically. Most existing methods for failure prediction focus on building prediction models or heuristic rules by discovering failure patterns, but the feature extraction step that precedes failure pattern recognition is rarely considered, owing to the increasing complexity of modern distributed systems. In this work, a novel performance-centric approach to automated failure prediction is proposed based on manifold learning (ML). The ML algorithm named supervised locally linear embedding (SLLE) is applied to perform feature extraction. To generalize the dimensionality reduction mapping, a nonlinear mapping approximation and optimization solution is also proposed. In the experimental work, a file transfer test bed with fault injection is developed that can gather multilevel performance metrics transparently. Based on runtime monitoring of these metrics, the SLLE method can automatically predict more than 50% of the central processing unit (CPU) and memory failures, and around 70% of the network failures. © 2010 The Journal of China Universities of Posts and Telecommunications.
Pages: 116-124
Page count: 8