Autonomic failure prediction based on manifold learning for large-scale distributed systems

Cited by: 6

Authors:

Lu X. [1]

Wang H.-Q. [1]

Zhou R.-J. [1]

Ge B.-Y. [1]

Affiliations:

[1] College of Computer Science and Technology, Harbin Engineering University

Source:

Journal of China Universities of Posts and Telecommunications | 2010, Vol. 17, No. 4

Funding:

National Natural Science Foundation of China

Keywords:

autonomic computing; failure prediction; locally linear embedding; manifold learning

DOI:

10.1016/S1005-8885(09)60497-0

Abstract:

This article investigates autonomic failure prediction in large-scale distributed systems, using nonlinear dimensionality reduction to extract failure features automatically. Most existing methods for failure prediction focus on building prediction models or heuristic rules by discovering failure patterns, but the process of feature extraction before failure pattern recognition is rarely considered due to the increasing complexity of modern distributed systems. In this work, a novel performance-centric approach to automate failure prediction is proposed based on manifold learning (ML). In addition, the ML algorithm named supervised locally linear embedding (SLLE) is applied to achieve feature extraction. To generalize the dimensionality reduction mapping, a nonlinear mapping approximation and its optimization solution are also proposed. In the experimental work, a file transfer test bed with fault injection is developed that gathers multilevel performance metrics transparently. Based on the runtime monitoring of these metrics, the SLLE method can automatically predict more than 50% of the central processing unit (CPU) and memory failures, and around 70% of the network failures. © 2010 The Journal of China Universities of Posts and Telecommunications.
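
The abstract does not give the SLLE formulation the authors use, so the following is only a minimal sketch of one common variant, in which distances between samples of different classes are inflated before the usual locally linear embedding steps. The function name `slle_embed` and the parameters `alpha`, `n_neighbors`, and `reg` are assumptions made for illustration, not names from the paper, and the sketch does not cover the paper's nonlinear mapping approximation for new samples.

```python
import numpy as np


def slle_embed(X, labels, n_neighbors=5, n_components=2, alpha=0.5, reg=1e-3):
    """Hypothetical supervised-LLE sketch (not the paper's exact method).

    X      : (n_samples, n_features) performance metrics
    labels : (n_samples,) class labels, e.g. normal vs. failure type
    alpha  : 0 gives plain LLE; larger values push classes apart
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Supervised step (assumed variant): add a penalty to the distance
    # between any two samples that carry different labels.
    different = labels[:, None] != labels[None, :]
    dist_sup = dist + alpha * dist.max() * different

    # Pick the nearest neighbours under the modified distance, excluding self.
    np.fill_diagonal(dist_sup, np.inf)
    neighbors = np.argsort(dist_sup, axis=1)[:, :n_neighbors]

    # Reconstruction weights: each sample as an affine mix of its neighbours.
    W = np.zeros((n, n))
    ones = np.ones(n_neighbors)
    for i in range(n):
        Z = X[neighbors[i]] - X[i]            # neighbours centred on sample i
        G = Z @ Z.T                           # local Gram matrix
        G += (reg * np.trace(G) + 1e-12) * np.eye(n_neighbors)  # regularise
        w = np.linalg.solve(G, ones)
        W[i, neighbors[i]] = w / w.sum()      # weights sum to one

    # Embedding: eigenvectors of M = (I - W)^T (I - W) with the smallest
    # eigenvalues, skipping the first one.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return eigvecs[:, 1:n_components + 1]
```

With `alpha=0` the sketch reduces to ordinary LLE. The low-dimensional rows it returns could then be passed to any classifier as extracted failure features, which is the role the abstract assigns to SLLE.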

Pages: 116-124

Page count: 8
