Optimal Recovery from Large-Scale Failures in IP Networks

Times cited: 7

Authors:

Zheng, Qiang [1]

Cao, Guohong [1]

La Porta, Tom [1]

Swami, Ananthram [2]

Affiliations:

[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA

[2] US Army, Res Lab, Adelphi, MD USA

Source:

2012 IEEE 32nd International Conference on Distributed Computing Systems (ICDCS) | 2012

DOI:

10.1109/ICDCS.2012.47

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Quickly recovering IP networks from failures is critical to enhancing Internet robustness and availability. Because of their serious impact on network routing, large-scale failures have received increasing attention in recent years. We propose an approach called Reactive Two-phase Rerouting (RTR) for intra-domain routing that quickly recovers from large-scale failures using the shortest recovery paths. To recover a failed routing path, RTR first forwards packets around the failure area to collect information on the failures. Then, in the second phase, RTR calculates a new shortest path and forwards packets along it through source routing. RTR can handle large-scale failures associated with areas of any shape and location, and it is free of permanent loops. For any failure area, the recovery paths provided by RTR are guaranteed to be the shortest. Extensive simulations based on ISP topologies show that RTR finds the shortest recovery paths for more than 98.6% of failed routing paths with reachable destinations. Compared with prior work, RTR achieves better performance for recoverable failed routing paths and uses far fewer network resources for irrecoverable failed routing paths.
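The second phase described in the abstract amounts to a shortest-path computation on the topology once the failed elements have been removed. The Python sketch below illustrates that step only, under stated assumptions. The failure information gathered in phase one is assumed to arrive as sets of failed nodes and failed links, and failed links are treated as undirected. The function name `recovery_path` is hypothetical. This is a generic Dijkstra illustration, not the authors' implementation, and it does not model phase one (forwarding packets around the failure area).

```python
import heapq
from typing import Dict, Hashable, List, Optional, Set, Tuple

Node = Hashable
Graph = Dict[Node, Dict[Node, float]]  # adjacency: node -> {neighbor: link cost}


def recovery_path(
    graph: Graph,
    source: Node,
    destination: Node,
    failed_nodes: Set[Node],
    failed_links: Set[Tuple[Node, Node]],
) -> Optional[List[Node]]:
    """Hypothetical phase-two sketch: Dijkstra with known failures excluded.

    Returns the node sequence that would be placed in a source-routing
    header, or None if the destination is unreachable given the failures.
    """

    def link_failed(u: Node, v: Node) -> bool:
        # Assumption: a failed link is undirected, so either orientation counts.
        return (u, v) in failed_links or (v, u) in failed_links

    if source in failed_nodes or destination in failed_nodes:
        return None
    dist: Dict[Node, float] = {source: 0.0}
    prev: Dict[Node, Node] = {}
    heap: List[Tuple[float, int, Node]] = [(0.0, 0, source)]
    counter = 1  # tie-breaker so heapq never has to compare nodes directly
    visited: Set[Node] = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry; a shorter distance was already settled
        visited.add(u)
        if u == destination:
            break
        for v, cost in graph.get(u, {}).items():
            if v in failed_nodes or link_failed(u, v):
                continue  # skip elements reported as failed in phase one
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, counter, v))
                counter += 1
    if destination not in visited:
        return None
    # Walk predecessors back from the destination to build the source route.
    path = [destination]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    topo = {
        "A": {"B": 1, "C": 4},
        "B": {"A": 1, "D": 1},
        "C": {"A": 4, "D": 1},
        "D": {"B": 1, "C": 1},
    }
    print(recovery_path(topo, "A", "D", set(), set()))       # ['A', 'B', 'D']
    print(recovery_path(topo, "A", "D", {"B"}, set()))       # ['A', 'C', 'D']
    print(recovery_path(topo, "A", "D", {"B", "C"}, set()))  # None
```

Returning `None` when no path survives corresponds to the abstract's "irrecoverable failed routing paths". In that case, no source-routed packet would need to be sent.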


Pages: 295-304

Number of pages: 10