Optimal Recovery from Large-Scale Failures in IP Networks

Cited by: 7
Authors
Zheng, Qiang [1]
Cao, Guohong [1]
La Porta, Tom [1]
Swami, Ananthram [2]
Affiliations
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
[2] US Army, Res Lab, Adelphi, MD USA
DOI
10.1109/ICDCS.2012.47
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Quickly recovering IP networks from failures is critical to enhancing Internet robustness and availability. Because of their serious impact on network routing, large-scale failures have received increasing attention in recent years. We propose an approach called Reactive Two-phase Rerouting (RTR) for intra-domain routing that quickly recovers from large-scale failures using the shortest recovery paths. To recover a failed routing path, RTR first forwards packets around the failure area to collect information on failures. Then, in the second phase, RTR calculates a new shortest path and forwards packets along it through source routing. RTR can deal with large-scale failures associated with areas of any shape and location, and is free of permanent loops. For any failure area, the recovery paths provided by RTR are guaranteed to be the shortest. Extensive simulations based on ISP topologies show that RTR can find the shortest recovery paths for more than 98.6% of failed routing paths with reachable destinations. Compared with prior work, RTR achieves better performance for recoverable failed routing paths and uses far fewer network resources for irrecoverable failed routing paths.
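The second phase described in the abstract (compute a new shortest path once failures are known, then forward along it by source routing) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name shortest_recovery_path, the dictionary-based topology, and the representation of failures as a list of undirected links are all assumptions made for illustration, and the first phase (forwarding around the failure area to collect failure information) is not modeled. The sketch simply runs Dijkstra's algorithm over the topology with the known-failed links removed and returns the hop list that a source route would carry.

```python
import heapq


def shortest_recovery_path(graph, failed_links, src, dst):
    """Return the shortest src->dst hop list that avoids failed links.

    Hypothetical helper (not from the paper). `graph` maps each node to a
    dict of {neighbor: link_cost}; `failed_links` is an iterable of
    undirected (u, v) pairs assumed to have been learned in phase one.
    Returns None when the destination is unreachable.
    """
    failed = {frozenset(link) for link in failed_links}
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            if frozenset((u, v)) in failed:
                continue  # skip links known to be down
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None  # irrecoverable: no surviving path
    # Walk predecessors back to the source to build the source route.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path


# Toy four-node topology (assumed for illustration only).
topology = {
    "A": {"B": 1, "D": 2},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 2},
    "D": {"A": 2, "C": 2},
}
route = shortest_recovery_path(topology, [("B", "C")], "A", "C")
print(route)
```

In this toy topology the default path A, B, C is broken by the failure of link B-C, so the sketch falls back to the route through D. When every link into the destination has failed, the function returns None, which corresponds to the abstract's irrecoverable case.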
Pages: 295-304
Number of pages: 10