Fault tolerance using "Parallel shadow image servers (PSIS)" in grid based computing environment

Cited by: 3
Authors
Hussain, Naveed [1]
Ansari, M. A.
Yasin, M. M.
Rauf, Abdul
Haider, Sajjad
Affiliations
[1] Natl Univ Modern Languages, Dept Informat Technol, Islamabad, Pakistan
[2] Fed Urdu Univ Arts Sci & Technol, Dept Comp Sci, Islamabad, Pakistan
[3] COMSATS Inst Informat Technol, Dept Comp Sci, Islamabad, Pakistan
Keywords
grid computing; fault tolerance; PSIS; Condor; Cactus; job scheduling
DOI
10.1109/ICET.2006.335982
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a critical review of the existing fault tolerance mechanisms in grid computing and of the overhead they incur in reprocessing or rescheduling jobs when a fault arises. To reduce this overhead, we propose the Parallel Shadow Image Server (PSIS) copying technique, which runs in parallel with the Resource Manager and holds checkpoints so that, when a fault is detected, a job can be rescheduled from the nearest flag. A job is scheduled from the resource manager node to the worker nodes; after a pre-specified amount of time, the worker nodes submit it back in serialized form to the Parallel Shadow Image Servers. We call this the recent spawn, or flag checkpoint, for rescheduling or reprocessing the job. If a fault arises, the job is rescheduled from the most recent checkpoint and submitted to the worker node on which it was terminated. This not only saves time but also improves performance to a major extent.
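The checkpoint-and-resume idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ShadowImageServer` and `run_job` are hypothetical, the shadow servers are plain in-memory objects, the job is a toy summation, and a step count stands in for the paper's pre-specified time interval.

```python
import pickle


class ShadowImageServer:
    """Hypothetical in-memory stand-in for one shadow image server."""

    def __init__(self):
        self._images = {}  # job_id -> latest serialized job state

    def store(self, job_id, blob):
        self._images[job_id] = blob

    def latest(self, job_id):
        return self._images.get(job_id)


def run_job(job_id, total_steps, servers, checkpoint_every, fail_at=None):
    """Toy job: sum the integers 0..total_steps-1 on a 'worker'.

    Every `checkpoint_every` steps (an assumption; the paper uses a time
    interval) the serialized state is copied to all shadow servers.
    On start, the job resumes from the most recent checkpoint, if any.
    `fail_at` simulates a worker fault just before that step runs.
    """
    state = {"step": 0, "acc": 0}
    # Pick the most advanced checkpoint held by any shadow server.
    for server in servers:
        blob = server.latest(job_id)
        if blob is not None:
            candidate = pickle.loads(blob)
            if candidate["step"] > state["step"]:
                state = candidate

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated worker fault")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            blob = pickle.dumps(state)  # serialized form sent to the servers
            for server in servers:
                server.store(job_id, blob)
    return state["acc"]


if __name__ == "__main__":
    servers = [ShadowImageServer(), ShadowImageServer()]
    try:
        run_job("job-1", 10, servers, checkpoint_every=3, fail_at=7)
    except RuntimeError:
        pass  # fault detected; reschedule from the latest checkpoint
    print(run_job("job-1", 10, servers, checkpoint_every=3))
```

In this sketch the rescheduled run repeats only the work done since the last checkpoint (one step here) rather than the whole job, which is the overhead saving the abstract argues for.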
Pages: 703 - 707
Number of pages: 5
Related papers
50 in total
  • [41] Application-Level Fault-Tolerance Solutions for Grid Computing
    Diaz, Daniel
    Pardo, Xoan C.
    Martin, Maria J.
    Gonzalez, Patricia
    CCGRID 2008: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, PROCEEDINGS, 2008, : 554 - 559
  • [42] Fault tolerance for a scientific workflow system in a Cloud computing environment
    Khaldi M.
    Rebbah M.
    Meftah B.
    Smail O.
    International Journal of Computers and Applications, 2020, 42 (07) : 705 - 714
  • [43] An Integrated Virtualized Strategy for Fault Tolerance in Cloud Computing Environment
    Mohammed, Bashir
    Kiran, Mariam
    Awan, Irfan-Ullah
    Maiyama, Kabiru M.
    2016 INT IEEE CONFERENCES ON UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING AND COMMUNICATIONS, CLOUD AND BIG DATA COMPUTING, INTERNET OF PEOPLE, AND SMART WORLD CONGRESS (UIC/ATC/SCALCOM/CBDCOM/IOP/SMARTWORLD), 2016, : 542 - 549
  • [44] RRBS: A fault tolerance model for cluster/grid parallel file system
    Huo, YM
    Ju, JB
    Hu, L
    PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, 2005, 3758 : 180 - 187
  • [45] New Fault Tolerant Scheduling Algorithm Implemented using Check Pointing in Grid Computing Environment
    Jain, Sumant
    Chaudhary, Jyoti
    PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON RELIABILITY, OPTIMIZATION, & INFORMATION TECHNOLOGY (ICROIT 2014), 2014, : 393 - 396
  • [46] Evaluation of Grid Computing Environment Using TOPSIS
    Mohammaddoust, Mahmoud
    Harounabadi, Ali
    Neizari, Mohammadali
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2019, 16 (03) : 323 - 331
  • [47] Self-adaptive fault-tolerance of HLA-based simulations in the grid environment
    Huang, Jijie
    Chai, Xudong
    Zhang, Lin
    Li, Bo Hu
    ASIASIM 2007, 2007, 5 : 56 - +
  • [48] Fault tolerance aware scheduling technique for cloud computing environment using dynamic clustering algorithm
    Abdulhamid, Shafi'i Muhammad
    Abd Latiff, Muhammad Shafie
    Madni, Syed Hamid Hussain
    Abdullahi, Mohammed
    NEURAL COMPUTING & APPLICATIONS, 2018, 29 (01) : 279 - 293
  • [49] Fault tolerance aware scheduling technique for cloud computing environment using dynamic clustering algorithm
    Shafi’i Muhammad Abdulhamid
    Muhammad Shafie Abd Latiff
    Syed Hamid Hussain Madni
    Mohammed Abdullahi
    Neural Computing and Applications, 2018, 29 : 279 - 293
  • [50] A replication-based fault tolerance protocol using group communication for the Grid
    Erciyes, Kayhan
    Parallel and Distributed Processing and Applications, 2006, 4330 : 672 - 681