Fault tolerance using "Parallel shadow image servers (PSIS)" in grid based computing environment

Cited by: 3

Authors:

Hussain, Naveed [1]

Ansari, M. A.

Yasin, M. M.

Rauf, Abdul

Haider, Sajjad

Affiliations:

[1] Natl Univ Modern Languages, Dept Informat Technol, Islamabad, Pakistan

[2] Fed Urdu Univ Arts Sci & Technol, Dept Comp Sci, Islamabad, Pakistan

[3] COMSATS Inst Informat Technol, Dept Comp Sci, Islamabad, Pakistan

Source:

SECOND INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES 2006, PROCEEDINGS | 2006

Keywords:

grid computing; fault tolerance; PSIS; Condor; Cactus; job scheduling

DOI:

10.1109/ICET.2006.335982

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper presents a critical review of the existing fault tolerance mechanisms in grid computing and of the overhead they incur in reprocessing or rescheduling jobs when a fault arises. To reduce this overhead, we propose a Parallel Shadow Image Server (PSIS) copying technique that runs in parallel with the Resource Manager and maintains checkpoints, so that a job can be rescheduled from the nearest flag when a fault is detected. The Resource Manager node schedules the job onto the worker nodes; after a pre-specified amount of time, the worker nodes submit the job back in serialized form to the Parallel Shadow Image Servers. We call this the recent spawn, or flag checkpoint, for rescheduling or reprocessing the job. If a fault arises, rescheduling starts from the most recent checkpoint, and the job is submitted to the worker node on which it was terminated. This not only saves time but also improves performance to a major extent.
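The checkpoint-and-resume flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ShadowImageServer`, `run_on_worker`, and `resource_manager` are hypothetical, the job is a toy summation, a checkpoint is taken every few steps rather than after a fixed amount of time, and `pickle` stands in for whatever serialized form the paper uses.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class ShadowImageServer:
    """Hypothetical stand-in for one PSIS: keeps the latest serialized image per job."""
    images: dict = field(default_factory=dict)

    def store(self, job_id, state):
        # Serializing takes a snapshot, so later changes to `state` do not leak in.
        self.images[job_id] = pickle.dumps(state)

    def latest(self, job_id):
        blob = self.images.get(job_id)
        return pickle.loads(blob) if blob is not None else None


class WorkerFault(Exception):
    """Raised to simulate a worker node failing mid-job."""


def run_on_worker(job_id, total_steps, servers, checkpoint_every,
                  fail_at=None, state=None):
    """Toy job: sum 0..total_steps-1, checkpointing every `checkpoint_every` steps."""
    state = state or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise WorkerFault(f"worker failed at step {fail_at}")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            # Send the same image to every shadow server (the "parallel" copies).
            for server in servers:
                server.store(job_id, state)
    return state["acc"]


def resource_manager(job_id, total_steps, servers, checkpoint_every, fail_at=None):
    """Schedule the job; on a fault, resume from the most recent stored checkpoint."""
    try:
        return run_on_worker(job_id, total_steps, servers, checkpoint_every, fail_at)
    except WorkerFault:
        images = (server.latest(job_id) for server in servers)
        state = next((img for img in images if img is not None), None)
        # With no checkpoint available, state is None and the job restarts from scratch.
        return run_on_worker(job_id, total_steps, servers, checkpoint_every, state=state)
```

In this sketch, a job of 10 steps that fails at step 7 with a checkpoint interval of 3 resumes from the image taken after step 6, so only the work since that checkpoint is repeated instead of the whole job.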


Pages: 703-707

Page count: 5
