Fault tolerance using "Parallel shadow image servers (PSIS)" in grid based computing environment

Cited by: 3

Authors:

Hussain, Naveed [1]

Ansari, M. A.

Yasin, M. M.

Rauf, Abdul

Haider, Sajjad

Affiliations:

[1] Natl Univ Modern Languages, Dept Informat Technol, Islamabad, Pakistan

[2] Fed Urdu Univ Arts Sci & Technol, Dept Comp Sci, Islamabad, Pakistan

[3] COMSATS Inst Informat Technol, Dept Comp Sci, Islamabad, Pakistan

Source:

SECOND INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES 2006, PROCEEDINGS | 2006

Keywords:

grid computing; fault tolerance; PSIS; condor; cactus; job scheduling;

DOI:

10.1109/ICET.2006.335982

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

This paper presents a critical review of the existing fault tolerance mechanisms in grid computing and of the overhead they incur in reprocessing or rescheduling jobs when a fault arises. To reduce this overhead, we suggest the Parallel Shadow Image Server (PSIS) copying technique, which runs in parallel to the Resource Manager and keeps checkpoints so that a job can be rescheduled from the nearest flag when a fault is detected. A job is scheduled from the resource manager node to the worker nodes; after a pre-specified amount of time, the worker nodes submit it back in serialized form to the Parallel Shadow Image Servers. We call this the recent spawn, or flag checkpoint, for rescheduling or reprocessing the job. If a fault arises, the job is rescheduled from the most recent checkpoint and submitted to the worker node on which it was terminated. This not only saves time but also improves performance to a major extent.
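The checkpoint-and-resume flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ShadowImageServer` and `run_job` are hypothetical, the "job" is a toy summation, an in-memory dictionary stands in for the shadow image servers, and a step count stands in for the pre-specified time interval.

```python
import pickle


class ShadowImageServer:
    """Hypothetical stand-in for a PSIS: keeps the latest serialized job image."""

    def __init__(self):
        self._images = {}

    def store(self, job_id, state):
        # Serialize the state so later changes on the worker do not alter the image.
        self._images[job_id] = pickle.dumps(state)

    def latest(self, job_id):
        blob = self._images.get(job_id)
        return pickle.loads(blob) if blob is not None else None


def run_job(job_id, total_steps, psis, checkpoint_every, fail_at=None):
    """Sum the integers 0..total_steps-1, resuming from the last checkpoint if any.

    Returns (result, number_of_steps_executed_in_this_run).
    """
    state = psis.latest(job_id) or {"next_step": 0, "acc": 0}
    executed = 0
    for step in range(state["next_step"], total_steps):
        if fail_at is not None and step == fail_at:
            # Assumption: a fault is modeled as an exception on the worker.
            raise RuntimeError(f"simulated worker fault at step {step}")
        state["acc"] += step
        state["next_step"] = step + 1
        executed += 1
        # Assumption: a step count replaces the paper's time-based interval.
        if state["next_step"] % checkpoint_every == 0:
            psis.store(job_id, state)
    return state["acc"], executed
```

Calling `run_job` once with `fail_at` set and then again without it on the same `ShadowImageServer` re-executes only the steps after the most recent stored image, rather than restarting the job from step 0.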

Citation:

Pages: 703-707

Page count: 5

