Fault tolerance using "Parallel shadow image servers (PSIS)" in grid based computing environment

Cited by: 3
Authors
Hussain, Naveed [1]
Ansari, M. A.
Yasin, M. M.
Rauf, Abdul
Haider, Sajjad
Affiliations
[1] Natl Univ Modern Languages, Dept Informat Technol, Islamabad, Pakistan
[2] Fed Urdu Univ Arts Sci & Technol, Dept Comp Sci, Islamabad, Pakistan
[3] COMSATS Inst Informat Technol, Dept Comp Sci, Islamabad, Pakistan
Keywords
grid computing; fault tolerance; PSIS; Condor; Cactus; job scheduling
DOI
10.1109/ICET.2006.335982
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a critical review of the existing fault tolerance mechanisms in grid computing and the overhead they incur in reprocessing or rescheduling jobs when a fault occurs. To address this, we propose the Parallel Shadow Image Server (PSIS) copying technique, which runs in parallel with the Resource Manager and maintains checkpoints so that a job can be rescheduled from the nearest flag when a fault is detected. A job is scheduled from the resource manager node to the worker nodes, which submit it back in serialized form to the Parallel Shadow Image Servers after a pre-specified amount of time; we call this the recent spawn, or flag checkpoint, for rescheduling or reprocessing the job. If a fault occurs, the job is rescheduled from the most recent checkpoint and submitted to the worker node on which it was terminated. This not only saves time but also improves performance considerably.
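The checkpoint-and-resume idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `ShadowImageServer`, the function `run_job`, the toy summation job, and the use of a step count in place of the "pre-specified amount of time" are all assumptions made for illustration.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class ShadowImageServer:
    """Hypothetical stand-in for one PSIS node: keeps the latest serialized job state."""
    checkpoints: dict = field(default_factory=dict)

    def store(self, job_id, state):
        # The worker submits its state in serialized form.
        self.checkpoints[job_id] = pickle.dumps(state)

    def latest(self, job_id):
        blob = self.checkpoints.get(job_id)
        return pickle.loads(blob) if blob is not None else None


def run_job(job_id, total_steps, servers, checkpoint_every, fail_at=None):
    """Run or resume a toy job that sums 0..total_steps-1.

    Assumption: a checkpoint is taken every `checkpoint_every` steps,
    standing in for the time interval described in the abstract.
    Returns (result, number of steps executed in this run).
    """
    # Resume from the most recent checkpoint held by any shadow server.
    state = None
    for server in servers:
        state = server.latest(job_id)
        if state is not None:
            break
    if state is None:
        state = {"next_step": 0, "acc": 0}

    executed = 0
    while state["next_step"] < total_steps:
        if fail_at is not None and state["next_step"] == fail_at:
            raise RuntimeError("simulated worker fault")
        state["acc"] += state["next_step"]
        state["next_step"] += 1
        executed += 1
        if state["next_step"] % checkpoint_every == 0:
            # Copy the checkpoint to every shadow server.
            for server in servers:
                server.store(job_id, state)
    return state["acc"], executed


if __name__ == "__main__":
    servers = [ShadowImageServer(), ShadowImageServer()]
    try:
        run_job("job-1", 10, servers, checkpoint_every=3, fail_at=7)
    except RuntimeError:
        pass  # the fault is detected; reschedule from the last checkpoint
    result, executed = run_job("job-1", 10, servers, checkpoint_every=3)
    print(result, executed)
```

In this sketch the rescheduled run repeats only the work done after the last stored checkpoint rather than restarting the job, which is the saving the abstract describes.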
Pages: 703-707
Page count: 5