Academic Journals Database
Disseminating quality controlled scientific knowledge

Fault Tolerance In Grid Computing: State of the Art and Open Issues

Author(s): Ritu Garg | Awadhesh Kumar Singh

Journal: International Journal of Computer Science and Engineering Survey
ISSN 0976-3252

Volume: 2;
Issue: 1;
Start page: 88;
Date: 2011;

Keywords: Grid Computing | Fault Tolerance | Workflow Grid

ABSTRACT
Fault tolerance is an important property for large-scale computational grid systems, where geographically distributed nodes cooperate to execute a task. To achieve a high level of reliability and availability, the grid infrastructure should be foolproof and fault tolerant. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing. Commonly used techniques for providing fault tolerance are job checkpointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce significant runtime overhead. This overhead largely depends on the length of the checkpointing interval and the chosen number of replicas, respectively. In complex scientific workflows, where tasks execute in a well-defined order, reliability is another major challenge because of the unreliable nature of grid resources.
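
The trade-off the abstract describes can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the assumption that replicas fail independently with the same probability, and the assumption that checkpoints are written at a fixed interval with a fixed cost are all hypothetical simplifications chosen for illustration.

```python
import math


def replication_success_probability(p_fail: float, replicas: int) -> float:
    """Probability that at least one replica completes.

    Assumption (not from the paper): each replica fails independently
    with the same probability p_fail.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - p_fail ** replicas


def checkpoint_overhead(job_length: float, interval: float,
                        checkpoint_cost: float) -> float:
    """Total time spent writing checkpoints in a failure-free run.

    Assumption (not from the paper): a checkpoint is written every
    `interval` time units of useful work, each costing `checkpoint_cost`,
    and no checkpoint is written at the very end of the job.
    """
    if job_length <= 0 or interval <= 0 or checkpoint_cost < 0:
        raise ValueError("job_length and interval must be positive, "
                         "checkpoint_cost must be non-negative")
    count = math.ceil(job_length / interval) - 1
    return count * checkpoint_cost


def max_work_lost(interval: float) -> float:
    """Upper bound on work lost to one failure: at most one full interval."""
    return interval


if __name__ == "__main__":
    # More replicas raise the chance of success but multiply resource usage.
    for n in (1, 2, 3):
        print(n, replication_success_probability(0.5, n))

    # Shorter intervals lose less work per failure but cost more overhead.
    for tau in (50.0, 25.0, 10.0):
        print(tau, checkpoint_overhead(100.0, tau, 2.0), max_work_lost(tau))
```

Under these assumptions, shortening the checkpointing interval reduces the work that a single failure can destroy while increasing the time spent writing checkpoints, and adding replicas improves the chance of completion at the cost of extra resources.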
