A new method for transparent fault tolerance of distributed programs on a network of workstations using alternative schedules

Das, D.; Dasgupta, P.; Das, P. P. (1997) A new method for transparent fault tolerance of distributed programs on a network of workstations using alternative schedules. In: 3rd International Conference on Algorithms and Architectures for Parallel Processing, 1997. ICAPP 97, 12 December 1997, Melbourne, Victoria, Australia.

Full text not available from this repository.

Official URL: http://ieeexplore.ieee.org/document/651515/

Related URL: http://dx.doi.org/10.1109/ICAPP.1997.651515

Abstract

In this paper, we devise a new method for transparent fault tolerance of distributed programs running on a cluster of networked workstations. We use the concept of alternative schedules for this purpose. Such schedules are generated from static task graphs at compile time. At run time, a distributed program can use these alternatives to switch from one schedule to another if one or more machines become faulty. We have devised fast and efficient mechanisms for switching among schedules at run time. This enables recovery from any number of simultaneous machine faults, any number of times. The correctness of the resultant algorithm is ensured by preventing direct data sharing among local tasks on a machine. Such a transparent fault-tolerant strategy is easily implementable on a network of workstations running PVM-like software.
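
The idea of precomputing alternative schedules from a static task graph and switching among them after a machine fault can be illustrated with a minimal sketch. This is not the authors' algorithm, which is not described in this record. The task graph, machine names, round-robin placement heuristic, and all function names below are hypothetical, chosen only to show the compile-time/run-time split. The sketch enumerates every non-empty subset of machines, which grows exponentially and would likely be impractical for a real cluster.

```python
from itertools import combinations

# Hypothetical static task graph: task -> list of predecessor tasks (assumed acyclic).
TASK_GRAPH = {"t1": [], "t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"]}
# Hypothetical workstation names.
MACHINES = ["m1", "m2", "m3"]


def topological_order(graph):
    """Return tasks in an order that respects precedence (graph must be acyclic)."""
    order, placed = [], set()
    while len(order) < len(graph):
        for task in sorted(graph):
            if task not in placed and all(p in placed for p in graph[task]):
                order.append(task)
                placed.add(task)
    return order


def build_schedule(graph, live_machines):
    """Map each task to a live machine, round-robin (a placeholder heuristic)."""
    live = sorted(live_machines)
    tasks = topological_order(graph)
    return {task: live[i % len(live)] for i, task in enumerate(tasks)}


def precompute_alternatives(graph, machines):
    """'Compile-time' step: one schedule per non-empty set of surviving machines."""
    table = {}
    for k in range(1, len(machines) + 1):
        for live in combinations(sorted(machines), k):
            table[frozenset(live)] = build_schedule(graph, live)
    return table


def switch_schedule(table, live_machines):
    """'Run-time' step: a constant-time lookup of the precomputed alternative."""
    return table[frozenset(live_machines)]


if __name__ == "__main__":
    table = precompute_alternatives(TASK_GRAPH, MACHINES)
    print(switch_schedule(table, MACHINES))        # schedule with all machines up
    print(switch_schedule(table, ["m1", "m3"]))    # alternative after m2 fails
```

Because the alternatives are computed ahead of time, the run-time switch in this sketch reduces to a dictionary lookup keyed by the set of surviving machines. Migrating task state and enforcing the no-direct-data-sharing rule mentioned in the abstract are outside its scope.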

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to Institute of Electrical and Electronics Engineers.
ID Code: 101739
Deposited On: 09 Mar 2018 10:18
Last Modified: 09 Mar 2018 10:18
