Independent checkpointing and recovery scheme for fail-slow processors

Bibliographic Details
Main Author: Krishna, P., 1969-
Other Authors: Vaidya, Nitin H. (Nitin Hemant), 1965-; Pradhan, Dhiraj K.
Format: Book
Language: English
Published: College Station, Tex. : Texas A & M University, Computer Science Dept., [1993]
Series: Technical report (Texas A & M University. Computer Science Department) ; 93-028.
Description
Summary: Abstract: "Consider a system of communicating processes in a distributed environment. Checkpointing is essential in such systems to achieve recovery after a failure. In most research on checkpointing and recovery, it has been assumed that either the processor automatically halts in response to any internal failure and does so before the effects of that failure are visible (fail-stop), or, that a faulty processor is malicious (Byzantine). This technical report presents a new independent checkpointing and recovery algorithm for a distributed system of fail-slow processors. A fail-slow processor has a finite but bounded error detection latency.
Unlike previous schemes that either assume fail-stop processors or ignore the damage caused by imperfect error detection mechanisms, our scheme tolerates bounded error detection latencies. The proposed algorithm is based on independent checkpointing and message logging."
Item Description: "May 12, 1993."
Physical Description: 20 leaves : illustrations ; 28 cm.
Bibliography: Includes bibliographical references.