The probability of errors occurring in electronic systems is not known in advance; it depends on many factors, including the environment in which the system operates. In this paper, it is demonstrated that inaccurate estimates of the error probability lead to a loss of performance in a well-known fault-tolerance technique, Roll-back Recovery with Checkpointing (RRC). To regain the lost performance, a method for estimating the error probability and an accompanying adjustment technique are proposed. The proposed method is evaluated using a simulator tool developed to enable experimentation, and the results show that it provides useful estimates of the error probability, leading to near-optimal performance of the RRC fault-tolerance technique.
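The link between the error-probability estimate and RRC performance can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a simple model in which a job of `total_work` time units is split into equal segments, each time unit fails independently with probability `p_error`, a failed segment is re-executed from its checkpoint, and every attempt pays a fixed `overhead`. All function and parameter names are hypothetical.

```python
def expected_time(total_work, n_checkpoints, p_error, overhead):
    """Expected completion time under the assumed geometric-retry model."""
    seg = total_work / n_checkpoints
    p_ok = (1.0 - p_error) ** seg          # probability one attempt is error-free
    # Each segment needs 1/p_ok attempts on average; each attempt costs seg + overhead.
    return n_checkpoints * (seg + overhead) / p_ok


def best_checkpoint_count(total_work, p_error, overhead, max_n=200):
    """Brute-force search for the checkpoint count with the lowest expected time."""
    return min(range(1, max_n + 1),
               key=lambda n: expected_time(total_work, n, p_error, overhead))


def estimate_error_probability(n_failed, n_attempts, seg):
    """Frequency-based estimate (an assumption, not the paper's estimator).

    The fraction of failed attempts estimates 1 - (1 - p)^seg, which is
    inverted to obtain the per-time-unit error probability p.
    """
    if n_attempts <= 0:
        raise ValueError("need at least one observed attempt")
    frac_failed = n_failed / n_attempts
    return 1.0 - (1.0 - frac_failed) ** (1.0 / seg)


if __name__ == "__main__":
    true_p, assumed_p = 0.01, 0.0001
    n_wrong = best_checkpoint_count(100, assumed_p, 1.0)
    n_right = best_checkpoint_count(100, true_p, 1.0)
    print("checkpoints chosen with wrong estimate:", n_wrong)
    print("checkpoints chosen with true value:   ", n_right)
    print("expected time (wrong):", expected_time(100, n_wrong, true_p, 1.0))
    print("expected time (right):", expected_time(100, n_right, true_p, 1.0))
```

Under these assumptions, underestimating the error probability leads to too few checkpoints and a longer expected completion time. Re-estimating the probability from observed failures and recomputing the checkpoint count is one way to recover the difference.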
Malaria is one of the most devastating diseases in the context of emerging and re-emerging infectious diseases. We have to be fully prepared for the introduction of the disease into Japan under the recently enforced “new law for the prevention of infectious diseases”, as it is called. However, prompt diagnosis and proper treatment are not always provided in Japan, because of our relatively poor awareness of its importance and an inappropriate supply of effective anti-malaria drugs. Travel medicine is rarely practiced, though it is indispensable as the internationalization or globalization of our society progresses. It is also quite important for us to contribute to the control of malaria in tropical countries. Former Prime Minister Hashimoto pointed out its importance at the G8 summit in Denver in 1997, and the Japanese Government then started to promote the Global Parasite Control Strategy for the 21st century, or the “Hashimoto Initiative”. Another initiative, proposed by WHO as “Roll Back Malaria”, is also a global malaria control program to which the Japanese Government should contribute. The need for investment in basic and applied malaria research is now recognized, and the budget will increase in the area of Grants-in-Aid for scientific research from the Ministry of Education, Science, Sports and Culture and the Ministry of Health and Welfare of Japan.
In a persistent object store, the acts of modifying data and reading modified data result in the creation of dependencies between the modifying process and the data. Dependencies may be represented using sets, and over time these may grow to encompass many objects and processes. Checkpoint and roll-back operations must propagate to all elements in such a set. This paper presents a new notation for representing dependencies, and shows that differentiating between the dependencies created by modifying data and reading modified data reduces the extent of propagation of checkpoint and roll-back operations.
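The effect of differentiating the two kinds of dependency can be illustrated with a small sketch. It does not reproduce the paper's notation, which the abstract does not give. It assumes one plausible rule: a roll-back of a process propagates to the objects it modified and on to the processes that read those modified objects, but not backwards from a reader to the writer. The class and method names are hypothetical.

```python
from collections import defaultdict


class DependencyTracker:
    """Tracks write and read-of-modified-data dependencies separately."""

    def __init__(self):
        self.wrote = defaultdict(set)     # process -> objects it modified
        self.read_by = defaultdict(set)   # object  -> processes that read its modified state

    def write(self, proc, obj):
        self.wrote[proc].add(obj)

    def read_modified(self, proc, obj):
        self.read_by[obj].add(proc)

    @staticmethod
    def _reach(start, edges):
        """Return every node reachable from start, including start itself."""
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def undifferentiated_set(self, start):
        """Baseline: one symmetric dependency set, so everything connected is affected."""
        edges = defaultdict(set)
        for proc, objs in self.wrote.items():
            for obj in objs:
                edges[proc].add(obj)
                edges[obj].add(proc)
        for obj, procs in self.read_by.items():
            for proc in procs:
                edges[proc].add(obj)
                edges[obj].add(proc)
        return self._reach(start, edges)

    def rollback_set(self, start):
        """Assumed directed rule: writer -> modified object -> readers of that object."""
        edges = defaultdict(set)
        for proc, objs in self.wrote.items():
            edges[proc] |= objs
        for obj, procs in self.read_by.items():
            edges[obj] |= procs
        return self._reach(start, edges)


if __name__ == "__main__":
    deps = DependencyTracker()
    deps.write("writer", "obj_x")
    deps.read_modified("reader", "obj_x")
    print(sorted(deps.undifferentiated_set("reader")))  # reader, writer and obj_x
    print(sorted(deps.rollback_set("reader")))          # reader only
```

In this example, rolling back the reader does not drag in the writer or the object, whereas a single undifferentiated set would include all three.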
Checkpointing Systems Wolter, Katinka
Stochastic Models for Fault Tolerance,
2010
Book Chapter
Checkpointing applies to large software systems subject to failures. In the absence of failures, the software system continuously serves requests, performs transactions, or executes long-running batch processes. If the execution time of the task and the time at which processing starts are known, then the moment of completion of the task is known as well. If failures can happen, the completion of a task depends heavily on the underlying fault model. The typical fault model employed in checkpointing assumes that faults are detected immediately as they happen. This implies that only crash faults are considered, and not transient or Byzantine faults, which would require fault-detection mechanisms. Some checkpointing models assume that faults are detected only at the end of the software module [152].
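The crash-fault model described above can be made concrete with a short simulation sketch. It is an illustration under stated assumptions, not a model taken from the chapter: failures arrive with exponentially distributed gaps at rate `failure_rate`, they are detected at the instant they occur, unsaved work is lost, a fixed `restart_cost` is paid, and no failure occurs during the restart itself. All names are hypothetical.

```python
import random


def completion_time(work, interval, checkpoint_cost, restart_cost, failure_rate, rng):
    """Simulate one run of a task with periodic checkpoints and crash faults."""
    clock = 0.0
    saved = 0.0  # amount of work protected by the latest checkpoint
    next_failure = (rng.expovariate(failure_rate)
                    if failure_rate > 0 else float("inf"))
    while saved < work:
        segment = min(interval, work - saved)
        # A checkpoint is taken after every segment except the final one.
        is_last = saved + segment >= work
        cost = segment + (0.0 if is_last else checkpoint_cost)
        if clock + cost <= next_failure:
            clock += cost
            saved += segment
        else:
            # Crash detected immediately: unsaved work is lost, then restart.
            clock = next_failure + restart_cost
            next_failure = clock + rng.expovariate(failure_rate)
    return clock


if __name__ == "__main__":
    rng = random.Random(42)
    # Without failures the completion time is deterministic.
    print(completion_time(10, 3, 1, 2, 0.0, rng))  # prints 13.0
    # With failures it becomes a random variable that depends on the fault model.
    samples = [completion_time(10, 3, 1, 2, 0.05, rng) for _ in range(1000)]
    print(sum(samples) / len(samples))
```

With a failure rate of zero, the result is the work plus the cost of the three intermediate checkpoints, which matches the observation that the completion moment is known when no failures occur.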
A model of remote procedure call (RPC) which reflects certain generic properties of the application layer that can be exploited by the RPC layer during failure recovery is presented. A technique of adopting orphans caused by failures, which is based on the model, is described. The technique minimizes the rollback which may be required in orphan-killing techniques. Algorithmic details of the adoption technique are described, and a quantitative analysis is presented. The model is implemented as a prototype on a local area network. The simplicity and generality of the failure recovery render the RPC model useful in distributed systems, particularly those that are large and heterogeneous and hence have complex failure modes.
A common assumption in existing rollback techniques is that transients, the cause of most failures, subside very quickly, implying that a single retry of the program from the previous rollback point is sufficient. The authors discuss a general rollback strategy with n (n ≥ 2) retries which takes into consideration multiple transient failures as well as transients of long duration. Ways of deriving practical values of n for a given program are also discussed. Furthermore, the authors propose the use of a watchdog processor as an error detection tool to initiate recovery action through rollback, since the watchdog processor offers low error latency. They also discuss merging the watchdog processor with the rollback recovery technique to enhance overall system reliability.
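A rollback strategy with multiple retries can be sketched as follows. This is a minimal software illustration, not the authors' design: the watchdog processor is stood in for by a hypothetical `detect_error` callback, and `run_with_rollback`, `step` and `max_retries` are assumed names. The state saved at the rollback point is copied before each attempt so that a faulty execution cannot corrupt it.

```python
import copy


class RetriesExhausted(Exception):
    """Raised when the error persists after the allowed number of retries."""


def run_with_rollback(step, state, detect_error, max_retries):
    """Execute step from a rollback point, retrying up to max_retries times.

    step         -- function taking a state and returning the new state
    detect_error -- stand-in for the watchdog; returns True if the result is faulty
    Returns (new_state, retries_used).
    """
    rollback_point = copy.deepcopy(state)
    # One initial execution plus up to max_retries re-executions.
    for attempt in range(max_retries + 1):
        candidate = step(copy.deepcopy(rollback_point))
        if not detect_error(candidate):
            return candidate, attempt
    raise RetriesExhausted(f"error persisted after {max_retries} retries")


if __name__ == "__main__":
    executions = {"count": 0}

    def flaky_step(s):
        # Hypothetical transient that affects the first two executions.
        executions["count"] += 1
        s["value"] += 1
        s["corrupt"] = executions["count"] <= 2
        return s

    result, retries = run_with_rollback(
        flaky_step, {"value": 0, "corrupt": False},
        detect_error=lambda s: s["corrupt"], max_retries=3)
    print(result, retries)
```

In the usage example the transient outlasts a single retry, so a strategy limited to one retry would give up, while a larger retry budget lets the program complete from the same rollback point.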