Counter abstraction is a powerful tool for parameterized model checking, if the number of local states of the concurrent processes is relatively small. In recent work, we introduced parametric ...interval counter abstraction that allowed us to verify the safety and liveness of threshold-based fault-tolerant distributed algorithms (FTDA). Due to state space explosion, applying this technique to distributed algorithms with hundreds of local states is challenging for state-of-the-art model checkers. In this paper, we demonstrate that reachability properties of FTDAs can be verified by bounded model checking. To ensure completeness, we need an upper bound on the distance between states. We show that the diameters of accelerated counter systems of FTDAs, and of their counter abstractions, have a quadratic upper bound in the number of local transitions. Our experiments show that the resulting bounds are sufficiently small to use bounded model checking for parameterized verification of reachability properties of several FTDAs, some of which have not been automatically verified before.
This article addresses novel real-valued consensus problems in the presence of malicious adversaries that can move within the network and induce faulty behaviors in the attacked agents. By adopting ...several mobile adversary models from the computer science literature, we develop protocols which can mitigate the influence of such malicious agents. The algorithms follow the class of mean subsequence reduced (MSR) algorithms, under which agents ignore the suspicious values received from neighbors during their state updates. Different from the static adversary models, even after the adversaries move away, the infected agents may remain faulty in their values, whose effects must be taken into account. We develop conditions on the network structures for both the complete and non-complete directed graph cases, under which the proposed algorithms are guaranteed to attain resilient consensus. The tolerance bound for network conditions becomes more strict as the adversaries are allowed to have more power. Extensive simulations are carried out over random graphs to verify the effectiveness of our approach when the information of the adversarial agents in terms of their models and numbers is unknown to the agents.
In this paper, we study a distributed approach based on consensus algorithms for clock synchronization in wireless sensor networks. The sensor nodes face two types of uncertainties. One is that some ...of the nodes in the network can be faulty and transmit arbitrary signals by not following the given protocol; similar effects may be caused by false data injection by an external malicious attacker. The other is that the communication is unreliable and the packets exchanged may become lost. To deal with these uncertainties, we propose a resilient consensus-type algorithm based on the so-called mean subsequence reduced (MSR) technique, where each normal node ignores the outliers in the clock data collected from its neighbors and makes updates using data from the past if new data have not arrived yet. We establish network connectivity conditions in terms of graph robustness for the MSR algorithm to attain resilient properties.
Fault-tolerant algorithms, such as Reliable Broadcast , assure the correct operation of modern distributed systems, even when some of the system nodes fail. However, the development of distributed ...algorithms is a manual and complex process, where slight changes in requirements can require a complete redesign of the algorithm. To automate the process of developing such algorithms, this work presents a new approach that uses Reinforcement Learning to synthesize correct and efficient fault-tolerant distributed algorithms. This work shows the first application of the approach on the synthesis of fault-tolerant Reliable Broadcast algorithms. The presented technique is capable of synthesizing correct and efficient algorithms with the same performance as others available in the literature as well as a new Byzantine tolerant algorithm, in only 12, 000 learning episodes. Based on the success of this implementation, we aim, in the future, to extend this technique to other distributed algorithms such as Consensus.
In this paper, we develop distributed algorithms for achieving resilient consensus via the maximum value-based approach when adversarial agents may be present in the network. The adversaries intend ...to prevent the nonfaulty, normal agents from reaching consensus. We extend the class of resilient methods known as the mean subsequence reduced (MSR) algorithms, where the agents make selections on their neighbours' information at the time of their updates so as to reduce the influence of the malicious agents. In particular, the normal agents try to arrive at the maximum value of their initial states. Due to the malicious agents, the exact maximum may not be reached, the advantage of the approach is the finite-time convergence. We present both synchronous and asynchronous versions of the update schemes and characterize graph theoretic conditions for achieving resilient consensus. A numerical example is provided to illustrate our results.
This paper addresses novel consensus problems for multi-agent systems operating in an unreliable environment where adversaries are spreading. The dynamics of the adversarial spreading processes ...follows the susceptible-infected-recovered (SIR) model, where the infection induces faulty behaviors in the agents and affects their state values. Such a problem setting serves as a model of opinion dynamics in social networks where consensus is to be formed at the time of pandemic and infected individuals may deviate from their true opinions. To ensure resilient consensus among the noninfectious agents, the difficulty is that the number of infectious agents changes over time. We assume that a local policy maker announces the local level of infection in real-time, which can be adopted by the agent for its preventative measures. It is demonstrated that this problem can be formulated as resilient consensus in the presence of the socalled mobile malicious models, where the mean subsequence reduced (MSR) algorithms are known to be effective. We characterize sufficient conditions on the network structures for different policies regarding the announced infection levels and the strength of the epidemic. Numerical simulations are carried out for random graphs to verify the effectiveness of our approach.
This paper presents a new message-passing algorithm, called Do-UM, for distributed cooperative task computing in synchronous settings where processors may crash, and where any multicasts (or ...broadcasts) performed by crashing processors are unreliable. We specify the algorithm, prove its correctness and analyse its complexity. We show that its worst case available processor steps is S=Θt+nlognloglogn+f(n−f) and that the number of messages sent is less than n2t+nf2, where n is the number of processors, t is the number of tasks to be executed and f is the number of failures. To assess the performance of the algorithm in practical scenarios, we perform an experimental evaluation on a planetary-scale distributed platform. This also allows us to compare our algorithm with the currently best algorithm that is, however, explicitly designed to use reliable multicast; the results suggest that our algorithm does not lose much efficiency in order to cope with unreliable multicast.
•Do-UM: A new fault-tolerant message-passing algorithm for distributed cooperative computing.•Operates efficiently despite processor crashes and unreliable multicast.•Experimental performance similar to existing algorithms that use reliable multicast.
Agreement problems and their solutions are essential to fault-tolerant distributed computing. Over the years, different assumptions on failures have been considered, but most of these assumptions ...were focusing on either processes or links. In contrast, we examine a model where both links and processes can fail. In this model we devise a unified lower bound for resilience to both classes of faults. We show that the bound is tight by devising a simple retransmission scheme that allows optimally resilient algorithms to be constructed from well known algorithms by transparently adding link-fault tolerance. Our results show that when considering multiple independent failure modes, resilience bounds are not necessarily additive.
We provide a suite of impossibility results and lower bounds for the required number of processes and rounds for synchronous consensus under transient link failures. Our results show that consensus ...can be solved even in the presence of $O(n^2)$ moving omission and/or arbitrary link failures per round, provided that both the number of affected outgoing and incoming links of every process is bounded. Providing a step further toward the weakest conditions under which consensus is solvable, our findings are applicable to a variety of dynamic phenomena such as transient communication failures and end-to-end delay variations. We also prove that our model surpasses alternative link failure modeling approaches in terms of assumption coverage.