Automated Analysis and Optimization of Distributed Self-Stabilizing Algorithms
[摘要] Self-stabilization [2] is a versatile technique for recovery from erroneous behavior due to transientfaults or wrong initialization. A system is self-stabilizing if (1) starting from an arbitraryinitial state it can automatically reach a set of legitimate states in a finite number of steps and (2)it remains in legitimate states in the absence of faults. Weak-stabilization [3] and probabilistic-stabilization[4] were later introduced in the literature to deal with resource consumption ofself-stabilizing algorithms and impossibility results. Since the system perturbed by fault maydeviate from correct behavior for a finite amount of time, it is paramount to minimize this timeas much as possible, especially in the domain of robotics and networking. This type of faulttolerance is called non-masking because the faulty behavior is not completely masked from theuser [1].Designing correct stabilizing algorithms can be tedious. Designing such algorithms thatsatisfy certain average recovery time constraints (e.g., for performance guarantees) adds furthercomplications to this process. Therefore, developing an automatic technique that takes as inputthe specification of the desired system, and synthesizes as output a stabilizing algorithm withminimum (or other upper bound) average recovery time is useful and challenging. In this thesis,our main focus is on designing automated techniques to optimize the average recovery time ofstabilizing systems using model checking and synthesis techniques.First, we prove that synthesizing weak-stabilizing distributed programs from scratch and repairingstabilizing algorithms with average recovery time constraints are NP-complete in thestate-space of the program. To cope with this complexity, we propose a polynomial-time heuristicthat compared to existing stabilizing algorithms, provides lower average recovery time formany of our case studies.Second, we study the problem of fine tuning of probabilistic-stabilizing systems to improvetheir performance. We take advantage of the two properties of self-stabilizing algorithms tomodel them as absorbing discrete-time Markov chains. This will reduce the computation ofaverage recovery time to finding the weighted sum of elements in the inverse of a matrix.Finally, we study the impact of scheduling policies on recovery time of stabilizing systems.We, in particular, propose a method to augment self-stabilizing programs with k-central and k-boundedschedulers to study dierent factors, such as geographical distance of processes and theachievable level of parallelism.
[发布日期] [发布机构] University of Waterloo
[效力级别] [学科分类]
[关键词] Computer Science [时效性]