Architecture-based Run-time Fault Diagnosis
Paulo Casanova,
Bradley Schmerl,
David Garlan and Rui Abreu.
In Proceedings of the 5th European Conference on Software Architecture, 13-16 September 2011.
Abstract
An important step in achieving robustness to run-time faults is the ability
to detect and repair problems when they arise in a running system. Effective
fault detection and repair could be greatly enhanced by run-time fault diagnosis
and localization, since it would allow the repair mechanisms to focus adaptation
effort on the parts most in need of attention. In this paper we describe
an approach to run-time fault diagnosis that combines architectural models with
spectrum-based reasoning for multiple fault localization. Spectrum-based reasoning
is a lightweight technique that takes a form of trace abstraction and produces
a list (ordered by probability) of likely fault candidates. We show how this technique
can be combined with architectural models to support run-time diagnosis
that can (a) scale to modern distributed software systems; (b) accommodate the
use of black-box components and proprietary infrastructure for which one has
neither a specification nor source code; and (c) handle inherent uncertainty about
the probable cause of a problem even in the face of transient faults and faults that
arise only when certain combinations of system components interact.
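
As a rough illustration of the general idea behind spectrum-based fault localization, the sketch below ranks components by how strongly their involvement in observed transactions correlates with failures. It is not the authors' method: it substitutes the simple Ochiai similarity coefficient for the probabilistic multiple-fault reasoning the abstract describes, so the scores are similarity values rather than probabilities. The function name, the component names, and the sample data are all hypothetical.

```python
from math import sqrt


def rank_components(spectra, failed):
    """Rank components by Ochiai similarity to the observed failures.

    spectra: list of sets; each set names the components involved in one
             observed transaction (a simple trace abstraction).
    failed:  list of bools, parallel to spectra; True if that transaction failed.
    Returns a list of (component, score) pairs, most suspicious first.
    """
    components = sorted(set().union(*spectra))
    total_failed = sum(failed)
    scores = {}
    for c in components:
        # Transactions involving c that failed / passed.
        n_ef = sum(1 for s, f in zip(spectra, failed) if c in s and f)
        n_ep = sum(1 for s, f in zip(spectra, failed) if c in s and not f)
        denom = sqrt(total_failed * (n_ef + n_ep))
        scores[c] = n_ef / denom if denom else 0.0
    # Sort by descending score, breaking ties by name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical observations: four transactions over three components.
spectra = [
    {"web", "db"},
    {"web", "cache"},
    {"web", "db", "cache"},
    {"web", "cache"},
]
failed = [True, False, True, False]

ranking = rank_components(spectra, failed)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this made-up data the `db` component appears in every failing transaction and in no passing one, so it ranks first. A single-fault coefficient like this cannot express the multiple-fault and transient-fault cases the abstract mentions; it only shows the shape of the input (a spectrum of component involvement plus pass/fail outcomes) and of the output (an ordered candidate list).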
Keywords: Rainbow, Self-adaptation.