Publication no #691
Failure Detection and Diagnosis in Architecture-based Autonomic Systems

Paulo Casanova.


PhD thesis, No. CMU-S3D-12-100, Software and Societal Systems Department, School of Computer Science, Carnegie Mellon University, April 2023.


Abstract
As the size and complexity of modern IT systems increase, there is a greater need for automatic recovery from failures. Recently, self-adaptive control loops have started to replace human oversight as a means of ensuring high availability of software systems. Two critical pieces of the self-adaptive loop for high availability are failure identification and fault localization. Failure identification (figuring out that something is not working) is a challenging activity because (1) monitoring is not done at the same abstraction level at which failures manifest themselves, and (2) systems perform several activities concurrently, so incorrect behavior appears mixed with correct behavior. Identifying faults, that is, pinpointing the source of the failure, is also challenging because (1) there may be multiple explanations for a fault and (2) diagnosis must be performed in a useful time frame. In this thesis, we propose to improve self-diagnosis through a framework that allows a running system to identify failures and pinpoint the corresponding faulty parts. This framework is based on two key principles: reasoning about the system’s behavior at the software architecture level and providing a declarative approach to describing system behavior. The use of architectural models allows the diagnostic infrastructure to scale gracefully, supports efficient run-time execution of common fault localization algorithms, and supports failure diagnosis of system-level properties such as end-to-end performance. The use of a declarative approach to behavior allows one to systematically specify rules for bridging the gap between low-level monitoring and higher-level problem detection. It also supports reuse across systems that share a common architectural style or implementation infrastructure.

Keywords: Diagnosis, Self-adaptation.  
    Created: 2024-10-16 11:33:51     Modified: 2024-10-16 11:35:10