Diagnosing architectural run-time failures

Paulo Casanova, David Garlan, Bradley Schmerl and Rui Abreu.

In Proceedings of the 8th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, 20-21 May 2013. Received SEAMS 2013 Best Paper Award.

Self-diagnosis is a fundamental capability of self-adaptive systems. In order to recover from faults, systems need to know which part is responsible for the incorrect behavior. In previous work we showed how to apply a design-time diagnosis technique at run time to identify faults at the architectural level of a system. Our contributions address three major shortcomings of our previous work: 1) we present an expressive, hierarchical language to describe system behavior that can be used to diagnose when a system is behaving differently from expectation; the hierarchical language facilitates mapping low-level system events to architecture-level events; 2) we provide an automatic way to determine how much data to collect before an accurate diagnosis can be produced; and 3) we develop a technique that allows the detection of correlated faults between components. Our results are validated experimentally by injecting several failures into a system and accurately diagnosing them using our algorithm.

Keywords: Diagnosis, Self-adaptation.
