Applying Autonomic Diagnosis at Samsung Electronics
Paulo Casanova, Bradley Schmerl, David Garlan, Rui Abreu and Jungsik Ahn.
Technical report, CMU-ISR-13-111, Institute for Software Research, Carnegie Mellon University, September 2013.
Abstract
An increasingly essential aspect of many critical software
systems is the ability to quickly diagnose and locate
faults so that appropriate corrective measures can
be taken. Large, complex software systems fail unpredictably
and pinpointing the source of the failure is a
challenging task. In this paper we explore how our recently
developed technique for automatic diagnosis performs
in the automatic detection of failures and fault localization
in a critical manufacturing control system of
Samsung Electronics, where failures can result in large
financial and schedule losses. We show how our approach
can scale to such systems to diagnose intermittent
faults, connectivity problems, protocol violations,
and timing failures. We propose a set of measures of
accuracy and performance that can be used to evaluate
run-time diagnosis. We present lessons learned from this
work including how instrumentation limitations may impair
diagnosis accuracy: without overcoming these, there
is a limit to the kinds of faults that can be detected.
Keywords: Autonomic Systems, Diagnosis.
@TechReport{2013:Casanova/Samsung,
AUTHOR = {Casanova, Paulo and Schmerl, Bradley and Garlan, David and Abreu, Rui and Ahn, Jungsik},
TITLE = {Applying Autonomic Diagnosis at Samsung Electronics},
YEAR = {2013},
MONTH = {September},
NUMBER = {CMU-ISR-13-111},
INSTITUTION = {Institute for Software Research, Carnegie Mellon University},
PDF = {http://acme.able.cs.cmu.edu/pubs/uploads/pdf/cmu-isr-13-111.pdf},
ABSTRACT = {An increasingly essential aspect of many critical software
systems is the ability to quickly diagnose and locate
faults so that appropriate corrective measures can
be taken. Large, complex software systems fail unpredictably
and pinpointing the source of the failure is a
challenging task. In this paper we explore how our recently
developed technique for automatic diagnosis performs
in the automatic detection of failures and fault localization
in a critical manufacturing control system of
Samsung Electronics, where failures can result in large
financial and schedule losses. We show how our approach
can scale to such systems to diagnose intermittent
faults, connectivity problems, protocol violations,
and timing failures. We propose a set of measures of
accuracy and performance that can be used to evaluate
run-time diagnosis. We present lessons learned from this
work including how instrumentation limitations may impair
diagnosis accuracy: without overcoming these, there
is a limit to the kinds of faults that can be detected.},
KEYWORDS = {Autonomic Systems, Diagnosis} }