Long-term Research Objectives - The 7U Evaluation Method: Evaluating Software Systems via Runti

Our work towards retro-fitting self-management capabilities onto existing/legacy systems presented an interesting set of evaluation challenges. Specifically, to evaluate a self-managing system realized via retro-fitting we must consider:

1. The kinds of self-management capabilities that can be retro-fitted.

2. The approaches and technologies used to retro-fit self-management.

3. The impact of these technologies on specific functional characteristics of the system.

4. The efficacy of the self-management capabilities added to the system, i.e., the impact of these capabilities on specific non-functional characteristics of the system.

For practical reasons, however, these challenges need to be refined. As stated above, they are overly broad with respect to the self-management capabilities to be evaluated, and overly restrictive with respect to the class of systems considered (self-managing systems realized via retro-fitting).

2.4.1 Scoping the Self-Management Capabilities to be Evaluated

Under DASADA – DARPA’s initiative to tackle system complexity and manageability issues – the term self-management was related to the identified principles of Continual Validation and Co-ordination: system monitoring, modeling, dynamic repair and dynamic reconfiguration. However, post-DASADA, the notion of self-management took on a broader context with the advent of Autonomic Computing in 2001 [79].

Autonomic Computing is IBM’s proposal for addressing issues of system automation and system complexity. [79] identifies eight key elements (properties) of autonomic systems, which can be used to classify systems into one (or more) of four distinct classes – self-configuring systems, self-healing systems, self-optimizing systems and self-protecting systems.

The following definitions for the four classes of self-* systems are adapted from [100]:

• Self-Configuring systems configure themselves automatically in accordance with high-level policies – representing business-level objectives. When a component is introduced, it will automatically learn about and take into consideration the composition and configuration of the system and incorporate itself seamlessly, while the rest of the system adapts to its presence.

• Self-Healing systems detect, diagnose, and repair localized hardware and software problems.

• Self-Optimizing systems continually seek ways to improve their operation, identifying and seizing opportunities to make themselves more efficient in performance or cost.

• Self-Protecting systems will defend the system as a whole against large-scale, correlated problems arising from malicious attacks.

In Autonomic Computing, the goal of self-management “...is to free system administrators from the details of system operation and maintenance...” [100]. As a result, this contemporary definition of self-management encompasses all four aspects of self-configuration, self-healing, self-optimization and self-protection.

Evaluating self-management capabilities of systems considering all four sub-areas (self-configuration, self-healing, self-optimization and self-protection) is a non-trivial task. As a result, the first step in refining the evaluation challenges outlined at the beginning of Section 2.4 is to focus on one of the four sub-areas of self-management.

Our past experience with effecting dynamic reconfigurations and repairs in systems (via KX) provided a suitable foundation for exploring the area of Self-Healing systems. Further, our short term research goals (Section 2.3) of developing more flexible, dynamic probe and effector technologies align nicely with the proposed core sub-areas of self-healing systems research – problem detection, diagnosis and repair.
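To make the three core sub-areas concrete, the following is a minimal sketch of one detect, diagnose and repair pass over a set of components. It is purely illustrative and is not the KX implementation: the `Component` class, the `healthy` flag, and the single "restart" repair action are all hypothetical names introduced here for the example.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical managed component exposing a simple health flag."""
    name: str
    healthy: bool = True
    restarts: int = 0


def detect(components):
    """Problem detection: return the components a probe reports as unhealthy."""
    return [c for c in components if not c.healthy]


def diagnose(component):
    """Problem diagnosis: choose a repair action (a single fixed rule here)."""
    return "restart"


def repair(component, action):
    """Problem repair: apply the chosen action, standing in for an effector."""
    if action == "restart":
        component.healthy = True
        component.restarts += 1


def self_healing_pass(components):
    """Run one detect -> diagnose -> repair pass; return repaired component names."""
    repaired = []
    for c in detect(components):
        repair(c, diagnose(c))
        repaired.append(c.name)
    return repaired
```

In a real system the diagnosis step would map observed symptoms to one of several repair actions, and the detection step would be driven by probes rather than a flag; the sketch only shows how the three stages compose.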

2.4.2 Expanding the Classes of Systems to be Evaluated

Whereas focusing on the evaluation of self-managing (later refined to self-healing) systems realized via retrofit is specific to evaluating systems enhanced by frameworks like KX, this focus is unnecessarily restrictive for a number of reasons.

First, whether self-healing systems are realized via retrofit or via design, the approaches and techniques used to evaluate the efficacy of their self-healing capabilities are expected to be similar (if not identical). The evaluation approaches and tools concerned with the enabling technologies (runtime retro-fitting tools and technologies vs. design-time tools and technologies), on the other hand, are expected to differ, resulting in a multi-part evaluation process. Therefore, the first step in expanding the classes of systems to be evaluated is to consider self-healing systems regardless of whether they are realized by retrofit or by design.

Second, it is challenging to find systems that exhibit all the desired characteristics of self-healing systems to evaluate or compare against. With a nascent research area such as autonomic computing, it will take some time for a) fully self-healing systems to appear, and b) researchers to determine whether any properties of existing systems can be mapped to the desiderata of self-healing systems [102]. The second step in expanding the classes of systems to be evaluated is therefore to consider partially self-healing systems and/or existing systems re-classified as self-healing systems.

Further, an additional implication of a dearth of self-healing systems is that the classes of systems to be evaluated may also be expanded to consider non-self-healing systems in order to facilitate comparisons between a non-self-healing (“vanilla”) version of a system, S_vanilla,

Whereas expanding the classes of systems to be evaluated to include non-self-healing systems may at first seem overly permissive, additional motivation for this decision can be obtained by an examination of the expected benefits of self-healing systems.

Based on desired capabilities of self-healing systems provided in [79] and [100], we can identify and summarize a number of expected benefits including, but not limited to:

• Improved reliability resulting from the system’s ability to automatically detect, diagnose and repair problems.

• High availability from the system’s ability to orchestrate and effect repair activities online/dynamically – perhaps degrading its operation if necessary.

• Improved manageability/serviceability by shifting responsibility for some of the management/administration activities (e.g., problem detection, problem determination and problem resolution) onto the system, thereby reducing the management burden placed on system administrators.

These expected benefits, however, are not exclusive to self-healing systems; rather, they are desirable characteristics for software systems in general. As a result, a Reliability, Availability and Serviceability evaluation is equally applicable to self-healing and non-self-healing systems.
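As an illustration of why such an evaluation applies to both kinds of system, the sketch below computes the standard steady-state availability, MTTF / (MTTF + MTTR), for two variants of the same system. The choice of this particular metric, the function names, and all numeric values are assumptions made for the example; they are not measurements or definitions taken from this work.

```python
def steady_state_availability(mttf_hours, mttr_hours):
    """Steady-state availability: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def annual_downtime_hours(availability, hours_per_year=8760.0):
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * hours_per_year


# Hypothetical figures: both variants fail equally often, but the
# self-healing variant is assumed to repair itself faster (lower MTTR).
vanilla = steady_state_availability(mttf_hours=1000.0, mttr_hours=4.0)
self_healing = steady_state_availability(mttf_hours=1000.0, mttr_hours=0.5)

print(f"vanilla:      {vanilla:.5f}, downtime/yr: {annual_downtime_hours(vanilla):.1f} h")
print(f"self-healing: {self_healing:.5f}, downtime/yr: {annual_downtime_hours(self_healing):.1f} h")
```

The same two functions apply unchanged to either variant, which is the point: the metric does not depend on how (or whether) the system heals itself, only on its observed failure and repair times.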

In document The 7U Evaluation Method: Evaluating Software Systems via Runtime Fault-Injection and Reliability, Availability and Serviceability (RAS) Metrics and Models (Page 38-41)