Analysis Results - Analysis Techniques - The 7U Evaluation Method: Evaluating Software Systems

4.3 Analysis Techniques

4.3.6 Analysis Results

Table 4.6 summarizes the analysis results for the microrebootable application server RAS model shown in Figure 4.4, based on the model parameters in Table 4.2.

Measure         Metric                                                Result
--------------  ----------------------------------------------------  ---------------------
Reliability     Failure escalations per day (Fa→b)                    22.671386
                Frequency of recovery activities per day (F2)         4.14 mins
                Frequency of outages per day (F3)                     0.056537
                Frequency of recovery actions per day > 1 sec (F4)    70.578179

Availability    Basic steady-state availability (S Savail(admin))     0.997127
                Tolerance availability (S Savail(client))             0.999883
                Capacity-oriented availability (S Savail(capacity))   0.993716

Serviceability  Fault/failure coverage                                99.74%
                Mean-time to system restoration (MTTSR)               576 msecs
                Expected downtime penalties per day (4 9’s)           (0.17 - 0.1440)*$p

Table 4.6: Summary of Microreboot RAS model analysis results
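The expected downtime penalty entry in Table 4.6 can be checked with a little arithmetic. The sketch below assumes, since the excerpt does not say so explicitly, that the 0.17 figure is the expected downtime in minutes per day implied by the tolerance availability (0.999883), that 0.1440 is the daily downtime allowed by a four-nines (0.9999) target, and that $p is a penalty per minute of excess downtime. The function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes in a day


def downtime_minutes_per_day(availability: float) -> float:
    """Expected minutes of downtime per day for a steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_DAY


def excess_downtime_minutes(availability: float, target: float = 0.9999) -> float:
    """Downtime per day beyond what the target availability allows (never negative)."""
    allowed = downtime_minutes_per_day(target)
    return max(0.0, downtime_minutes_per_day(availability) - allowed)


tolerance_availability = 0.999883  # value reported in Table 4.6

print(round(downtime_minutes_per_day(tolerance_availability), 2))  # 0.17
print(round(downtime_minutes_per_day(0.9999), 4))                  # 0.144
print(excess_downtime_minutes(tolerance_availability))             # multiply by $p for the penalty
```

Under these assumptions the excess is roughly 0.02 to 0.03 minutes per day; the exact figure depends on whether the downtime is rounded to 0.17 before subtracting, as the table does.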

4.4 Related Work

The analytical tools – Continuous Time Markov Chains (CTMCs), Markov Reward Networks and Feedback Control models – and techniques for their analysis have been well studied and used by others to study many aspects of computing system behavior.

[69], [101] and [121] provide a rigorous discussion of the mathematical principles (probability theory and queuing theory) underlying Markov chains and Markov reward networks, as well as techniques for their analysis and solution. [69] and [101] specifically provide numerous examples of applying Markov chains to study the performance, reliability and availability characteristics of computing systems.

[98] and [2] discuss techniques for the computationally tractable analysis and solution of Markov models. These techniques are available in the SHARPE [160] RAS modeling tool, which we use in the construction and analysis of RAS models.
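To make "solving a Markov model" concrete, the following is a minimal sketch of computing the steady-state distribution of a small CTMC. It does not reproduce SHARPE or the techniques in [98] and [2]; it uses plain uniformization with power iteration, and the two-state model and its failure and repair rates are hypothetical values chosen only for illustration.

```python
def steady_state(generator, iterations=1000):
    """Approximate the stationary distribution of a CTMC.

    `generator` is the infinitesimal generator matrix Q (rows sum to zero).
    The chain is uniformized into a discrete-time chain P = I + Q / rate,
    and the distribution is found by repeated multiplication pi <- pi * P.
    """
    n = len(generator)
    # Pick a rate strictly larger than every exit rate so P is aperiodic.
    rate = 1.1 * max(-generator[i][i] for i in range(n))
    p = [[(1.0 if i == j else 0.0) + generator[i][j] / rate for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical two-state model: state 0 = up, state 1 = down.
failure_rate = 0.01  # failures per hour (assumed)
repair_rate = 2.0    # repairs per hour (assumed)
Q = [[-failure_rate, failure_rate],
     [repair_rate, -repair_rate]]

pi = steady_state(Q)
print("steady-state availability:", pi[0])
print("closed form mu/(lambda+mu):", repair_rate / (failure_rate + repair_rate))
```

For a two-state chain the closed form is available, which makes it a convenient sanity check before moving to larger models where only a numerical solution is practical.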

Markov chains have been used in the study and analysis of dependable and fault-tolerant systems and the techniques used to realize them. Examples include analyses of RAID (Redundant Arrays of Inexpensive Disks) [114] and telecommunication systems [101]. They have also been used in the study of software aging [13] and in evaluating the efficacy of preventative maintenance (software rejuvenation).

Dependability is concerned with assessing the ability of a system to deliver its intended level of service to its users, especially in the presence of failures that impinge on that level of service [107]. Three kinds of dependability measures are of interest: reliability measures, availability measures and task-completion measures [72], where task completion is the likelihood that a task will be completed satisfactorily. [72] also discusses four types of analyses that can be performed – model evaluation, sensitivity analysis, tradeoff analysis and specification determination. In our construction and analysis of RAS models we employ select reliability and availability measures. Further, although [72] discusses sensitivity analysis, tradeoff analysis and specification determination with regard to Markov chains, these types of analyses can also be applied to models constructed using other modeling formalisms. As a result, we can employ these analyses as part of the RAS evaluation process.
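As an illustration of sensitivity analysis, the sketch below estimates how much a model output changes when a single parameter is perturbed. It is not taken from [72]: the two-state availability formula, the parameter values and the function names are assumptions made for the example, and the finite-difference helper applies to any model expressed as a function of named parameters.

```python
def availability(failure_rate, repair_rate):
    """Steady-state availability of a two-state (up/down) Markov model."""
    return repair_rate / (failure_rate + repair_rate)


def sensitivity(model, params, name, rel_step=1e-6):
    """Central finite-difference estimate of d(model)/d(params[name])."""
    h = params[name] * rel_step
    up = dict(params, **{name: params[name] + h})
    down = dict(params, **{name: params[name] - h})
    return (model(**up) - model(**down)) / (2 * h)


# Hypothetical parameter values, per hour.
params = {"failure_rate": 0.01, "repair_rate": 2.0}

for name in params:
    print(name, sensitivity(availability, params, name))
```

In this example a unit change in the failure rate moves availability far more than a unit change in the repair rate, which is the kind of comparison a sensitivity analysis is meant to surface.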

Performability [120] provides unified measures for considering the performance and reliability of systems together. Markov reward networks [69] have been used as a formalism for establishing this link between the performance of a system and its reliability. Other formalisms used in performability analysis include Stochastic Petri Nets (SPNs) [101] and Stochastic Activity Networks (SANs) [161], which are both built on top of Markov chains. SPNs and SANs allow for more detailed and sophisticated modeling of a system’s operation, e.g., modeling concurrent activities in a system. In our construction of RAS models, we use Markov reward networks to quantify the impacts of failures and/or remediation activities. Further, our goal is to develop simple, reusable model templates that can be used for describing failure scenarios and scoring system responses, rather than developing a detailed model of the operation of the underlying system being evaluated.
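The core calculation behind a Markov reward model is an expectation: each state is assigned a reward rate, and the steady-state measure is the probability-weighted sum of those rewards. The sketch below is illustrative only; the three states, their probabilities and the reward assignments are hypothetical and are not the parameters of the microreboot model.

```python
def expected_reward(probabilities, rewards):
    """Expected steady-state reward: sum of pi_i * r_i over all states."""
    if len(probabilities) != len(rewards):
        raise ValueError("one reward is required per state")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p * r for p, r in zip(probabilities, rewards))


# Hypothetical steady-state probabilities for: up, recovering, down.
pi = [0.990, 0.008, 0.002]

# Reward 1 only when fully up: a basic availability measure.
basic = expected_reward(pi, [1.0, 0.0, 0.0])

# Reward equal to the fraction of capacity delivered in each state
# (assumed 50% while recovering): a capacity-weighted measure.
capacity_weighted = expected_reward(pi, [1.0, 0.5, 0.0])

print(basic, capacity_weighted)
```

Changing only the reward vector, while keeping the same underlying chain, is what lets one model yield several different views of availability.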

Different classes of failures have been studied using Markov chains including independent failures [101], near-coincident failures [42] and cascading failures [91]. Leveraging these analytical tools in our construction of RAS models allows us to describe failure scenarios that represent these different classes of failures.

Feedback control has been used in the development of adaptive computing systems providing mathematical tools for constructing predictable systems [90]. For example, [139] uses control theory to develop a fluid model for network traffic management. [145] presents a database server that adaptively throttles administrative utilities when necessary to maintain a given level of query performance (an example of disturbance rejection in a control system) while work in [45] describes the use of feedback control to automatically adjust the size of memory pools to balance the resource demands in a database management system (an example of regulatory control/regulation). Finally, [44] presents the principles of feedback control and discusses the implications for realizing self-managing systems which exhibit the desirable properties of stability, accuracy, short settling times and avoiding overshoot (SASO) with respect to the policy-based objectives that govern their operation. In constructing RAS models of systems we are interested primarily in applications of regulatory control and assessing the SASO properties of systems and their failure handling mechanisms.
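The SASO properties can be read off a simulated step response. The following sketch is not drawn from [44] or the other cited systems: it assumes a hypothetical first-order plant y(k+1) = a*y(k) + b*u(k) driven by an integral controller, and reports accuracy (final error), overshoot and settling time for a given controller gain.

```python
def simulate_integral_control(reference, gain, steps, a=0.5, b=0.5):
    """Simulate y(k+1) = a*y(k) + b*u(k) with u(k+1) = u(k) + gain*(reference - y(k))."""
    y, u = 0.0, 0.0
    outputs = []
    for _ in range(steps):
        outputs.append(y)
        error = reference - y
        y_next = a * y + b * u   # plant update uses the current control input
        u = u + gain * error     # integral action accumulates the error
        y = y_next
    return outputs


def saso_metrics(outputs, reference, band=0.02):
    """Return (final error, overshoot, settling step) for a step response."""
    final_error = abs(reference - outputs[-1])
    overshoot = max(0.0, max(outputs) - reference)
    tolerance = band * abs(reference)
    settling_step = 0
    for k, y in enumerate(outputs):
        if abs(y - reference) > tolerance:
            settling_step = k + 1  # first step after the last excursion
    return final_error, overshoot, settling_step


for gain in (0.1, 0.5):
    response = simulate_integral_control(reference=1.0, gain=gain, steps=200)
    print(gain, saso_metrics(response, reference=1.0))
```

With these assumed plant parameters the smaller gain settles without overshoot, while the larger gain overshoots the reference before settling, illustrating the tradeoff a controller designer has to balance.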

4.5 Summary

This chapter introduced the analytical tools and techniques used to construct RAS models – Continuous Time Markov Chains, Markov Reward Networks and Feedback Control Models. In §4.2 we provide background information on these analytical tools, and in §4.3 we discuss the measures and metrics of reliability, availability and serviceability while providing an example RAS model and analysis of the microrebootable application server described in [20].

Using our analytical example we demonstrate the construction of a basic RAS model that can be used to a) describe failure scenarios used to evaluate a microrebootable application server and b) score/evaluate the application server’s responses to injected faults. In conducting our analysis we identify a number of reliability, availability and serviceability metrics that may be used in system evaluations and illustrate how they are derived and calculated.

Our analysis is complementary to the measurement-based evaluation done in [20], considering other aspects of reliability, availability and serviceability not covered in the original work, e.g., reasoning about the frequency of failure escalations, recovery activities and outages; presenting three perspectives on availability for comparison – basic steady-state availability, tolerance availability and capacity-oriented availability; and finally discussing fault/failure coverage, mean-time to system restoration and estimated downtime penalties for the microrebootable application server.

In the next chapter we combine runtime fault-injection tools, including Kheiron, which was described in Chapter 3, with the RAS modeling tools described in this chapter to develop the 7U-Evaluation Benchmark – a model-based and measurement-based reliability, availability and serviceability benchmark for web-application stacks and their components.

The 7U-Evaluation Benchmark

In this chapter we present a methodology for evaluating the RAS characteristics of N-tier web application stacks and their components – the 7U-Evaluation method – and demonstrate its effectiveness via three case studies measuring the RAS properties of different deployments of the TPC-W web-application [119].

The 7U-Evaluation Benchmark is a model-based and measurement-based evaluation approach that combines runtime fault-injection tools (Chapter 3) with analytical RAS models that describe and score specific failure scenarios (Chapter 4).

In our experiments we subject different TPC-W deployments to the same failure conditions, develop RAS models to describe and score the failure scenarios, and conduct fault-injection experiments to obtain values for the RAS model parameters. Based on the data collected from the fault-injection experiments, we compute, compare and discuss the RAS metrics for each deployment.

In the sections that follow we discuss the challenges of RAS benchmarking and the design considerations for the 7U-Evaluation Benchmark that address those challenges, present our experimental results, and compare our approach to traditional performance benchmarking approaches as well as similar efforts to benchmark aspects of reliability, availability and serviceability in the fault-tolerant computing, dependable computing and autonomic computing communities.

5.1 Introduction

The importance placed on realizing reliable, highly available and serviceable (easy-to-manage/self-managing) software systems necessitates approaches for evaluating the reliability, availability and serviceability (RAS) characteristics of systems [113,118,102,3]. Benchmarks provide a structured way to evaluate systems by allowing interested parties “...to measure well-defined features of a system or component according to an agreed ... set of methods and procedures” [113]. In assessing the RAS characteristics of systems, we wish to identify or develop methods and procedures that quantitatively capture: a) the impacts of faults or failures on a system’s reliability, availability and serviceability and b) the efficacy of any remediation mechanisms.

In order to conduct an RAS benchmark, it is necessary to have an environment and tools that allow the system under test to be exposed to failure-provoking stimuli [16]. Direct fault-injection into components of the system under test is the primary technique that enables such an environment [8]. An important part of the RAS evaluation process is to inject faults that exercise any remediation mechanisms that the system under test has or that highlight RAS deficiencies. The determination of which faults meet these criteria depends on a) the system or class of system being evaluated and b) problems that have been observed and/or are currently being studied.

Another important element of an RAS benchmark is the generation of realistic workloads for the system under test. This allows us to study the impact of failures and other stressful conditions (the fault-load) on the typical operation of the system. Generating realistic workloads for systems is a difficult problem; however, RAS evaluations can build upon existing performance benchmarks, which provide excellent sources of workloads, e.g., benchmarks produced by the Standard Performance Evaluation Corporation (SPEC) [177], the Transaction Processing Performance Council (TPC) [190] and the National Institute of Standards and Technology (NIST) [140].

The use of performance benchmarks to provide realistic workloads during an RAS evaluation influences the metrics that are collected as well as the way these metrics are used in scoring the system under test. The performance metrics collected can be used to reason about complete outages or degraded modes of operation, which result from injecting faults or inducing failures in the test system. In §4.3.6 we illustrate how variations in performance metrics can be used to reason about different facets of reliability, availability and serviceability, e.g., basic availability vs. tolerance availability vs. capacity-oriented availability, which take different operating modes of the system into consideration. The metrics used to express these facets of reliability, availability and serviceability and the RAS models used to compute them specify the scoring criteria for an RAS evaluation and describe the failure scenarios that the system under test is subjected to during an evaluation.
