The remaining of this thesis is structured as follows. Chapter 2 - Literature Review
This chapter situate the current study within the body of literature and provides a representative review of works related to the research topic of this thesis. It includes a general discussion regarding adaptation in the context
of software engineering as well as a review of some of the works more closely related to the topic of dynamic adaptation of service compositions.
Chapter 3 - ProAdapt Adaptation Framework
This chapter introduces the current stage of the adaptation framework de- veloped in this thesis to support dynamic adaptation of service composi- tions. The chapter presents an overview of the main concepts of the frame- work and the architecture created to support these concepts. The chapter also includes details of the process of adaptation and the techniques used to analyse events and predict failures.
Chapter 4 - Experiments and Evaluation
This chapter presents the prototypes developed to verify the concepts, tech- niques, and methods of the adaptation framework. For each of the proto- types, the analysis of the collected data is presented and discussed.
Chapter 5 - Behavioural Compensation Extension
This chapter presents an extension for ProAdapt to support behavioural compensation. It includes a running example with which the extension is explained and also the results and evaluation of the experiment performed. Chapter 6 - Conclusions and Future Work
This chapter highlights the current contributions of the research done hith- erto and presents the future plans for the work.
Chapter 2
Literature Review
Software systems often go through various stages in their development life-cycle in order to meet the defined requirements under the expected operating running environments. However, it is arguably impossible to anticipate the requirements and behaviour of all users or create a single best configuration for the system that works all the time. Moreover, requirements and especially the running en- vironment changes over time, potentially invalidating previous agreements. This nonstatic environment forces software systems to support continuous adaptation in order to avoid unwanted behaviour or loss of opportunities.
There are various works in the literature dealing with adaptation approaches in the context of software systems, however, it is important to note that the focus of this research is adaptation specifically targeted at service composition. Never- theless, some core concepts are common and important for a better understanding of the topic. The remaining of this section starts by introducing adaptation in the general concept of Software Engineering, and progressively address the ap- proaches targeting service compositions.
2.1
Software Engineering Adaptation
The adaptation of software systems has been extensively studied in various branches of the software engineering field, including control engineering [134][44], software architectures [82][47], self-adaptive systems theory [29], autonomic computing [78], and fault-tolerant computing [123][112].
In control engineering, adaptation is concerned with the use of sensors to monitor the devices being controlled and actuators that are able to make cor- rections on the system according to the reasoning performed with the monitored values. Different adaptation techniques, such as gain scheduling [133], automatic tuning[148] and continuous adaptation [129] have been used in controllers for more than twenty years [7].
An example of adaptation in the context of control engineering is the method presented in [129] to reduce the acoustic feedback in hearing aids that contain a substantial amount of gain. The authors use a continuous adaptation approach, together with a delay in the forward or cancellation paths of the hearing aid plant to achieve acoustic feedback reduction of more than 15 dB, increasing the maximum insertion gain of a hearing aid using the approach. The approach is somewhat similar to the work present in this thesis in the way that it constantly adapts the filter coefficients based on the monitored signal, while our approach constantly adapts service compositions based on monitored QoS values.
In a similar way to control engineering, self-adaptive systems are able to au- tonomously evaluate their execution context and change their own behaviour in order to meet or improve previously defined requirements. For example, a soft- ware system can be designed as a multimodal display based on changing context
and user disabilities[131]. In other words, it is possible to select among possible output formats, such as sound, image, and text, in order to present information based on user preferences. A website can self adapt to increase the font size when presenting information for elderly users, or play sounds instead of presenting texts in case of blind users.
In a more ambitious goal, autonomic computing refers to the self-managing computing model named after the human body’s autonomic nervous system. The main overall aim of autonomic computing is the design of systems able to manage themselves given high-level objectives specified by humans while keeping the com- plexity of the system itself invisible to the user. To accomplish these ambitious goals, IBM has proposed a conceptual guideline architectural named MAPE[35]. Figure2.1 illustrate the MAPE model, which is consisted of a central knowledge base surrounded by four modules: Monitor, Analyse, Plan, and Execute.
Figure 2.1: MAPE Architectural Guideline
The monitoring process is responsible to collect information about the man- aged resources and execution context. The Analyse module extract and correlate
information from the monitored data to model adaptation situations. The plan- ning phase uses policies to guide different adaptation procedures. Finally, the planned actions are executed through controlled mechanisms. The MAPE model, however, is purposely only to be used as a guideline. It is up to the system de- signer to decide how each component is going to be implemented and where they are going to be positioned.
Regardless of the software domain, perhaps the main reason that motivates the need for adaptation is the faults that invariably occur in software systems, especially distributed systems. In many application contexts, such as in business process, the reliability of the overall system must be far higher than the reliability of its individual components[80].
Software reliability can be defined as the probability that the software will be functioning without failure under a given environmental condition during a specified period of time[141]. In other words, some systems are designed so that even in the event of faults risen by internal components, the overall system must still attempt to accomplish its goal while meeting the defined requirements.
Observable Fault Monitoring Detected by Undetectable Fault Failure Reasoning Predicts
May lead to May lead to
Figure 2.2: Failure prediction process performed for fault tolerant systems
The failure and fault terms are thus key to the understanding of system reli- ability. A failure is defined as a deviation from the conformance with the system requirements specification for a specified period of time, while faults are failures
incurred by internal component or interacting systems [61].
Figure 2.2 illustrate this relation between faults and failures in the context of fault tolerant systems. Generally speaking, as depicted in the picture, fault- tolerant systems compute the probability of one or more observable faults causing a system failure. A direct interpretation of such a system is that hidden or undetectable faults can easily cause system failures since they are not computed by the reasoning process. If those undetectable faults lead first to observable faults, then the fault tolerant system can work as expected.
By predicting failures, fault-tolerant systems can adapt accordingly to prevent such failures to happen and thus ensuring the system reliability. In the context of service-based system, failures of external partner services are identified as internal faults by the service composition. For example, whenever a service operation deployed in a service operation does not meet its expected QoS aspects, or cannot be invoked for some reason, a fault is observed.
Failure Tracking Failure Prediction Symptom Monitoring Detect Error Reporting Undetected Error Auditing
Figure 2.3: Classification of failure prediction techniques
As previously described, the ability to predict and prevent failures is of great importance to ensure the system reliability. At the same time, the continuous grow in complexity and size of the computer systems make almost mandatory the use of a solid fault management method. The remaining of this section presents an
overview of the four major branches of failure prediction techniques as presented in [126] and depicted in Figure 2.3.
Failure Tracking
Failure tracking is the method used to assert the probability of future failures based on the occurrence of previous failures. The work in [37] presents an offline Bayesian predictive approach which attempts to predict the probability distribu- tion function of future failures given the occurrence of present and past failures. The approach is applied to the Markov process model known as Jelinski-Moranda [97] and besides been a complex offline method can also be used during runtime. Failure Tracking can also be accomplished by exploiting the fact that failures can usually occur next to each other, either in time or in space. An example of work that examines such relation is [52], which presents a method for temporal and spatial correlation of failures in distributed system. This approach has some resemblance to our technique to predict faults in operations, service and providers (see Section 3.3.1.2).
Symptom Monitoring
Symptoms of a software system can be seen as side effect of errors. Before a failure can be observed in a system, there are some symptoms that can be detected and exploited to predict, prevent or at least mitigate these failures. In other words, failure-prediction techniques based on symptom monitoring analyse the system behaviour and context in order to detect symptoms that can be perceived as prelude of upcoming failures.
An example of symptom monitoring is presented in [88] through the form of a function approximation. The work attempts to predict the resource utilization of Apache web servers by collecting system variables such as memory use and then
building an auto-regressive model to predict the time of resource exhaustion. The work in [104] uses instead a technique known as time series prediction. Generally speaking, monitored values are seen as sequence of data points in time. Time series forecasting is performed by modelling previous measured values to compute future values. In the case of the work in [104], the system variable is the memory usage of J2EE applications, which is monitored every twelve minutes. Future memory usage and resource exhaustion is predicted by combine rough set theory with wavelet networks to the collected data.
The previous approach retains some similarities to our own regarding the pre- diction of data values using time series forecast. As described in Section 3.3.1.1, we use exponentially weighted moving average to predict the expected service operation response time.
Detected Error Reporting
In some cases, collecting and reasoning about system symptoms promptly is not practical or indicated. In such cases, failure prediction can work using as input some sort of discrete system error report. More precisely, failure prediction using detected error reporting can be generalised as the reasoning function f (rn) = P(n + k), where rn is the error report given at time tn and P(n + k) is the prediction of what may happen at time t(n+ k).
This approach is employed in [84] to generate failure warning in high per- formance Linux clusters. The authors extended the Open Source Cluster Ap- plication Resources (OSCA) by introducing high availability (HA-OSCA). The proposed method collects and aggregates hardware sensitive information, such as temperature, voltage and power supply, and uses thresholds to predict and prevent failures.
Undetected Error Auditing
In the previous branches of failure prediction, the approaches focus on analysing an event that is observed or logged as a direct result of the use of a particular function of the system. In the case of undetected error auditing approaches, the idea is to proactively look for incorrect or undesired system states, even for those parts that are not actually in use at the moment.
This search for undetected errors, or latent failure is exploited in [9]. The authors propose a method to detect and predict latent sector errors in disk drive- based storage systems. The contribution of the paper is in the fact that such latent errors can be detected before the corresponding disk sectors are accessed and thus, potentially avoiding system failures.
This section discusses topics and works in the general context of software engineering for self-adaptable systems. Many concepts have been used in the ProAdapt framework, such as the precise definition of faults and failures, and the control loop of the MAPE architectural model. Moreover, two important concepts of the ProAdapt framework were designed as the result of the research done, reviewing general techniques for failure prediction. The spatial correlation of availability and the function approximation of the response time were both envisioned by adapting some ideas reviewed as general adaptation and failure prediction. The next section closes the gap by targeting specifically on adaptation approaches designed to work with service compositions.