
5.7 Empirical Study of Self-aware Cloud Architecture

5.7.4 Study under Real Workload Google Cluster Dataset

This section evaluates the reputation-aware and non-reputation-aware posted offer mechanisms using the Google cluster dataset [56]. The Google cluster dataset provides traces over a 7-hour period. Each task in the workload belongs to a single job; hence, we focus on the allocation of resources at the task level. The distribution of jobs over the workload period is shown in figure 5.13. Job Type (0, 1, 2) is used to categorise the work into SLA classes, corresponding to Low, Medium, and High respectively.

Experimental Setting

WorkloadData is a real workload defined by the jobs distributed as shown in figure 5.13. At each time step, N is set to the number of jobs, jb, at that time step in figure 5.13. Each of the N jobs at a time step is assigned to one of the N buyer agents, B. M is set to 52800 to account for the highest workload in the dataset. ResilienceMode is set for each case (low or high) as defined in figure 5.5. The number of trading rounds, NumTradingRnd, is set to 76, as determined by WorkloadData (figure 5.13). The priority of each job is given by its SLA class in the Google cluster workload dataset. Each seller, S, has its resource provisioning capacity set to the values in table 5.4.
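The per-time-step setup described above can be sketched as follows. This is a minimal illustration rather than the simulator used in the study: the names `Job`, `SLA_CLASSES`, and `assign_jobs_to_buyers`, the buyer labels, and the sample jobs are all hypothetical.

```python
from dataclasses import dataclass

# Job Type -> SLA class, following the categorisation given for the dataset.
SLA_CLASSES = {0: "Low", 1: "Medium", 2: "High"}


@dataclass
class Job:
    job_id: int
    job_type: int  # 0, 1, or 2

    @property
    def priority(self) -> str:
        # The priority of a job is its SLA class.
        return SLA_CLASSES[self.job_type]


def assign_jobs_to_buyers(jobs):
    """Assign each of the N jobs at a time step to its own buyer agent.

    The buyer labels "B0", "B1", ... are illustrative only.
    """
    return {f"B{i}": job for i, job in enumerate(jobs)}


# Hypothetical jobs arriving at one time step (N = 3).
jobs_at_step = [Job(1, 0), Job(2, 2), Job(3, 1)]
buyers = assign_jobs_to_buyers(jobs_at_step)
for name, job in buyers.items():
    print(name, job.job_id, job.priority)
```

Keeping one buyer per job means that N varies with the workload at each time step, which mirrors the setting described above.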

Results

The results for low and high resilience cases under real workload are shown in figure 5.14.

Figure 5.14: Sensitivity to Failure - Google Cluster Dataset

In the high resilience case, RBH performed better than RTS, NRBH, and NRTS, with a percentage success between 90% and 100%. NRBH and NRTS had percentage success between 90% and 95% and between 80% and 85% respectively. RTS had percentage success between 95% and 100%, but was worse off than RBH, NRBH, and NRTS in approximately the first 300 time steps; its success rate improved afterwards, converging to zero failed nodes. The pattern of failure in the low resilience case is comparable to that in the high resilience case; however, the behaviour of RTS in the low resilience scenario resembles a step function. That is, consistent small changes in the number of failed nodes were recorded at short intervals, followed by a sudden jump (improvement) to a much lower number of failed nodes. Figure 5.15 shows the recorded SLA compliance.

Figure 5.15: SLA Compliance for Bargain Hunters and Time Savers when using Reputation-aware and Non-Reputation-aware Mechanisms under real workload

# S    Oscillatory frequency    Transition time steps
50     1                        At 11350
50     5                        Every 3783 ticks
50     10                       Every 2063 ticks

Table 5.8: Schedule for seller nodes to change their resilience levels. Note: in accordance with the Google Cluster Data, the last time step is 22700; hence, transition time steps are evenly distributed across the simulation lifetime.
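The intervals in table 5.8 can be reproduced with a short calculation. The sketch below assumes that "evenly distributed" means dividing the last time step by the number of transitions plus one and rounding down; this rule is an assumption that happens to match the tabulated values, and the function name `transition_steps` is hypothetical.

```python
LAST_TIME_STEP = 22700  # last time step of the simulation, from table 5.8


def transition_steps(last_step, oscillations):
    """Return evenly spaced transition time steps.

    Assumption: k transitions split the run into k + 1 periods of equal
    (integer) length, so the interval is last_step // (k + 1).
    """
    interval = last_step // (oscillations + 1)
    return [interval * k for k in range(1, oscillations + 1)]


print(transition_steps(LAST_TIME_STEP, 1))   # [11350]
print(transition_steps(LAST_TIME_STEP, 5))   # [3783, 7566, 11349, 15132, 18915]
print(transition_steps(LAST_TIME_STEP, 10)[0])  # 2063
```

Under this assumption, the third transition in the five-oscillation schedule falls at time step 11349.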

Measuring Impact of Fluctuating Cloud Service Resilience

Up until now, the resilience of a seller agent has been initialised at the start of the simulation run and fixed throughout the simulation. According to [117], the dynamics of real cloud data centres necessitate an approach that is able to manage transitions across several resilience levels. We define the Oscillatory Frequency of a node's resilience as the number of times a seller node makes a transition from one resilience level to another. We do not treat cyclic transitions separately at this point, i.e., the case where the seller node returns to its initial resilience level after a number of successive transitions.

We first divide the seller population between high and low resilience, and secondly set the number of transitions (oscillations) and the time steps at which these transitions occur. An example of the enriched failure model for a population of sellers is shown in table 5.8.

The seller population in table 5.8 is split evenly between the two resilience levels: of the 50 seller nodes, 25 are high resilience and the other 25 are low resilience. At each transition time step, a seller node changes to the opposite resilience level, i.e., a high resilience seller node changes to low, and vice versa. Figure 5.16 shows the impact of the oscillatory frequencies 1, 5, and 10 on the number of failed nodes recorded.
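The flipping rule described above can be sketched as follows. This is an illustrative model only: the names `initial_population` and `resilience_at` are hypothetical, and the choice that a transition takes effect at (rather than just after) its time step is an assumption.

```python
HIGH, LOW = "high", "low"


def initial_population(num_sellers=50):
    """Split the sellers evenly between high and low resilience."""
    half = num_sellers // 2
    return [HIGH] * half + [LOW] * (num_sellers - half)


def resilience_at(initial, step, transitions):
    """Resilience level of a seller at a given time step.

    Each transition flips the level to its opposite, so only the parity of
    the number of transitions passed so far matters.
    Assumption: a transition scheduled at time t applies from step t onwards.
    """
    flips = sum(1 for t in transitions if t <= step)
    if flips % 2 == 0:
        return initial
    return LOW if initial == HIGH else HIGH


sellers = initial_population()
schedule = [11350]  # single transition, as in the first row of table 5.8
before = [resilience_at(s, 11349, schedule) for s in sellers]
after = [resilience_at(s, 11350, schedule) for s in sellers]
print(before.count(HIGH), after.count(HIGH))  # 25 25
```

Because the population is split evenly and every node flips at the same time, the number of high resilience sellers stays at 25 across a transition; what changes is which sellers are the resilient ones.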

Results

Figure 5.16: Sensitivity to Failure for Oscillatory Frequencies 1, 5, and 10

In the one oscillatory frequency case, RBH recorded a significantly smaller number of failed nodes than RTS prior to the transition time step (time tick 11350). After the transition time step, however, the behaviour of RBH and RTS is comparable, as they both record fewer than 20 failures. NRBH recorded fewer than 100 failures prior to the transition time step, but failures increased afterwards, peaking at approximately 200. NRTS recorded the most failures prior to the transition but improved significantly after it, peaking at approximately 100 failures.

In the five oscillatory frequency case, RBH and RTS behaved similarly to the one oscillatory frequency case up to the first transition time step (time tick 3783). After the first transition step, RBH recorded a consistently low number of failures (< 25) for the rest of the simulation run. The number of failures recorded by RTS declined abruptly after that first transition time step (only one failed node at time step 4000). This number peaks at 12 for the first transition period. Thereafter, the number increases up to 49 failed nodes in the second transition period. The desired minimal number of failed nodes only occurs in the third transition period (after 11349 time ticks), where the number of failed nodes peaks at 3. This minimal number of failed nodes is sustained in transition periods 4 and 5. On the other hand, NRBH and NRTS are observed to behave similarly to the one oscillatory frequency case, oscillating between success and failure across alternate transitions.

The behaviour of RBH, RTS, NRBH, and NRTS in the ten oscillatory frequency case is consistent with that observed in the five oscillatory frequency case, as shown in figure 5.16. The achieved SLA compliance across the three oscillatory frequencies (1, 5, and 10) is depicted in figure 5.17.