
8. The Brain Module

8.3.1 Without Prediction module

8.3.1.1 Setup

All the components have been built, wired and tested individually, so it is now possible to start testing the auto-scaling application as a whole. The main purpose of these tests is to check whether the virtual cluster is scaled up and down according to the workload, whether the performance metrics gathered by collectd correspond to the workload, and whether the load testing tool built earlier can be used on a larger scale.

Since the workload for all of these tests is artificially generated by the load testing tool, there is no reason to use the Predictive Model: the workload contains no patterns that it could exploit. The Predictive Model is therefore disabled for all of the following tests.

All VMs had one virtual CPU core and were allocated a CPU time share of 0.5 and 256 MB of memory.

8.3.1.2 Test 1

The first test case checks whether the auto-scaling solution can scale the Tomcat cluster up and down as the workload changes. The application's parameters and workload are defined in Table 3 and Table 4 respectively. In early experiments I found that one VM could handle around 35 requests per second before its response time exceeded 1 second. With throughputs of 70, 103 and 70 requests per second, I would therefore expect the cluster to scale up to two servers, then to three servers, and finally back down to two servers.
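The expected cluster sizes follow from dividing each phase's throughput by the per-VM capacity and rounding up. The sketch below reproduces that arithmetic under two simplifying assumptions that are not stated in the original: capacity adds up linearly across VMs, and requests are balanced evenly. The function and constant names are hypothetical.

```python
import math

# Approximate capacity of one VM (requests/s) from the early experiments.
PER_VM_CAPACITY_RPS = 35

# Throughput of the three workload phases in Table 4 (requests/s).
TABLE_4_THROUGHPUT = [70, 103, 70]


def expected_cluster_size(throughput_rps: float,
                          capacity_rps: float = PER_VM_CAPACITY_RPS) -> int:
    """Smallest number of VMs whose combined capacity covers the throughput.

    Assumes linear scaling and even load balancing (simplifications)."""
    return max(1, math.ceil(throughput_rps / capacity_rps))


print([expected_cluster_size(t) for t in TABLE_4_THROUGHPUT])  # [2, 3, 2]
```

Note that 70 requests per second sits exactly at the combined capacity of two VMs, so the first and last phases leave no headroom under this estimate.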

Parameter                               Value
Interval of response time measurement   120 seconds
Moving average’s window size            5 measurements
Minimum response time                   400 milliseconds
Maximum response time                   1000 milliseconds

Table 3: Application's parameters for Test Case 1
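The parameters in Table 3 suggest a threshold rule over a moving average of response-time samples: scale up when the average exceeds the maximum response time, and scale down when it falls below the minimum. The sketch below is a hypothetical reconstruction of such a rule, not the author's implementation. The class name, the "wait until the window is full" behaviour and the clearing of the window after each scaling action are all assumptions.

```python
from collections import deque


class ThresholdScaler:
    """Hypothetical threshold rule over a simple moving average of
    response-time samples (defaults mirror Table 3)."""

    def __init__(self, window_size: int = 5,
                 min_ms: float = 400.0, max_ms: float = 1000.0):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window_size)
        self.min_ms = min_ms
        self.max_ms = max_ms

    def add_sample(self, response_time_ms: float) -> str:
        """Record one measurement (taken once per measurement interval)
        and return 'scale_up', 'scale_down' or 'hold'."""
        self.samples.append(response_time_ms)
        if len(self.samples) < self.samples.maxlen:
            return "hold"  # assumption: act only on a full window
        average = sum(self.samples) / len(self.samples)
        if average > self.max_ms:
            decision = "scale_up"
        elif average < self.min_ms:
            decision = "scale_down"
        else:
            return "hold"
        # Assumption: discard old samples so the next decision reflects
        # the resized cluster rather than the previous one.
        self.samples.clear()
        return decision
```

With a 120-second interval and a window of 5, a full window spans 10 minutes, so under these assumptions at most one scaling action could occur every 10 minutes.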

Duration     Throughput (requests/s)   URL
20 minutes   70                        /DemoWebsite/performance?n=200
10 minutes   103                       /DemoWebsite/performance?n=200
10 minutes   70                        /DemoWebsite/performance?n=200

Table 4: Workload for Test Case 1

As Figure 10 shows, this prediction almost holds: for a brief moment four VMs are running, but the cluster quickly returns to the predicted three VMs.

8.3.1.3 Test 2

In this test I wanted to show the importance of selecting appropriate response time thresholds. The application's configuration is given in Table 5 and the workload in Table 6.

Parameter                               Value
Monitored URL                           /DemoWebsite/performance?n=200
Interval of response time measurement   120 seconds
Moving average’s window size            5 measurements
Minimum response time                   800 milliseconds
Maximum response time                   1000 milliseconds

Table 5: Application's configuration for Test Case 2

Duration     Throughput (requests/s)   URL
70 minutes   69                        /DemoWebsite/performance?n=200

Table 6: Workload for Test Case 2

Figure 11 shows that with this configuration one VM is not enough to handle the requests (the response time is too long), but with two VMs the workload is too small, so one VM is deallocated again and the cycle repeats. This oscillation is a very inefficient use of resources: with one VM the response time is unacceptable (greater than the maximum response time, which in a real-world application could correspond to an SLA requirement), and with two VMs the resources are not fully utilised (according to the constraints expressed by the response time thresholds). Compared with Test 1, the minimum response time threshold was raised from 400 to 800 milliseconds, leaving only a 200-millisecond band between the two thresholds. The key point to take from this experiment is that when more resources are allocated to deal with an increased workload, the utilisation of all the other VMs drops. An auto-scaling application should account for this drop so that it does not trigger resource deallocation.
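One way to account for the utilisation drop is to check, before deallocating a VM, whether the remaining VMs could absorb the whole load. The sketch below is a hypothetical guard of that kind, not part of the application described here. It assumes the total request rate can be observed (for example at the load balancer) and reuses the rough 35 requests/s per-VM capacity from Test 1; both are assumptions.

```python
# Approximate per-VM capacity (requests/s) from the early experiments in Test 1.
PER_VM_CAPACITY_RPS = 35.0


def safe_to_scale_down(total_rps: float, current_vms: int,
                       capacity_rps: float = PER_VM_CAPACITY_RPS) -> bool:
    """Hypothetical guard: allow removing one VM only if the remaining VMs
    could still serve the whole request rate within their capacity.

    Assumes the total request rate is observable and load is balanced evenly."""
    if current_vms <= 1:
        return False  # never deallocate the last VM
    load_per_vm_after = total_rps / (current_vms - 1)
    return load_per_vm_after <= capacity_rps


# Test 2 workload (Table 6): 69 requests/s on a two-VM cluster.
print(safe_to_scale_down(69, 2))  # False: the single remaining VM would get 69 requests/s
```

Under these assumptions the guard would have kept the second VM in Test 2 even though the response time had dropped below the 800-millisecond minimum.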

8.3.1.4 Test 3

In Test Case 3 I wanted to see how the response time sampling interval and the moving average's window size affect the auto-scaling application's decisions, i.e. whether the cluster's size would be different if sampling were done at shorter intervals with a smaller window. In the first run, the response time was measured every 20 seconds with 4 measurements in the moving average's window (Table 7). In the second run, the response time was measured every 120 seconds with a 5-measurement moving average window (Table 9). The workloads for the first and second runs are given in Table 8 and Table 10 respectively.
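The two configurations differ sharply in how much history the moving average covers. The arithmetic below compares them, assuming a simple (unweighted) moving average; that assumption and the helper names are hypothetical. A shorter window reacts to a brief burst of slow responses far sooner, which is one possible factor behind the different outcomes, not a conclusion drawn from the measurements.

```python
def window_span_seconds(interval_s: int, window_size: int) -> int:
    """Length of time covered by a full moving-average window."""
    return interval_s * window_size


def sample_weight(window_size: int) -> float:
    """Share of a simple moving average contributed by any single sample."""
    return 1.0 / window_size


run1_span = window_span_seconds(20, 4)    # Table 7: 80 seconds
run2_span = window_span_seconds(120, 5)   # Table 9: 600 seconds (10 minutes)
print(run1_span, run2_span)               # 80 600
print(sample_weight(4), sample_weight(5))  # 0.25 0.2
```

In Run 1 the average reflects only the last 80 seconds, whereas in Run 2 it reflects the last 10 minutes.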

Parameter                               Value
Monitored URL                           /DemoWebsite/performance?n=200
Interval of response time measurement   20 seconds
Moving average’s window size            4 measurements
Minimum response time                   400 milliseconds
Maximum response time                   1000 milliseconds

Table 7: Application's configuration for Test Case 3 Run 1

Duration     Throughput (requests/s)   URL
30 minutes   70                        /DemoWebsite/performance?n=200

Table 8: Workload for Test Case 3 Run 1

Parameter                               Value
Monitored URL                           /DemoWebsite/performance?n=200
Interval of response time measurement   120 seconds
Moving average’s window size            5 measurements
Minimum response time                   400 milliseconds
Maximum response time                   1000 milliseconds

Table 9: Application's configuration for Test Case 3 Run 2

Duration     Throughput (requests/s)   URL
30 minutes   70                        /DemoWebsite/performance?n=200

Table 10: Workload for Test Case 3 Run 2

During the first run, the auto-scaling application unexpectedly scaled the cluster to three VMs and stayed at that size until the end of the workload, as shown in Figure 12, even though two VMs should have been enough. In the second run, when the application was configured to use a longer response time measurement interval and a larger moving average window, the cluster was scaled to two VMs instead (Figure 13). Because the workload was the same (only the duration was different) but was shared by fewer VMs, the VMs in the second run had higher CPU utilisation than those in the first run: around 60% during the first run (Figure 14) and about 90% during the second run (Figure 16). Memory consumption of the first two VMs was also slightly higher in the second run (Figure 15 and Figure 17).
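The two utilisation figures are consistent with each other: if the same total CPU demand is spread over two VMs instead of three, each VM's share rises by a factor of 3/2. The sketch below shows this arithmetic, assuming even load balancing and CPU cost that grows linearly with load; both are simplifications not stated in the original, and the function name is hypothetical.

```python
def redistributed_utilisation(observed_pct: float, vms_before: int,
                              vms_after: int) -> float:
    """Estimate per-VM CPU utilisation when the same total load moves from
    `vms_before` to `vms_after` VMs.

    Assumes even balancing and linear CPU cost (simplifying assumptions)."""
    total_pct = observed_pct * vms_before  # total CPU demand in "VM-percent"
    return total_pct / vms_after


# Run 1: about 60% on each of three VMs; Run 2 used two VMs.
print(redistributed_utilisation(60, 3, 2))  # 90.0
```

The estimate of 90% matches the roughly 90% observed in the second run.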

Figure 13: Number of VMs over time in Test Case 3 Run 2

Figure 15: Memory usage of VMs in Test Case 3 Run 1

Figure 17: Memory usage of VMs in Test Case 3 Run 2

8.3.1.5 Test 4

One of the parameters the auto-scaling application requires in order to monitor the health of the virtual cluster is the URL to which HTTP requests are sent and whose response time is measured. Due to the nature of web applications, requests to different URLs cause the application server to perform different work: sometimes it might respond with a cached or static response, whereas at other times it might have to perform resource-intensive work that cannot be cached.
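A single response-time measurement of the monitored URL can be taken by timing one HTTP GET request from start until the full body has been read. The sketch below uses only the Python standard library and is an illustration under that assumption, not the monitoring code used in this project; the function name is hypothetical.

```python
import time
import urllib.request


def measure_response_time_ms(url: str, timeout_s: float = 10.0) -> float:
    """Send one HTTP GET to `url` and return the elapsed wall-clock time in
    milliseconds until the whole response body has been read."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        response.read()  # read the full body so the server's entire work is timed
    return (time.perf_counter() - start) * 1000.0
```

A call would pass the full monitored URL, for example `http://<cluster-host>/DemoWebsite/performance?n=200`, where the host part is a placeholder. Because the timed work depends entirely on which URL is requested, the same cluster can look healthy or overloaded depending on this one parameter.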

This test addresses the issue of selecting the monitored URL correctly. I configured the application to monitor the URL that corresponds to a heavy workload (multiplying two 600-by-600 matrices). Unlike in the previous tests, no workload was generated, so the only requests sent to the web server were the auto-scaling application's own monitoring requests.
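If the endpoint multiplies two n-by-n matrices with the textbook triple loop (an assumption; the original only states the matrix size), the work grows with n cubed. Moving the monitored URL from n=200 to n=600 would then make each monitoring request roughly 27 times more expensive. The sketch below shows a naive multiplication that counts its multiply-add operations; the function names are hypothetical.

```python
def naive_matmul(a, b):
    """Multiply two n-by-n matrices (lists of lists) with the textbook
    triple loop and return the product and the multiply-add count."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i][k] * b[k][j]
                ops += 1
            c[i][j] = total
    return c, ops


def relative_cost(n_heavy: int, n_light: int) -> float:
    """Work ratio between two naive n-by-n products: (n_heavy / n_light) ** 3."""
    return (n_heavy / n_light) ** 3


print(relative_cost(600, 200))  # 27.0
```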

Parameter                               Value
Monitored URL                           /DemoWebsite/performance?n=600
Interval of response time measurement   120 seconds
Moving average’s window size            5 measurements
Minimum response time                   400 milliseconds

Table 11: Application's configuration for Test Case 4

Duration     Throughput (requests/s)   URL
-            -                         -

Table 12: Workload for Test Case 4

The auto-scaling application was stopped once it had allocated four VMs (Figure 18), as it was clear that it would keep allocating VMs until all resources on the cloud were used up. The CPU utilisation graph in Figure 19 is particularly interesting in this test: it is possible to see when an HTTP request was sent to the server simply by looking at the spikes, and even which VM in the cluster served the request. Memory usage (Figure 20) shows a similar pattern: with every request a VM serves, its memory usage increases sharply.
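The runaway behaviour can be reproduced with a toy simulation. With no generated workload, each monitoring request is served by a single VM, so its response time does not improve as VMs are added; if that time exceeds the maximum threshold, a threshold rule keeps scaling up. The sketch below assumes the moving-average threshold rule outlined for Table 3, uses a hypothetical constant response time of 1500 ms (not a measured value) and caps the cluster at a hypothetical `max_vms` standing in for the cloud's available resources.

```python
from collections import deque


def simulate_idle_cluster(response_time_ms: float, max_ms: float = 1000.0,
                          min_ms: float = 400.0, window_size: int = 5,
                          intervals: int = 30, max_vms: int = 4) -> list:
    """Hypothetical simulation of a threshold rule when every monitoring
    request takes the same time regardless of cluster size.

    Returns the number of VMs after each measurement interval."""
    samples = deque(maxlen=window_size)
    vms = 1
    history = [vms]
    for _ in range(intervals):
        # Constant: adding VMs does not speed up a single request.
        samples.append(response_time_ms)
        if len(samples) == window_size:
            average = sum(samples) / window_size
            if average > max_ms and vms < max_vms:
                vms += 1
                samples.clear()  # assumption: restart the window after scaling
            elif average < min_ms and vms > 1:
                vms -= 1
                samples.clear()
        history.append(vms)
    return history


print(simulate_idle_cluster(1500.0)[-1])  # 4: grows until the cap is reached
```

Under these assumptions the cluster grows by one VM per full window until it hits the resource cap, which mirrors the behaviour seen in Figure 18.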

To avoid the problem shown in this test, one would have to either change the URL used for monitoring the response time or increase the maximum response time threshold.

Figure 19: CPU utilisation of VMs in Test Case 4

Energy efficient cloud autoscaling. Martynas Puronas, BSc Computer Science 2013/2014 (pages 43-53)