The model for the Node.js application differs from the NGINX model in the following ways: 1) a square wave workload instead of a constant workload, 2) the remainder of the requests each second is divided among the queues instead of being discarded, and 3) there is no limiting divisor or related variables.
To represent a square wave workload, we add an automaton to control the workload shape. This is shown in Figure 5.14. Let us also introduce parameters T_High set to Thigh, RPS_Max_High set to RP Shigh, T_Low set to
Tlow and RPS_Max_Low set to RP Slow. RPS_Max is initially set to RPS_Max_ High. The workload automaton starts in high_load. On the one_second event it increments the value of W_Seconds_Elapsed by one. It stays in this state until W_Seconds_Elapsed is equal to T_High. When this is true, it transitions to state low_load. During the transition, the variable RPS_Max is set to RPS_
low_load high_load RPS_Max = RPS_Max_High W_Seconds_Elapsed += 1 RPS_Max = RPS_Max_Low W_Seconds_Elapsed < T_High W_Seconds_Elapsed = 0
W_Seconds_Elapsed >= T_Low W_Seconds_Elapsed >= T_High
W_Seconds_Elapsed += 1 W_Seconds_Elapsed = 0 W_Seconds_Elapsed < T_Low one_second one_second one_second one_second
Figure 5.14: The workload_shape automaton.
Max_Low and W_Seconds_Elapsed is reset to zero. The automaton then stays in state low_load until W_Seconds_Elapsed equals T_Low. Then it transitions back to state high_load, setting RPS_Max back to RPS_Max_High and resetting again W_Seconds_Elapsed to zero. This cycle can repeat indefinitely.
To divide the remainder of requests, a modification is made to the user
automaton, and a new automaton is introduced. The modified user automaton
is shown in Figure 5.15. The key difference with respect to NGINX is the transition allocate_req_to_pods_remainder, which allocates the remainder of requests among the active pods. The new automaton is pod_queue_remainder (Figure 5.16), and this is part of the for-each loop representing the pods.
The limiting divisor is not required in the Node.js, since we found that the compilation and verification times for this model were already acceptable
main_requests_allocated submitting requests_all_allocated idle waiting Submitted’ <= RPS_Max
Remainder = 0 Remainder = Submitted % Pods_On Next_Pod = Next_Pod_Temp + 1
Next_Pod_Temp = (Next_Pod + Remainder - 1) % Pods_On
allocate_req_to_pods_remainder one_second
user_finished
user_submits_requests
allocate_req_to_pods_main
Figure 5.15: The user automaton for the Node.js application WATERS model.
working
Pod_Index < Next_Pod & Pod_Index < Next_Pod + Remainder - Pods_On
QL[Pod_Index] = QL[Pod_Index]
QL[Pod_Index] += 1
Pod_Index >= Next_Pod & Pod_Index >= Next_Pod + Remainder
QL[Pod_Index] = QL[Pod_Index]
Pod_Index >= Next_Pod & Pod_Index < Next_Pod + Remainder
QL[Pod_Index] += 1
Pod_Index < Next_Pod & Pod_Index >= Next_Pod + Remainder - Pods_On
allocate_req_to_pods_remainder allocate_req_to_pods_remainder
allocate_req_to_pods_remainder
allocate_req_to_pods_remainder
Figure 5.16: The pod_queue_remainder automaton that allocates the remain- der of requests in the Node.js model.
without it. The maximum queue length is shorter than for NGINX, due to the longer response times of the Node.js application. This means there are fewer possible values of the queue length variables, and thus a naturally smaller state space.
The next chapter presents the experimental setup used to test whether this model is accurate to a real cloud system.
Experimental Methodology
To test the accuracy of our model, we created a local Kubernetes cluster, drove a workload using JMeter, analysed the results to see whether the SLO was met, and compared this to the results from the WATERS model. This was done for a set of test cases, in which parameters were varied such as the number of requests sent per second and the maximum number of pods the cloud can scale to. Each test case was run for a number of trials. An overview of the experimental methodology is shown in Figure 6.1.
In summary, the following high-level process was used:
1. Model the cloud system in WATERS.
2. Create the real system using Kubernetes.
3. Test the Kubernetes cloud using JMeter using a comprehensive set of test cases.
4. Analyse the results from JMeter using Python scripts to see whether the SLA was met or not.
5. Run verifications in WATERS to see whether it predicts the system will meet the SLA or not.
6. Compare the results from JMeter and WATERS to see how closely they match.
It was decided not to test on a public cloud, since it was likely that outside interference would cause the results to be inconsistent.
6.1
Applications
Two different applications were tested: 1) a default NGINX installation and 2) a Node.js application with a fixed response time. NGINX1 is a load balancer and reverse proxy that provides a static “Hello World” web page to every request by default. This represents a simple, standalone application with a variable but fast response time. NGINX performs round-robin load balancing by default.
We also tested a simple Node.js application with fixed response time via a semaphore loop. The application sends a unique response to each request, in order to avoid sending “304” (not modified) responses. The default response time is 1 second, but can be set per request using the “rt” parameter (for example, “rt=500” for a response time of 500 milliseconds). In contrast to NGINX, this is a relatively slow, stable response time.
Node.js is single-threaded, allows close control over the sequence in which requests are responded to. This makes it a suitable candidate for a test using a fixed response time. It should be noted, however, that I/O operations are not single-threaded [44]. However, since there are few I/O operations in this simple application, this should not be a concern.