
2. Background on Availability, Cloud, SA Forum Middleware and Related Work

2.8. Related work

2.8.2. Resource usage representation: Hardware vs AMF SI

In a cluster setup, a Monitoring Engine collects workload measurements from each of the cluster's nodes, as shown in Fig. 2-11. In the context of this description, VMs and nodes refer to the same entity. The Monitoring Engine then aggregates the workload data and outputs a summary of the cluster's overall workload, which may be expressed in terms of the resource usage metrics of the cluster's VMs, e.g. CPU usage, memory usage, and network bandwidth usage. The limitation of this approach is that the workload of each VM is permanently associated with the services it provides. The approach does not consider that services can be removed from a VM or re-assigned to other VMs over time. Assuming that the VMs keep the same service assignments at all times results in incorrect monitoring output.

[Figure: VMs report per-VM usage, e.g. {CPU: 10%, RAM: 1.1GB, …}, {CPU: 44%, RAM: 4.1GB, …}, {CPU: 22%, RAM: 0.7GB, …}, {CPU: 36%, RAM: 1GB, …}, to the Monitoring Engine, which outputs Cluster-1-Workload: {CPU: ...}]

Figure 2-11: Monitoring in terms of hardware entities
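A minimal sketch of the VM-level aggregation in Fig. 2-11 follows, using the four readings shown in the figure. The `VmSample` type, the VM labels, and the aggregation rule (mean CPU, summed RAM) are assumptions for illustration; the text does not specify how the Monitoring Engine combines readings. What matters is that the result is keyed only by VM and cluster, so nothing in it records which service produced the load.

```python
from dataclasses import dataclass


@dataclass
class VmSample:
    """One resource-usage reading reported by a VM (hypothetical structure)."""
    vm: str
    cpu_percent: float
    ram_gb: float


def cluster_workload(samples: list[VmSample]) -> dict[str, float]:
    """Aggregate per-VM samples into a single cluster-level summary.

    Assumption: CPU is averaged over VMs and RAM is summed; the aggregation
    function used by a real Monitoring Engine may differ.
    """
    if not samples:
        raise ValueError("no samples to aggregate")
    return {
        "cpu_percent": sum(s.cpu_percent for s in samples) / len(samples),
        "ram_gb": sum(s.ram_gb for s in samples),
    }


# Readings from Fig. 2-11; the VM labels are hypothetical.
samples = [
    VmSample("VM1", 10.0, 1.1),
    VmSample("VM2", 44.0, 4.1),
    VmSample("VM3", 22.0, 0.7),
    VmSample("VM4", 36.0, 1.0),
]
summary = cluster_workload(samples)
print(summary["cpu_percent"])       # 28.0
print(round(summary["ram_gb"], 2))  # 6.9
```

Because the summary carries no service dimension, moving a service from one VM to another cannot be reflected in it.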

When each VM hosts multiple components, each of which provides one or more services, and services can be assigned to or removed from the components over time, fine-grained monitoring and data collection are essential to estimate the workload of each service.

For example, in Fig. 2-12, VM1 to VM3 host components that provide Service-1, and VM2 to VMN host components that provide Service-2. The services can be assigned and re-assigned to the components dynamically. The existence of a service provider entity capable of providing a service (i.e. a component) does not imply that its resource usage must be associated with that service, since the component may provide the service only intermittently. A valid assignment of the service to the service provider entity is required to correctly associate the provider's load with the service. In other words, a component can exist and run on a VM at all times, but whether its workload should be associated with a service depends on whether that service is assigned to the component. Without a service assignment, the workload of a service provider is irrelevant with respect to services. For example, in Fig. 2-12, comp-1 of VM-2 has no valid service assignment; hence, in the proposed solution, its workload is not taken into account when measuring the workload of Service-1. The resource usage of a service must therefore be updated continuously as services are assigned and re-assigned to components at runtime.
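The attribution rule above can be sketched as follows: a component's load counts toward a service only while an assignment of that service to the component exists. The component readings, the `assignments` map, and the choice of simply summing CPU percentages across VMs are hypothetical simplifications, not the thesis's implementation. The second call shows the per-service figure changing after a runtime re-assignment.

```python
from collections import defaultdict

# Hypothetical per-component CPU readings, keyed by (VM, component).
component_cpu: dict[tuple[str, str], float] = {
    ("VM-1", "comp-1"): 12.0,
    ("VM-2", "comp-1"): 8.0,   # running, but no service assigned
    ("VM-3", "comp-1"): 15.0,
}

# Current service assignments. A component missing from this map has no
# valid assignment, so its load is not attributed to any service.
assignments: dict[tuple[str, str], str] = {
    ("VM-1", "comp-1"): "Service-1",
    ("VM-3", "comp-1"): "Service-1",
}


def service_workload(cpu_by_component, assignment_map):
    """Sum component load per service, skipping unassigned components."""
    totals = defaultdict(float)
    for component, cpu in cpu_by_component.items():
        service = assignment_map.get(component)
        if service is not None:
            totals[service] += cpu
    return dict(totals)


print(service_workload(component_cpu, assignments))  # {'Service-1': 27.0}

# Runtime re-assignment: comp-1 on VM-2 now receives Service-1.
assignments[("VM-2", "comp-1")] = "Service-1"
print(service_workload(component_cpu, assignments))  # {'Service-1': 35.0}
```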

Component types are tied to the types of service they can provide: a component type can provide a defined set of services and cannot provide a service beyond its capability. Because a VM can host many types of components, and those components can have many types of services assigned to them, multiple services may be provided from the same VM. In a system where VMs are dynamically assigned to applications or services, and different applications and services may be collocated, monitoring only VM-level measurements produces incorrect output.
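To make the relationship between component types and service types concrete, here is a hypothetical capability table with a check that rejects a service a component type cannot provide, plus the set of services a VM could end up hosting. The type names and the table are invented for illustration; in an AMF deployment this information would come from the configuration model, which this sketch does not attempt to reproduce.

```python
# Hypothetical capability table: component type -> service types it can provide.
CAPABILITIES: dict[str, frozenset[str]] = {
    "CompType-A": frozenset({"Service-1"}),
    "CompType-B": frozenset({"Service-2"}),
    "CompType-AB": frozenset({"Service-1", "Service-2"}),
}


def can_provide(component_type: str, service: str) -> bool:
    """True if components of this type are capable of providing the service."""
    return service in CAPABILITIES.get(component_type, frozenset())


def services_possible_on_vm(hosted_types: list[str]) -> set[str]:
    """Union of the services that the components hosted on one VM could provide."""
    possible: set[str] = set()
    for component_type in hosted_types:
        possible |= CAPABILITIES.get(component_type, frozenset())
    return possible


print(can_provide("CompType-A", "Service-2"))  # False
# A VM hosting types A and B can end up providing both services at once.
print(sorted(services_possible_on_vm(["CompType-A", "CompType-B"])))  # ['Service-1', 'Service-2']
```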

Note that the approach introduced in this thesis does not completely solve the service collocation problem. Although it can detect service assignments and re-assignments at the process level over time, it cannot differentiate between two different services provided by the same process at the same time. Similarly, it cannot measure the load of two different services provided simultaneously by the same component. For the approach to be effective, the relationship between a process and the components it runs must not be one-to-many, and the relationship between a component and its services must not be one-to-many either.
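The two preconditions stated above can be checked mechanically before relying on per-process measurements. The sketch below assumes the monitoring side already knows which component each process runs and which services are currently assigned to each component; the input format and the function name are hypothetical.

```python
from collections import Counter


def one_to_one_violations(
    process_components: list[tuple[int, str]],
    component_services: list[tuple[str, str]],
) -> list[str]:
    """Report processes running several components and components with
    several simultaneous service assignments (both break the approach).

    process_components: (pid, component) pairs currently observed.
    component_services: (component, service) assignments currently active.
    """
    problems = []
    for pid, count in Counter(p for p, _ in process_components).items():
        if count > 1:
            problems.append(f"process {pid} runs {count} components")
    for comp, count in Counter(c for c, _ in component_services).items():
        if count > 1:
            problems.append(f"component {comp} has {count} services assigned")
    return problems


ok = one_to_one_violations([(101, "comp-1"), (102, "comp-2")],
                           [("comp-1", "Service-1")])
bad = one_to_one_violations([(101, "comp-1"), (101, "comp-2")],
                            [("comp-1", "Service-1"), ("comp-1", "Service-2")])
print(ok)   # []
print(bad)  # ['process 101 runs 2 components', 'component comp-1 has 2 services assigned']
```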

[Figure: VM1, VM2, VM3, …, VM_N host comp1 and comp2 instances providing Service-1 and Service-2; per-VM reports such as {comp1:{…}, …}, {comp2:{…}, …}, {comp1:{…}, comp2:{…}, …} go to the Monitoring Engine, which outputs Service-1-Workload: {CPU:…} and Service-2-Workload: {CPU:…}. Legend: component with CSI assignment; component with no CSI assignment.]

Figure 2-12: Service level system resource usage representation

In the setup illustrated in Fig. 2-12, VM-2 and VM-3 host components capable of providing both Service-1 and Service-2. Service-1 and Service-2 are therefore collocated on these VMs, and their loads are mixed in the VM-level workload data. A monitoring solution that only measures VM load will associate some of the load of Service-2 with Service-1, and vice versa.
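The mixing described above can be reproduced with a small example. The readings and the naive VM-level rule (charge a service with the full load of every VM on which it has a component) are assumptions for illustration; the component-level rule sums only the components assigned to each service.

```python
from collections import defaultdict

# Hypothetical readings: (VM, component, assigned service, CPU %).
readings = [
    ("VM-1", "comp-1", "Service-1", 10.0),
    ("VM-2", "comp-1", "Service-1", 5.0),
    ("VM-2", "comp-2", "Service-2", 20.0),
    ("VM-3", "comp-1", "Service-1", 7.0),
    ("VM-3", "comp-2", "Service-2", 3.0),
]


def vm_level(rows):
    """Naive rule: a service is charged the whole load of each VM hosting it."""
    vm_load = defaultdict(float)
    vm_services = defaultdict(set)
    for vm, _comp, service, cpu in rows:
        vm_load[vm] += cpu
        vm_services[vm].add(service)
    totals = defaultdict(float)
    for vm, services in vm_services.items():
        for service in services:
            totals[service] += vm_load[vm]
    return dict(totals)


def service_level(rows):
    """Component-level rule: only the service's own components are summed."""
    totals = defaultdict(float)
    for _vm, _comp, service, cpu in rows:
        totals[service] += cpu
    return dict(totals)


print(sorted(vm_level(readings).items()))       # [('Service-1', 45.0), ('Service-2', 35.0)]
print(sorted(service_level(readings).items()))  # [('Service-1', 22.0), ('Service-2', 23.0)]
```

With these hypothetical numbers the VM-level figures add up to 80% CPU while the measured load is only 45%, because the load of VM-2 and VM-3 is charged to both services.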

Comparing the clusters and the corresponding monitoring solutions illustrated in Fig. 2-11 and Fig. 2-12 shows that existing monitoring solutions cannot adapt to the dynamic nature of the services in a cluster managed by SA Forum middleware. We conclude that a new monitoring solution is needed, one that takes into account the states and the dynamic nature of the services managed by the SA Forum middleware.

In document Monitoring Service Level Workload of Highly Available Applications (Page 46-50)