Realising Contention-awareness in InterGrid
7.3 System Design and Implementation
7.3.4 Local Scheduler
The critical part of an LRMS is a scheduler that has to allocate resources across an RP efficiently. The local scheduler should be aware of the contention between local and external requests within an RP.
As mentioned earlier, OpenNebula, as the virtual infrastructure manager in the RPs, offers an immediate provisioning model, where virtualised resources are allocated at the time they are requested. However, resource provisioning in InterGrid implies requirements that cannot be supported within this model, such as resource requests that are subject to priorities, capacity reservations at specific times, and variable resource usage throughout a VM’s lifetime. Additionally, in smaller RPs not all requests can be allocated immediately due to resource shortage.
Haizea is an open source scheduler developed by Sotomayor et al. [17] that employs VM-based leases for resource provisioning. The advantage of Haizea is that it considers the overheads of deploying VMs (e.g., suspension and resumption) in scheduling. It enables resource providers to offer advance-reservation leases (to guarantee resource availability) along with best-effort leases (to increase resource utilisation), where advance-reservation leases have preemptive priority over best-effort leases.
We adopt Haizea as the local scheduler of the LRMS in RPs. As a result, the scheduling capability of the virtual infrastructure manager (i.e., OpenNebula) is extended, and the LRMS can recognise the contention between local and external requests that occurs in the RP. More importantly, adopting Haizea as the local scheduler enables resources to be leased to external requests in a best-effort manner while respecting the allocation of resources to local requests in their requested time intervals. When contention occurs, the scheduler resolves it by preempting external lease(s) and vacating resources to serve the local request.
In this way, the local scheduler operates as the scheduling back-end of OpenNebula. It also employs a backfilling scheduling strategy along with VM management abilities (i.e., suspend, resume, and migrate) to schedule the leases efficiently and increase resource utilisation.
Although the local scheduler described above recognises the contention between local and external requests and resolves it using preemption, the policy used does not consider the side-effects caused by preemption. Therefore, in the next step, we implemented different preemption policies (as discussed in Chapter 3) in the local scheduler that proactively detect resource contentions and try to reduce their impact. These policies decrease the number of resource contentions that take place and increase the resource utilisation in an RP. We have implemented the following contention-aware preemption policies for the local scheduler:
• MLIP (Naïve): This policy tries to minimise the contention by reducing the number of requests affected by the preemption. Thus, this policy preempts large leases regardless of the overhead imposed for their preemption.
• MOV: The second preemption policy that we have implemented seeks to minimise the overall overhead time imposed on the system by preempting VMs. Implementation of this policy is based on selecting a set of leases for preemption that results in the minimum overhead time. For this purpose, we calculate the overhead imposed by the preemption of each lease, then preempt the leases with minimum overhead.
• MOML: This policy takes into account both the number of contentions and the overhead time imposed on the system by the preemption of different leases. Implementation of this policy involves two rounds. In the first round, the overhead imposed by preempting each external lease is calculated. In the second round, leases are sorted based on the imposed overhead; then the minimum number of leases is selected by considering the overhead of preempting each lease.
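The three policies above can be contrasted with a minimal Python sketch. It is not the thesis implementation or Haizea's API: the `Lease` record, its `overhead_s` field, and the function names are hypothetical, the overhead of each lease is assumed to be known in advance, and the MOML variant shown is only one possible reading of the two-round description (fewest leases first, lowest total overhead among those choices).

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class Lease:
    """Hypothetical view of a preemptable external lease."""
    lease_id: int
    nodes: int          # processing elements the lease currently holds
    overhead_s: float   # assumed cost (seconds) of suspending and resuming it


def _take_until_covered(ordered: List[Lease], needed: int) -> List[Lease]:
    """Walk the ordered candidates and stop once enough nodes are freed."""
    chosen, freed = [], 0
    for lease in ordered:
        if freed >= needed:
            break
        chosen.append(lease)
        freed += lease.nodes
    if freed < needed:
        raise ValueError("external leases cannot free enough nodes")
    return chosen


def mlip(leases: List[Lease], needed: int) -> List[Lease]:
    """MLIP-style: preempt the largest leases first, ignoring overhead."""
    return _take_until_covered(sorted(leases, key=lambda l: -l.nodes), needed)


def mov(leases: List[Lease], needed: int) -> List[Lease]:
    """MOV-style: preempt the leases with the lowest overhead first."""
    return _take_until_covered(sorted(leases, key=lambda l: l.overhead_s), needed)


def moml(leases: List[Lease], needed: int) -> List[Lease]:
    """MOML-style (assumed reading): fewest leases, then lowest total overhead."""
    k = len(mlip(leases, needed))  # minimum number of leases that can suffice
    by_overhead = sorted(leases, key=lambda l: l.overhead_s)
    feasible = [c for c in combinations(by_overhead, k)
                if sum(l.nodes for l in c) >= needed]
    return list(min(feasible, key=lambda c: sum(l.overhead_s for l in c)))


# Illustrative data only; the overhead values are made up for the example.
example = [Lease(1, 3, 90.0), Lease(2, 3, 30.0), Lease(3, 1, 10.0), Lease(4, 2, 40.0)]
for policy in (mlip, mov, moml):
    print(policy.__name__, [l.lease_id for l in policy(example, needed=3)])
```

On this made-up input the three functions pick different leases, which illustrates the trade-off between the number of affected leases and the total overhead. The exhaustive search in `moml` is exponential in the worst case and is used here only to keep the sketch short.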
The sequence diagram of invocations between local scheduler classes is shown in Figure 7.5. The scheduling process in the local scheduler starts when the LocalManager class receives a lease request from either a local or an external user (through the IGG). The manager requests the LeaseScheduler class to schedule the lease request. Then, the schedule method in the VMScheduler class is called, which schedules both local and external requests. For local requests, VMs are scheduled based on the requested time interval. External requests are allocated in the first vacant space.
Figure 7.5: Schedule of local requests in the local scheduler. [Sequence diagram. Participants: LocalManager, LeaseSched, VMsched, Mapper, PreemptionPolicy, ResourcePool. Calls, in order: assign(req), reqLease(req), schedule(req), map(req), sortLeases() returning preemptOrder, mapping returned, VMRsrv(req) returning reservation, startVMs(req).]
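The "first vacant space" rule for external requests can be sketched as follows. This is an illustrative simplification and not Haizea's slot table: reservations are modelled as hypothetical `(start_s, end_s, nodes)` tuples on a pool with a single `capacity` figure, and memory is ignored.

```python
from typing import List, Tuple

# Hypothetical reservation record: (start_s, end_s, nodes), end exclusive.
Reservation = Tuple[int, int, int]


def free_nodes(reservations: List[Reservation], capacity: int,
               start: int, end: int) -> int:
    """Minimum number of free nodes at any instant in [start, end)."""
    def used(t: int) -> int:
        return sum(n for s, e, n in reservations if s <= t < e)

    # Usage can only rise at a reservation start, so checking the window
    # start plus every reservation start inside the window is sufficient.
    points = {start} | {s for s, _, _ in reservations if start < s < end}
    return capacity - max(used(t) for t in points)


def earliest_start(reservations: List[Reservation], capacity: int,
                   nodes: int, duration: int, not_before: int) -> int:
    """Earliest time >= not_before at which the request fits for its duration."""
    if nodes > capacity:
        raise ValueError("request is larger than the whole pool")
    # The earliest feasible start is either not_before itself or the end of
    # some existing reservation, so only those instants need to be tried.
    candidates = sorted({not_before} |
                        {e for _, e, _ in reservations if e > not_before})
    for t in candidates:
        if free_nodes(reservations, capacity, t, t + duration) >= nodes:
            return t
    raise RuntimeError("unreachable: the pool is empty after the last reservation")


# Illustrative data only.
booked = [(0, 100, 3), (50, 200, 1)]
print(earliest_start(booked, capacity=4, nodes=2, duration=60, not_before=10))
```

A local request, by contrast, would be checked with `free_nodes` over exactly its requested interval; if the result is too small, the preemption policy decides which external leases to displace.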
The map function in the Mapper class maps requested resources to the physical resources based on their availability times. When the Mapper class handles a local request and there are not enough resources, the mapper calls the PreemptionPolicy to determine the preferred order for preempting external leases. The order is determined by the preemption policy in use, as discussed above. Then, the mapper performs the mapping and returns the mapping list to the VMScheduler. Using the mapping information, the VMScheduler calls VMRsrv and updates the scheduling information of the resources. After that, the lease can be started by calling the startVMs method in the ResourcePool class. Additionally, the LeaseScheduler is informed so that it can update all the affected leases in the scheduling table.
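The call chain described above can be mirrored in a toy Python sketch that records each invocation in a trace. The class names follow the sequence diagram, but the method signatures, the dictionary-based lease records, and the choice of which class calls `start_vms` are assumptions made for illustration; they are not Haizea's actual interfaces.

```python
from typing import List, Optional


class PreemptionPolicy:
    def __init__(self, trace: List[str]):
        self.trace = trace

    def sort_leases(self, external: List[dict]) -> List[dict]:
        self.trace.append("sortLeases")
        # Assumed ordering: lowest preemption overhead first.
        return sorted(external, key=lambda l: l["overhead_s"])


class Mapper:
    def __init__(self, capacity: int, policy: PreemptionPolicy, trace: List[str]):
        self.capacity, self.policy, self.trace = capacity, policy, trace

    def map(self, req: dict, running: List[dict]) -> Optional[List[dict]]:
        """Return the leases to preempt, or None if the request cannot be placed."""
        self.trace.append("map")
        free = self.capacity - sum(l["nodes"] for l in running)
        preempted: List[dict] = []
        if free < req["nodes"] and req["type"] == "local":
            externals = [l for l in running if l["type"] == "external"]
            for lease in self.policy.sort_leases(externals):
                if free >= req["nodes"]:
                    break
                preempted.append(lease)
                free += lease["nodes"]
        return preempted if free >= req["nodes"] else None


class VMScheduler:
    def __init__(self, mapper: Mapper, trace: List[str]):
        self.mapper, self.trace, self.running = mapper, trace, []

    def schedule(self, req: dict) -> Optional[List[dict]]:
        self.trace.append("schedule")
        preempted = self.mapper.map(req, self.running)
        if preempted is None:
            return None
        self.trace.append("VMRsrv")  # record the reservation
        self.running = [l for l in self.running if l not in preempted] + [req]
        return preempted


class ResourcePool:
    def __init__(self, trace: List[str]):
        self.trace = trace

    def start_vms(self, req: dict) -> None:
        self.trace.append("startVMs")


class LeaseScheduler:
    def __init__(self, vm_sched: VMScheduler, pool: ResourcePool, trace: List[str]):
        self.vm_sched, self.pool, self.trace = vm_sched, pool, trace

    def request_lease(self, req: dict) -> Optional[List[dict]]:
        self.trace.append("reqLease")
        preempted = self.vm_sched.schedule(req)
        if preempted is not None:
            self.pool.start_vms(req)
        return preempted


class LocalManager:
    def __init__(self, lease_sched: LeaseScheduler, trace: List[str]):
        self.lease_sched, self.trace = lease_sched, trace

    def assign(self, req: dict) -> Optional[List[dict]]:
        self.trace.append("assign")
        return self.lease_sched.request_lease(req)


def build(capacity: int):
    """Wire the toy components together around one shared trace list."""
    trace: List[str] = []
    vm_sched = VMScheduler(Mapper(capacity, PreemptionPolicy(trace), trace), trace)
    manager = LocalManager(LeaseScheduler(vm_sched, ResourcePool(trace), trace), trace)
    return manager, vm_sched, trace
```

As a usage example, one can seed `vm_sched.running` with external leases that fill the pool and then call `manager.assign` with a local request; the resulting trace follows the order of calls in Figure 7.5.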
7.4 Performance Evaluation
The testbed for performance evaluation of the implemented system is as follows:
• A four-node cluster as the RP. The worker nodes are 3 IBM System X3200 M3 machines, each with a quad-core Intel Xeon x3400, 2.7 GHz processor and 4 GB of memory. The head node, where the LRMS resides, is a Dell Optiplex 755 machine with an Intel Core 2 Duo E4500, 2.2 GHz processor and 2 GB of memory.
• The host operating system installed on the server nodes is the CentOS 6.2 Linux distribution. The operating system on the head node is Ubuntu 12.04.
• All the nodes are connected through a 100 Mbps switched Ethernet network.
• We used OpenNebula 3.4 and Haizea version 1.1 as the virtual infrastructure manager and the local scheduler, respectively.
• Qemu-KVM 0.12.1.2 is used as the hypervisor on each server.
• GlusterFS is used as the cluster file system. It aggregates commodity storage across a cluster and forms a large parallel network file system [166]. The disk images needed by the VMs and the memory image files (created when a VM is suspended) are stored on the shared file system.
The scenario we consider in our experiment involves an InterGrid with 3 IGGs with peering arrangements established between them, as illustrated in Figure 7.6. IGG1 has the cluster as its RP, and users from IGG2 and IGG3 request leases through their DVE managers. Based on the peering arrangements, IGG1 provides them with resources. IGG1 receives these requests in the form of external requests, and they are allocated resources through the LRMS of the RP. However, the RP has its own local requests, which have higher priority than the external ones. The lease requests received by the LRMS are described in Table 7.1.
Figure 7.6: Evaluation scenario based on 3 InterGrid Gateways. [Diagram: IGG1, IGG2 and IGG3; the RP with its LRMS; a local user and an external user.]
To be able to follow the order of events occurring in the system and demonstrate their impact, we perform the evaluation on 7 lease requests that are submitted to the RP. Each row of the table shows the arrival time, number of requested processing elements, amount of memory, duration, and request type (i.e., local or external). We consider 00:00:00 as the start of the experiment (i.e., the arrival of the first request), and the arrival times of the other requests are relative to the start time of the experiment. All of these lease requests use a ttylinux disk image located on the shared storage.
Table 7.1: Characteristics of lease requests used in the experiments.

Request ID   Arrival Time   No. Nodes   Memory (MB)   Duration (s)   Type
1            00:00:00       3           256           3600           External
2            00:05:00       1           128           5400           External
3            00:06:00       2           128           5400           External
4            00:08:00       1           256           5400           External
5            00:08:50       2           64            2400           External
6            00:09:40       3           128           3600           External
7            00:12:00       5           128           3600           Local
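A short Python sketch can make the arithmetic behind this workload explicit. It rests on two assumptions that the text does not state: each requested node occupies one core of the three quad-core worker nodes (12 cores in total), and every external lease starts at its arrival time. Under those assumptions it counts how many cores the external leases hold when the local request arrives.

```python
# Rows of Table 7.1: (id, arrival, nodes, memory_mb, duration_s, type).
REQUESTS = [
    (1, "00:00:00", 3, 256, 3600, "External"),
    (2, "00:05:00", 1, 128, 5400, "External"),
    (3, "00:06:00", 2, 128, 5400, "External"),
    (4, "00:08:00", 1, 256, 5400, "External"),
    (5, "00:08:50", 2, 64, 2400, "External"),
    (6, "00:09:40", 3, 128, 3600, "External"),
    (7, "00:12:00", 5, 128, 3600, "Local"),
]

# Assumption: 3 worker nodes x 4 cores, one requested node per core.
CAPACITY = 3 * 4


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS offset into seconds since the experiment start."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


local = next(r for r in REQUESTS if r[5] == "Local")
t_local = to_seconds(local[1])

# Assumption: external leases run from arrival until arrival + duration.
active = [r for r in REQUESTS
          if r[5] == "External"
          and to_seconds(r[1]) <= t_local < to_seconds(r[1]) + r[4]]

held = sum(r[2] for r in active)
shortfall = max(0, local[2] - (CAPACITY - held))

print(f"local request arrives at t={t_local}s")
print(f"external leases active: {[r[0] for r in active]}, cores held: {held}")
print(f"cores that must be freed by preemption: {shortfall}")
```

If the assumptions hold, the external leases occupy the whole pool by the time the local request arrives, so serving it requires preempting external leases. Which leases are chosen depends on the preemption policy in use.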