Host Filtering - Job Scheduling - Ad hoc Cloud Computing

4.5 Job Scheduling

4.5.2 Host Filtering

We model our job scheduling mechanism on the virtual machine scheduler of OpenStack. OpenStack is an open source and scalable operating platform for building public and private clouds [31]. Its virtual machine scheduler, called nova-scheduler, calculates the near-optimal host to deploy a virtual machine on. This decision is based on host availability, specification and resource load. The nova-scheduler has two phases: filtering and weighing. Filtering determines if a host is eligible for a virtual machine to be dispatched to it.

Commonly applied filters are CoreFilter, RamFilter and DiskFilter, which determine whether a host has enough processors, memory and storage space respectively. The resulting eligibility list is then passed to the weighing phase, where hosts are ordered according to administrator-defined weights to determine the best hosts for a virtual machine to be deployed upon. We discuss how cloud jobs are scheduled according to ad hoc host availability, specification, resource load and reliability based on a modified version of the OpenStack nova-scheduler.
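The two-phase structure described above can be illustrated with a minimal Python sketch. It is not the nova-scheduler API: the Host record, the filter_hosts and weigh_hosts functions, the example hosts and the choice of free memory as the weight are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Host:
    """Hypothetical host record; field names are illustrative only."""
    name: str
    cores: int
    ram_mb: int
    disk_gb: int
    free_ram_mb: int


HostFilter = Callable[[Host], bool]


def filter_hosts(hosts: Iterable[Host], filters: Iterable[HostFilter]) -> List[Host]:
    """Filtering phase: keep only hosts that pass every filter."""
    active = list(filters)
    return [h for h in hosts if all(f(h) for f in active)]


def weigh_hosts(hosts: Iterable[Host], weight: Callable[[Host], float]) -> List[Host]:
    """Weighing phase: order eligible hosts from best to worst (higher weight first)."""
    return sorted(hosts, key=weight, reverse=True)


hosts = [
    Host("a", cores=4, ram_mb=8192, disk_gb=100, free_ram_mb=2048),
    Host("b", cores=1, ram_mb=512, disk_gb=100, free_ram_mb=400),
    Host("c", cores=2, ram_mb=4096, disk_gb=50, free_ram_mb=3072),
]

# Stand-ins for core, RAM and disk filters (thresholds are assumptions).
filters = [
    lambda h: h.cores >= 1,
    lambda h: h.ram_mb >= 1024,
    lambda h: h.disk_gb >= 20,
]

eligible = filter_hosts(hosts, filters)
ranked = weigh_hosts(eligible, weight=lambda h: h.free_ram_mb)
```

Keeping the filters as independent predicates means that a further criterion, such as reliability, could be added without changing the weighing step.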


4.5.2.1 Availability

In order to select an available ad hoc host, the ad hoc server maintains a list of available hosts determined via the availability checker daemon added to the VM Service project of the ad hoc server shown in Figure 4.4. To ascertain whether an ad hoc host is available, the availability checker periodically queries the VM Service database to determine when an ad hoc client last polled the server. If the ad hoc client polled the server within the last two minutes, the ad hoc host is deemed available for use.

Currently, regular BOINC clients contact the server only to obtain a job, to return results, or when a volunteer host explicitly instructs the BOINC client to contact the server. Therefore, in most cases, a BOINC client will not poll the BOINC server for long periods of time despite still being available to execute applications. To solve this, we have added a Periodic Updater component to the ad hoc client, as shown in Figure 4.5, that polls the ad hoc server every minute. This is similar to OpenStack, where compute nodes (i.e. those that run virtual machines) periodically signal to the compute service that they are still available. The Periodic Updater is implemented as a pthread that is created when the ad hoc client is instantiated; POSIX threads, or pthreads, is a standard for threads in the Portable Operating System Interface (POSIX) family of standards [196].
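The behaviour of the Periodic Updater can be sketched as a background thread that invokes a polling function at a fixed interval. The sketch below uses Python's threading module rather than pthreads, so it illustrates the pattern and not the ad hoc client's implementation; the PeriodicUpdater class, the poll callable and the interval_s parameter are hypothetical.

```python
import threading
from typing import Callable


class PeriodicUpdater:
    """Calls `poll` once immediately, then once per interval, until stopped."""

    def __init__(self, poll: Callable[[], None], interval_s: float = 60.0) -> None:
        self._poll = poll              # hypothetical function that contacts the server
        self._interval_s = interval_s  # one minute by default, as in the text
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Setting the event wakes the thread if it is waiting between polls.
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            self._poll()
            # wait() returns early when stop() is called, so shutdown is prompt.
            self._stop.wait(self._interval_s)
```

Waiting on an event instead of sleeping lets the client shut the thread down without waiting for the remainder of the interval.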

Upon each poll from an ad hoc client, the ad hoc server stores the contact time in the VM Service project database. This allows the availability checker to determine whether the ad hoc client has indeed polled in the last two minutes. Ad hoc hosts whose clients have not polled within this period are set to unavailable. The ad hoc Scheduler then queries the VM Service database to obtain a list of all available ad hoc hosts.
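The availability rule amounts to comparing each host's last contact time against a two-minute window. A minimal sketch follows, assuming the contact times have already been read from the VM Service database into a dictionary; the available_hosts function and its arguments are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Two-minute window taken from the text.
AVAILABILITY_WINDOW = timedelta(minutes=2)


def available_hosts(last_contact: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the names of hosts whose client polled within the window."""
    return sorted(
        host for host, contacted in last_contact.items()
        if now - contacted <= AVAILABILITY_WINDOW
    )
```

With a one-minute polling interval and a two-minute window, a host can miss a single poll without being marked unavailable.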

4.5.2.2 Host Hardware Specifications

Available ad hoc hosts are then analyzed to determine whether they physically have enough resources to execute both an ad hoc guest and a cloud job. Although we do not know before execution the amount of resources that a cloud job, and consequently an ad hoc guest, will use, we assume that both require a reasonable amount of resources to execute effectively. We therefore require that each ad hoc host has at least 1 CPU core, 1 GB of RAM and 20 GB of storage space.

It is possible to monitor and store the execution times and resource usage levels of previously executed cloud jobs or benchmarks, and to predict a newly submitted cloud job’s execution time and resource usage levels based on their similarity. While there are many studies that outline the process and value of employing this approach [95, 96, 145, 128, 67, 45], determining whether a cloud job shares characteristics with those previously run, before it has even been executed, is an extremely difficult task and is worthy of investigation in a new course of research.

As previously mentioned in Section 2.4.2 of Chapter 2, a BOINC client automatically records the amount of resources the volunteer host has when it is first run. However, volunteer user-based preferences limit both the BOINC client’s and the volunteer application’s use of these resources. Based on both of these data sets, the ad hoc Scheduler analyses the amount of resources an ad hoc guest and cloud job could potentially access. Ad hoc hosts that do not satisfy the resource criteria above are removed from the list of potential cloud job execution candidates. This is similar to the operations performed by the OpenStack nova-scheduler, which calculates suitable hosts for virtual machine placement based on the filters CoreFilter, RamFilter and DiskFilter.
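One reading of this step is that the recorded host resources are first reduced by the volunteer's preferences and the result is then compared against the minimum specification. The sketch below follows that reading. The HostSpec and Preferences types and their fields are hypothetical and do not mirror BOINC's actual preference names; only the thresholds of 1 CPU core, 1 GB of RAM and 20 GB of storage come from the text.

```python
from dataclasses import dataclass

# Minimum specification stated in the text.
MIN_CORES = 1
MIN_RAM_MB = 1024
MIN_DISK_GB = 20


@dataclass(frozen=True)
class HostSpec:
    """Resources recorded for a host (hypothetical field names)."""
    cores: int
    ram_mb: float
    disk_gb: float


@dataclass(frozen=True)
class Preferences:
    """Hypothetical volunteer limits; defaults impose no restriction."""
    max_cpu_fraction: float = 1.0
    max_ram_fraction: float = 1.0
    max_disk_gb: float = float("inf")


def usable(spec: HostSpec, prefs: Preferences) -> HostSpec:
    """Resources a guest and cloud job could potentially access."""
    return HostSpec(
        cores=int(spec.cores * prefs.max_cpu_fraction),
        ram_mb=spec.ram_mb * prefs.max_ram_fraction,
        disk_gb=min(spec.disk_gb, prefs.max_disk_gb),
    )


def meets_spec(spec: HostSpec, prefs: Preferences) -> bool:
    """True if the preference-limited resources satisfy the minimum specification."""
    u = usable(spec, prefs)
    return (
        u.cores >= MIN_CORES
        and u.ram_mb >= MIN_RAM_MB
        and u.disk_gb >= MIN_DISK_GB
    )
```

For example, a host with 1536 MB of RAM whose owner allows only half of it to be used would expose 768 MB and would therefore be removed from the candidate list.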

4.5.2.3 Resource Load

The resource load of the remaining ad hoc hosts is then retrieved. This is made possible by incorporating Ganglia (see Section 2.5.2 of Chapter 2) into the ad hoc client, which is depicted as the Resource Monitor in Figure 4.5. Upon installing the ad hoc client, an ad hoc host user or owner therefore does not need to install Ganglia separately; we discuss the installation of the ad hoc cloud components in Section 4.8 of this chapter.

The Ganglia gmond daemon runs locally on the ad hoc host and collects CPU and memory load as well as disk consumption and network usage. While network usage may be useful to determine which cloud jobs are best suited to a particular ad hoc host, we omit network usage from our scheduling calculations and leave this for future work.

The Ganglia gmetad daemon runs on the ad hoc server and collects the monitoring data from the ad hoc hosts. As previously mentioned in Chapter 2, data collected by Ganglia are stored in rrd files. To enable the ad hoc Scheduler to read the stored values, the rrd files for each ad hoc host are queried to obtain the latest resource loads.

Resource loads can be obtained by using the following command:

rrdtool fetch cpu_system.rrd AVERAGE -r 120 -s -120

This rrdtool fetch command fetches the average CPU loads calculated for each 15-second period over a total of two minutes. By default, Ganglia averages monitoring data over each 15-second period; however, we average the load over each two-minute period to smooth the fluctuations of real-time monitoring data and to obtain a good indication of the current load.
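The textual output of rrdtool fetch consists of a header line naming the data sources, a blank line, and then one "timestamp: value" row per interval, with unknown samples printed as nan. A minimal sketch of turning such output into a single average is given below. The parse_fetch_output and mean_load functions are hypothetical, and the sample text is made-up data, not measurements from the ad hoc cloud.

```python
import math
from typing import List, Optional


def parse_fetch_output(text: str) -> List[float]:
    """Extract the first data-source value from each data row, skipping NaN rows."""
    values: List[float] = []
    for line in text.splitlines():
        timestamp, sep, rest = line.partition(":")
        if not sep or not timestamp.strip().isdigit():
            continue  # header line or blank line
        fields = rest.split()
        if not fields:
            continue
        value = float(fields[0])  # float() accepts "nan" and "-nan"
        if not math.isnan(value):
            values.append(value)
    return values


def mean_load(text: str) -> Optional[float]:
    """Average of the known samples, or None if no sample is available."""
    values = parse_fetch_output(text)
    return sum(values) / len(values) if values else None


# Made-up sample in the rrdtool fetch output format.
SAMPLE = """\
                            sum

1400000000: 2.0000000000e+00
1400000120: 4.0000000000e+00
1400000240: -nan
"""
```

Returning None when every sample is unknown lets the caller distinguish a host with no recent monitoring data from a host with zero load.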

If an OpenStack scheduler were integrated into the ad hoc Scheduler, at this point the nova-scheduler would begin the weighing process and then reserve an ad hoc host that is available, has enough hardware to exploit and has the least memory usage; the latter can be modified to filter and weigh according to other metrics. However, we assume that for the ad hoc cloud to offer reasonable performance to cloud jobs, ad hoc host processes should not utilize more than 70% of the CPU and the host should have at least 512 MB of memory available when the cloud job is executing. The output from the above command is passed to the ad hoc Scheduler, which decides whether the current load is acceptable for ad hoc guest and cloud job execution. Ad hoc hosts whose average load exceeds the values specified are removed from the list of potential execution candidates.
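The load criteria can be expressed as a simple predicate over the two-minute averages. The sketch below assumes the averages are already available as a CPU utilization percentage and an amount of free memory in MB; the load_acceptable and filter_by_load functions are hypothetical, while the 70% and 512 MB thresholds are taken from the text.

```python
from typing import Dict, List, Tuple

MAX_CPU_PERCENT = 70.0   # host processes should not use more than this
MIN_FREE_MEM_MB = 512.0  # memory that must remain available


def load_acceptable(avg_cpu_percent: float, avg_free_mem_mb: float) -> bool:
    """True if the averaged load leaves room for an ad hoc guest and cloud job."""
    return avg_cpu_percent <= MAX_CPU_PERCENT and avg_free_mem_mb >= MIN_FREE_MEM_MB


def filter_by_load(loads: Dict[str, Tuple[float, float]]) -> List[str]:
    """Keep hosts whose (cpu %, free memory MB) averages are acceptable."""
    return sorted(
        host for host, (cpu, free_mem) in loads.items()
        if load_acceptable(cpu, free_mem)
    )
```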

These average resource usage values are stored alongside the database entries of the potential ad hoc hosts that could be used to execute cloud jobs that are currently waiting.

In summary, an ad hoc host must have the hardware specifications previously mentioned and have enough of these resources available to offer reasonable performance. For example, an ad hoc host with a total of 768 MB of RAM (i.e. less than our 1 GB requirement) could be frequently underutilized and therefore meet our minimum available memory of 512 MB. However, its lack of potential access to more resources does not give the cloud job the opportunity to perform better when it requires more resources. This is why the ad hoc Scheduler filters ad hoc hosts based on both hardware specifications and resource load.
