In document Ad hoc Cloud Computing (Page 169-176)

6.4 Reliability

6.4.3 Results and Analysis

Our results from the three experimental runs show that, under the environmental conditions simulated, the ad hoc cloud can offer a high level of reliability. Table 6.1 summarises the cloud job success rates, the number of completed jobs and virtual machine migrations, and the types of failure experienced during each experimental run.

Table 6.1: Reliability and Failure Statistics of the ad hoc Cloud

We see that during the first experimental run, 13 out of a possible 15 cloud jobs are successfully completed despite the unreliability of the simulated ad hoc cloud; this equates to 86.7% of the jobs being completed. The simulation triggered 11 VM migrations, so at least 4 cloud jobs did not experience any ad hoc host or guest failures. Similarly, experimental runs 2 and 3 showed that 80% and 93.3% of the respective submitted cloud jobs completed successfully.
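The success rates quoted above follow directly from the completed-job counts. The following is a minimal sketch of that arithmetic, assuming 15 submitted jobs in every run and completed counts of 12 and 14 for runs 2 and 3 (these two counts are inferred here from the reported 80% and 93.3%; all names are illustrative and not part of the prototype):

```python
# Completed cloud jobs per experimental run, out of 15 submitted in each.
# The counts for runs 2 and 3 are inferred from the reported percentages.
SUBMITTED_PER_RUN = 15
completed = {1: 13, 2: 12, 3: 14}


def success_rate(done: int, submitted: int = SUBMITTED_PER_RUN) -> float:
    """Return the percentage of submitted cloud jobs that completed."""
    return 100.0 * done / submitted


rates = {run: success_rate(done) for run, done in completed.items()}

# Run 1 triggered 11 VM migrations. Even if every migration belonged to a
# different job, this many jobs saw no host or guest failure at all.
migrations_run1 = 11
min_unaffected_run1 = max(0, SUBMITTED_PER_RUN - migrations_run1)

for run, rate in rates.items():
    print(f"Run {run}: {completed[run]}/{SUBMITTED_PER_RUN} = {rate:.1f}%")
print(f"Run 1: at least {min_unaffected_run1} jobs unaffected by failures")
```

The lower bound on unaffected jobs is conservative: a job that migrates more than once consumes several migrations, so the true number of unaffected jobs can be higher.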

Experiment two also showed that one fewer virtual machine migration occurred, due to better selection of the ad hoc host on which to restore the checkpoint. The failures that terminated cloud jobs were caused by either virtual machine or BOINC errors. Virtual machine errors occurred when the virtual machine became inaccessible to the ad hoc client when tested by the Accessible Detector component, or when the virtual machine failed to restore properly. BOINC errors were caused by a failure to upload the cloud job’s results despite the cloud job being completed.

We now show the virtual machine migration traces for each experimental run in Figures 6.4, 6.5 and 6.6 respectively. These traces first show the ad hoc hosts selected to initially execute each cloud job, labelled ‘Job’ on their first DOWN state event, and then the series of ad hoc guest migrations from one ad hoc host to another. The migrations are depicted by the coloured transition paths between hosts, which also indicate the identifier of the job being migrated. For example, Figure 6.4 shows that Job13’s ad hoc guest, with the VM ID 184, is migrated from EDIM1 host 159 to EDIM1 host 163, which has a reliability of 99.90791%; reliability values are adjusted after each failure, completed cloud job and state event.

We see in Figure 6.4 that most virtual machine restoration activity takes place during the first 25 minutes. We also see that, in some cases, after an ad hoc guest has been migrated, it may have to migrate once again if the underlying ad hoc host becomes non-operational; a series of virtual machine migrations between ad hoc hosts is depicted by transition paths of the same colour. For example, Job1 of Figure 6.4 is first migrated from EDIM1 host 155 to 167 and then onward to EDIM1 host 152 a short time later. The restoration fails on the next virtual machine migration, to EDIM1 host 144, as indicated by the error message state event.

A failure during restoration is caused by an unsuccessful restoration procedure by VirtualBox. The virtual machine either simply does not restore or is not accessible after the restore operation; these are issues that we hope are fixed in future versions of VirtualBox. The second failure of the same experimental run is caused by BOINC not uploading the results of Job8 even though the ad hoc server is operational and reachable. Figure 6.4 also shows that ad hoc guests are restored on the most reliable ad hoc hosts and that ad hoc guests can be restored on ad hoc hosts that have successfully completed their previously assigned job (e.g. EDIM1 host 165); we assume all cloud jobs run to completion unless explicitly specified with the ‘Complete’ state event.
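The observation that ad hoc guests are restored on the most reliable available hosts can be illustrated with a short sketch. This is not the prototype's scheduler: the `Host` record, the `pick_restore_host` function and the eligibility rule (operational and not currently running a job) are assumptions made for illustration, and only the reliability of host 163 is taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical view of an ad hoc host as seen by a scheduler."""
    host_id: int
    reliability: float  # percentage in the range 0-100
    operational: bool
    busy: bool          # True while the host is executing a cloud job


def pick_restore_host(hosts: Iterable[Host]) -> Optional[Host]:
    """Pick the most reliable host that is operational and idle.

    Returns None when no host is eligible.
    """
    candidates = [h for h in hosts if h.operational and not h.busy]
    return max(candidates, key=lambda h: h.reliability, default=None)


hosts = [
    # Host 159 has just failed; its reliability value here is made up.
    Host(159, 99.0, operational=False, busy=False),
    # Reliability of host 163 as quoted in the text.
    Host(163, 99.90791, operational=True, busy=False),
    # Host 165 is idle after completing its job; the value is made up.
    Host(165, 99.5, operational=True, busy=False),
]

target = pick_restore_host(hosts)
print(target.host_id if target else "no eligible host")
```

Treating a host that has completed its job as idle again is what allows it to receive a restored guest, as in the host 165 example.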

The virtual machine migration trace of Figure 6.5 is similar to that of the previous experimental run in terms of the ad hoc hosts chosen to restore an ad hoc guest; the significant difference is that three cloud jobs do not successfully complete. The ad hoc guests executing the cloud jobs Job3 and Job12 failed to restore and the BOINC client executing Job10 did not upload the cloud job’s results; coincidentally, Job3 and Job10 fail to successfully complete on the same ad hoc host. In the third experimental run, a single cloud job, Job5, did not return due to a result upload failure, shown in Figure 6.6 by the ‘Upload Error’ state event.
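The failures described for the three traces can be tallied by type and by run. The sketch below simply encodes the failures named in the text; the labels `vm_restore` and `boinc_upload` are illustrative names, not identifiers from the prototype.

```python
from collections import Counter

# (experimental run, cloud job, failure type) as described in the text.
failures = [
    (1, "Job1", "vm_restore"),
    (1, "Job8", "boinc_upload"),
    (2, "Job3", "vm_restore"),
    (2, "Job12", "vm_restore"),
    (2, "Job10", "boinc_upload"),
    (3, "Job5", "boinc_upload"),
]

by_type = Counter(kind for _, _, kind in failures)
by_run = Counter(run for run, _, _ in failures)

print("Failures by type:", dict(by_type))
print("Failures by run:", dict(by_run))
```

The per-run totals are consistent with the success rates reported earlier, assuming 15 submitted jobs per run: two, three and one failed jobs correspond to 13, 12 and 14 completed jobs.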

Across all experimental runs, the average cloud job successful completion rate is 86.7%, short of our aim to successfully complete 95% of all cloud jobs. Although the success rate of 93.3% for experimental run three is close, we fail to meet this criterion in our experiments. However, it is important to note that the failures reducing the overall reliability of the ad hoc cloud were not caused by failures within our prototype.

We hope that future releases of both VirtualBox and BOINC will provide solutions to their unrecoverable failures and increase the likelihood of a virtual machine restoring or a result being uploaded.
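To put the 95% criterion in context, the sketch below computes the overall completion rate and the number of completed jobs the target would require. It assumes 15 submitted jobs per run and completed counts of 13, 12 and 14 (the latter two inferred from the reported percentages); the variable names are illustrative.

```python
import math

TARGET_PERCENT = 95
completed = [13, 12, 14]   # runs 1-3; runs 2 and 3 inferred from 80% and 93.3%
submitted = [15, 15, 15]

total_done = sum(completed)
total_submitted = sum(submitted)
overall_percent = 100.0 * total_done / total_submitted

# Smallest number of completed jobs that reaches the target.
needed_overall = math.ceil(TARGET_PERCENT * total_submitted / 100)
needed_per_run = math.ceil(TARGET_PERCENT * submitted[0] / 100)

print(f"Overall: {total_done}/{total_submitted} = {overall_percent:.1f}%")
print(f"Needed for {TARGET_PERCENT}% overall: {needed_overall}")
print(f"Needed for {TARGET_PERCENT}% in one run: {needed_per_run}")
```

With only 15 jobs per run, a single failure already drops a run to 93.3%, so under these assumptions an individual run can meet the 95% target only if every job completes.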

It is therefore encouraging that our prototype implementation can indeed perform well and that, by executing 15 cloud jobs over an unreliable simulated infrastructure, the ad hoc cloud is still able to successfully complete cloud jobs more than 85% of the time; a figure that may increase with future improvements made to the ad hoc cloud and the technologies it uses. Furthermore, we assume that the sporadic behaviour of the Informatics infrastructure at the aforementioned date and time accurately simulates a typical infrastructure an ad hoc cloud will be deployed on; however, there will be many cases when operational infrastructures are more unpredictable and unreliable. Only by deploying our ad hoc cloud computing prototype on a number of operational infrastructures with a wider range of workloads will we determine the true reliability of the ad hoc cloud.

[Figure 6.4 trace. Columns: VM ID, EDIM1 host ID and host reliability (%). Rows: date and time of each state event, from 13/09/2012 04:45:23 to 05:39:03. Key: ‘Job X’ marks an ad hoc host running job X becoming non-operational; ‘ERROR’ marks a restoration error caused by VirtualBox; ‘UPLOAD ERROR’ marks a BOINC result upload error; coloured ‘JobX’ paths mark ad hoc guest migrations between hosts; ‘Complete’ marks a job completing before virtual machine restoration. Events where an ad hoc host becomes operational are also marked.]

Figure 6.4: Simulated Host Failures and Job Relocations for Experimental Run 1

[Figure 6.5 trace. Columns: VM ID, EDIM1 host ID and host reliability (%). Rows: date and time of each state event, from 13/09/2012 04:45:23 to 05:39:03. Key: ‘Job X’ marks an ad hoc host running job X becoming non-operational; ‘ERROR’ marks a restoration error caused by VirtualBox; ‘UPLOAD ERROR’ marks a BOINC result upload error; coloured ‘JobX’ paths mark ad hoc guest migrations between hosts; ‘Complete’ marks a job completing before virtual machine restoration. Events where an ad hoc host becomes operational are also marked.]

Figure 6.5: Simulated Host Failures and Job Relocations for Experimental Run 2

[Figure 6.6 trace. Columns: VM ID, EDIM1 host ID and host reliability (%). Rows: date and time of each state event, from 13/09/2012 04:45:23 to 05:39:03. Key: ‘Job X’ marks an ad hoc host running job X becoming non-operational; ‘ERROR’ marks a restoration error caused by VirtualBox; ‘UPLOAD ERROR’ marks a BOINC result upload error; coloured ‘JobX’ paths mark ad hoc guest migrations between hosts; ‘Complete’ marks a job completing before virtual machine restoration. Events where an ad hoc host becomes operational are also marked.]

Figure 6.6: Simulated Host Failures and Job Relocations for Experimental Run 3
