In this chapter, we have discussed the six founding principles of ad hoc cloud computing: virtualization, cloud computing, volunteer computing, monitoring, management and testing. Firstly, we outlined the basics of virtualization and detailed a select number of virtualization technologies that have the potential to be used within an ad hoc cloud computing infrastructure. An analysis of current research showed that most virtualization technologies perform well; however, this is dependent on the underlying hardware and the application executing on the virtual machine. We then gave an overview of cloud computing and the benefits and drawbacks that other studies have exposed. This led to an analysis of Amazon EC2, its architecture and the typical costs and performance one may expect.
The background then focussed on volunteer and Grid computing as well as the important research that has led to the success of these computational models. By outlining the difference between the two models, we were able to determine which model is suited to ad hoc clouds. We then gave a detailed overview of the BOINC volunteer system and analyzed previous research to determine its benefits and drawbacks, in particular its performance-related aspects. BOINC, however, was shown to perform well overall.
Our overview then discussed the current state of a subset of infrastructure monitoring and management tools. In particular, we focussed on Ganglia, its architecture and performance. Other studies found that Ganglia has a high overhead in relation to the amount of data it transfers over the network. We also gave a brief description of Nagios and the logs it produces. We then outlined three infrastructure management tools, namely Webmin, Capistrano and cexec, and showed how they are able to offer concurrent command execution over a set of hosts.
In order to evaluate many of the ad hoc cloud computing founding principles above, as well as the ad hoc cloud as a fully functioning platform, we selected a number of applications, namely the stress workload generator, Primes, CreateGB and SPRINT. We also use these applications to test the reliability and performance of the ad hoc cloud as a fully integrated system.
Chapter 3
V-BOINC: The Virtualization of BOINC
3.1 Introduction
In this chapter, we discuss how two of the six founding principles of ad hoc cloud computing, namely volunteer computing and virtualization, are integrated to provide the basis of our ad hoc platform. Volunteer computing systems, and in particular BOINC, deal with many of the complexities surrounding non-dedicated hosts. BOINC also provides an infrastructure where computational jobs can be created, sent to volunteers, executed and returned for analysis.
By integrating virtualization into BOINC, we not only create an initial platform that can be extended to solve our research challenges outlined in Section 1.2.1 of Chapter 1, but can also solve many of BOINC’s drawbacks. The drawbacks of BOINC relate to running applications in the user space of the volunteer machine, that is, the portion of system memory where user processes execute. These drawbacks are:
• Project developers are required to port their application to every target machine architecture.
• Project developers need to provide application-level checkpointing to ensure job progress is not lost upon host termination or failures.
• Project developers are limited to creating applications that have no dependencies.
• Users of BOINC must trust that the project servers they attach to will not distribute malicious or untrustworthy applications.
With BOINC virtualized, an application developer only needs to port an application to a single virtual machine architecture; host security is addressed by sandboxing, which protects the host from third-party applications; and system-level checkpointing is available. Applications with dependencies can also execute easily, as dependencies may be pre-installed on or attached to a virtual machine. This enables application developers to create more complex applications to obtain results of more value. These challenges are solved by our implementation of virtual BOINC, or V-BOINC.
The foundation of our approach relies on sending lightweight virtual machine images to volunteer clients, allowing BOINC applications to run in the virtual machine itself rather than in the user space of the host. This is implemented by installing a BOINC client within the virtual machine image to fetch applications for a user-specified project. This is in addition to the BOINC client installed on the user’s host to download the virtual machine image.
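The two-client arrangement described above can be illustrated with a minimal Python sketch. It is a toy model rather than the V-BOINC implementation: the class and function names (`VBoincServer`, `ProjectServer`, `VirtualMachine`, `host_client`) and the image name are hypothetical, and no real BOINC or hypervisor API is called. It only shows the division of labour, in which the host client downloads the image and the inner client fetches and runs the project's applications inside the virtual machine.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProjectServer:
    """Hypothetical stand-in for a regular BOINC project server."""
    name: str
    applications: List[str]

    def fetch_application(self) -> Optional[str]:
        # Hand out the next queued application, or None when none remain.
        return self.applications.pop(0) if self.applications else None


@dataclass
class VBoincServer:
    """Hypothetical server that distributes the lightweight VM image."""
    image: str

    def fetch_image(self) -> str:
        return self.image


@dataclass
class VirtualMachine:
    """Hypothetical VM whose image has an inner BOINC client pre-installed."""
    image: str
    project: ProjectServer
    executed: List[str] = field(default_factory=list)

    def run_inner_client(self) -> List[str]:
        # The inner client fetches applications for the user-specified
        # project and runs them inside the VM, not in the host's user space.
        while True:
            app = self.project.fetch_application()
            if app is None:
                break
            self.executed.append(f"{app} (inside {self.image})")
        return self.executed


def host_client(vboinc: VBoincServer, project: ProjectServer) -> VirtualMachine:
    # The host's BOINC client only downloads the image and starts the VM;
    # it never executes project applications itself.
    image = vboinc.fetch_image()
    return VirtualMachine(image=image, project=project)


if __name__ == "__main__":
    project = ProjectServer("example-project", ["app_a", "app_b"])
    vm = host_client(VBoincServer("vboinc-image"), project)
    print(vm.run_inner_client())
```

In this sketch the host never sees the project's applications directly, which mirrors the sandboxing argument made earlier: anything the project distributes runs only within the virtual machine.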
Our approach to virtualization within BOINC allows V-BOINC to execute applications from typical BOINC projects such as SETI@Home, as well as from future projects with applications that have dependencies. This will in turn increase the number of potential applications that volunteer infrastructures are able to execute. The use of V-BOINC therefore aims to enable access to computations that could not otherwise be performed, enabling more science, design and business to be done.
In this chapter, we first give an overview of related research, describing other studies that have attempted to incorporate virtualization into volunteer infrastructures. This is followed by our own comparison of virtualization technologies to determine which is best suited for V-BOINC as well as ad hoc cloud platforms. We then outline the architecture and internal operational processes of V-BOINC while describing its implementation details. This includes how we introduce, distribute and operate virtual machines, how we ensure virtual machine sizes are kept as small as possible and how we perform automatic checkpointing. This is followed by an evaluation of our V-BOINC platform. Firstly, we determine the performance differences between V-BOINC and regular BOINC. We then show the performance of V-BOINC when executing SPRINT. Finally, we explore the effects of virtual machine checkpointing on volunteer hosts, dependent on the class of scientific application executing.