The Grid - Volunteer Computing - Ad hoc Cloud Computing

2.4 Volunteer Computing

2.4.2 The Grid

Due to the similarity between volunteer and Grid computing, the Grid model also offers similar advantages but equal challenges to overcome. Grid computing is a form of distributed computing that combines geographically distributed resources to create a high throughput computing infrastructure [108]. This global infrastructure facilitates the sharing of resources and access to a large-scale computational platform that would otherwise be unavailable. Without the advances of networking technology in the mid-1990s, the Grid would not have been able to provide an effective collaborative data sharing and analytical infrastructure for use by researchers [217].

The Grid infrastructure is typically composed of geographically distributed and voluntary resources provided by an organization, for example a university or business.

Within these institutional boundaries, an organization has the responsibility of provid-ing a scalable, flexible and secure environment for researchers [217]. An organization must conform to a set of open standards and protocols when developing Grid solutions.

For example, the Open Grid Services Architecture (OGSA) exists to define policies of how to share data between various institutional boundaries [30].

Due to the distributed nature of the Grid, the infrastructure is inherently hetero-geneous, loosely coupled and dynamic [63]. However, a shared global platform also fosters global collaboration between organizations to execute tasks which solve prob-lems to reach common goals; a primary reason why Grid computing was born [157].

Grid infrastructures can execute a variety of tasks from many research project areas such as high-energy physics, bioinformatics and chemistry, for example. These

appli-2.4. Volunteer Computing 33

cations may be also be distributed in nature and require high-throughput or fast data processing capabilities. A well cited use of Grids is the work being performed at CERN to help analyze and store the vast amounts of data produced from the Large Hadron Collider (LHC) experiments [217, 109, 157]. Data is transferred over the World LHC Computing Grid (WLCG) [44] from Geneva to various organizations called Tiers; an example of the scale of data transfers is shown in Figure 2.7.

Figure 2.7: WLCG Data Transfers [43]

Three types of tiers exist: Tier-0, Tier-1 and Tier-2 [114]. The Tier-0 centre consists of the shared infrastructure available at CERN and has the purpose of data recording, performing initial analyses and distributing data from their experiments. Tier-1 cen-tres store data, perform large-scale preprocessing and store subsequent results. Tier-2 centres are typically universities and organizations who can store and analyze small amounts of data. Figure 2.7 shows that at the time of writing, the aggregated band-width usage was 5.32 GB/s and this data was being sent from CERN to 106 sites over 1069 links. The coordination of data and distributing computations over the Grid infrastructure are a few of the many technical challenges that must be overcome to successfully operate over the Grid.

Many of these challenges are solved by Grid middleware such as Globus [78, 107], gLite [149, 159] and Condor [154, 204]. However these middleware frameworks can be complex and target the use of an organization’s dedicated resources within the Grid.

There are however substantial idle resources within organizations that could be uti-lized, hence Grid middleware such as BOINC [47] and Xtermweb [100], as well as Condor aim to take advantage of these resources to create a Desktop Grid.

While conceptually similar to volunteer computing, Desktop Grids are composed of a more homogeneous set of resources and are either under the same ownership or owners agree to common management policies to achieve a goal. On the other hand, volunteer computing is composed of a wide range of heterogeneous resources dispersed worldwide that are unreliable in nature. We now focus on one Desktop Grid middleware and volunteer computing system that is core to our ad hoc cloud computing model.

2.4.3 BOINC

The Berkeley Open Infrastructure for Network Computing (BOINC) is an open source client-server middleware system created to allow projects with large computational requirements, usually set in the scientific domain, to utilize a technically unlimited number of volunteer machines distributed over large physical distances [47]. Created in 2002, BOINC has become one of the most popular volunteer computing middleware systems.

2.4.3.1 Overview

The success of BOINC can be attributed due to its simplicity and ease of use from a volunteer user’s perspective as well as its architecture in general. BOINC follows a basic client-server model. Volunteer users must download BOINC and select or enter their desired project in order to obtain tasks from the appropriate BOINC server;

there are very few actions that must be performed afterwards and BOINC can execute indefinitely without user intervention. The BOINC architecture is shown in Figure 2.8.

Two important components are depicted: the BOINC client and the BOINC server.

The BOINC client is an application that is installed on the volunteer host and has the purpose of communicating with the server, attaching the client to single or multiple projects, organizing the computation and returning results. The BOINC client is com-posed of four components as shown in Figure 2.8 [49]:

• The core client: communicates with the server, attaches clients to projects, orga-nizes the computation, executes the application and returns the result.

• The boinccmd API: a command-line interface for controlling the core client. It is able to obtain new tasks, suspend computations, upload results, reset the project, etc.

2.4. Volunteer Computing 35

BOINC client Volunteer User Project Administrator

Request Job Get Job Results View Account ACK Admin Project ACK Archive

core client Manager

boinccmd Screensaver

Figure 2.8: BOINC Architecture; derived from [98]

• The BOINC Manager: a Graphical User Interface (GUI) representation of the boinccmd API. The Manager also shows all attached projects, current down-loads, computational progress, etc.

• Screensaver: a project specific screensaver displaying graphics of a running task;

whether a screensaver exists is project dependent.

2.4.3.2 The BOINC Process

Upon running the BOINC client for the first time, a series of benchmarks are executed to determine the true speed of a host’s CPU. The total resource capacities and available disk space are also recorded. Once connected to a scientific project, the BOINC client will receive an application from the BOINC server to execute.

The application itself typically consists of an application executable, that has previ-ously been compiled on the target host type, and a series of input and output files [79].

The application must have checkpointing measures in place to allow the computation to continue if BOINC is quit by the user or the host terminates or fails [49]. During the

execution of an application, the BOINC client records the amount of work performed by the volunteer host and issues credits to the user which are published on-line. Credits are calculated by multiplying the application’s CPU time by benchmark scores.

Conceptually the BOINC client is a simple application however much of the sys-tem’s complexity resides on the BOINC server. The BOINC server has the main pur-pose of hosting the scientific project (e.g SETI@Home) and creating, distributing, col-lecting, storing and validating Results from many clients [111]. Results are instances of a particular BOINC Work Unit (i.e. a particular scientific task) regardless if the Work Unit has been completed or not. To store and distribute these Results, the server uses MySQL for data storage, while Apache and PHP are used for web access; for example, to allow a volunteer user to modify project preferences or a project adminis-trator to configure the project.

The BOINC server is underpinned by a set of running daemons that create and coordinate entities related to the project [51]. A set of default daemons are provided however additional daemons can be added dependent on the project characteristics and functionality required. After an application developer has created their scientific project, and as shown in Figure 2.8, the work generator daemon begins creating project Work Units and stores these in the ‘Download’ folder. The transitioner, whose task it is to manage the state transitions of Work Units and Results, then generates multiple Results from a single Work Unit and stores these in the database.

The feeder periodically extracts these Results and enters them into a shared mem-ory region. The scheduler, which has the purpose of communicating with client us-ing XML messages, coordinates outbound Results from the shared memory region to clients while concurrently dealing with completed Results. Received Results are placed in the ‘Upload’ folder and the transitioner is informed. The validator is then instructed to validate the received Results. BOINC does this by adopting replication where each job is executed on multiple hosts. By comparing the Results received from different clients, BOINC ensures that host errors or security breaches have not influ-enced the Results. Credits are issued to hosts only if the Result is deemed as valid [51].

Once a Result has been validated, a Canonical Result is created; a Result which is the simplest and best of those validated. Optionally, the assimilator may perform an administrator-defined action such as archiving Canonical Results to long-term storage.

In order to reduce storage space consumption on the server, the file deleter removes Work Unit data files and Results that are no longer required.

2.4. Volunteer Computing 37

2.4.3.3 The Performance of BOINC

In order to attract and retain volunteer users, the performance of both the BOINC server and client must be acceptable. The BOINC client should not cause significant overheads or slowdown of the volunteer host and the processes currently running on it.

This however is managed by the volunteer user via project preferences.

The volunteer user is able to control aspects of the job by adjusting these pref-erences via their account hosted on the BOINC server [49]. They can control the minimum interval between checkpoints, the maximum utilization of a processor, the total disk, memory and network usage allowed or even the time of day the volunteer host can be used; many other options exist but for brevity, are not explained here. We show in Chapter 4 that no significant overheads exist while executing an application using BOINC.

As most of the complexity of the BOINC system resides on the server, achieving good performance is critical to meet the demands of BOINC clients. As the number of volunteer hosts can range from tens of volunteers to potentially hundreds of millions [51], it is especially important that the server scales well when there is an increase in the number of client requests for work or Results uploaded.

Anderson in 2005 [51] performed an analysis of the BOINC server performance and found that an inexpensive computer ( 2GB RAM, 2 x 2.4 GHz processors and 480 GB storage) hosting the server can distribute approximately 8.8 million tasks per day.

Excluding file upload and download, which is project dependent, a network offering approximately 8.2 Mbps would be needed to cope with this number of tasks. The main performance bottleneck was the CPU which reduced database performance and limited the number of tasks per day that could be distributed.

Amdahl’s Law dictates that CPU speeds double approximately every 18 months, hence nowadays, we would expect the BOINC server to process and distribute more tasks per day. Hence, based on these measurements, it is reasonable to assume that any modification to the BOINC server would have little effect on performance, which we show in Chapter 6, and the ability to achieve 8.8 million tasks per day.

However, as the trend in CPU speed progresses, the available network bandwidth will become the bottleneck, especially in the future as applications become more data-intensive. The number of volunteer resources may also have to increase substantially in order to cope with large storage demands if disk technology and performance does not keep pace with the increase in CPU speed.

2.4.4 Grids, Clouds and Volunteer Infrastructures

Now that we have described all computational models that are either directly or indi-rectly related to ad hoc cloud computing, we offer a brief comparison between Grid, cloud and volunteer computing.

Grid computing has provided cloud computing with fundamental aspects of dis-tributed computing enabling it to thrive in recent years. They both share common entities such as being available via an Internet connection and offer geographically distributed resources [108, 56]. They are also scalable, created for multi-tenancy and both are trusted to provide reasonable performance and security.

Clouds and Grids are however different in many ways. Perhaps the most signif-icant difference is the use of virtualization in cloud infrastructures for reasons previ-ously mentioned. Furthermore, cloud mandates the use of virtualization whereas Grid permits it but does not require it. Therefore applications do not need to be modified for use on the cloud as a virtual machine can be modelled on an end-user’s OS and local resources. On the other hand, Grid does require applications to be modified and submission scripts must be created to execute the application.

Another significant difference is that Grid computing components are owned by a consortium of organizations who agree to conform to a common implementation op-eration and use model, whereas cloud computing infrastructures are owned and main-tained by a single organization that chooses its own model. Although a degree of trust exists between both the users of cloud and Grid infrastructure providers, the core concept of Grid builds upon stronger levels of trust between organizations in order to foster collaboration. This was highlighted when Ashley et al. were the first to con-nect organizations from three continents into a single large-scale research Grid in the Asia-Pacific region [155].

As Grid computing was built on the premise of data sharing and collaboration, cloud computing is typically known to offer a commercialized version of this compu-tational model and is targeted at businesses rather than researchers. This has resulted in cloud infrastructures becoming service orientated for business requirements, whereas Grids that offer service-based functionality, are typically aimed at scientific research or large-scale computations; for example, the WLCG. Furthermore, HPC applications are less well suited to running effectively on the cloud, however they are suitably matched to the Grid. The location of data is also unknown when utilizing commercial and pri-vate cloud infrastructures.

In document Ad hoc Cloud Computing (Page 46-53)