Chapter 2 Grids, Security & Trust
2.3 Grid Computing & Security
2.3.3 Volunteer & Peer-to-Peer Computing
Volunteer computing systems allow computer users to donate the idle time on their computers to be used for distributed computing tasks. This can be for a specific application, as in the case of SETI@home [102] and the Distributed.net RC5 challenge [57]. With other systems, such as BOINC, based on the SETI@home software, a user can download one piece of software that manages the client’s participation in the distributed computation, and then choose which individual applications to run using it [7]. For example, a BOINC user might contribute CPU time to LHC@home [37], Folding@home [154], or the ClimatePrediction.Net project [45].
With the systems mentioned above, the user chooses which applications will be run on their system but otherwise has little control over what happens, beyond simple throttles on CPU and network usage. This is understandable, as the originator of the computation needs to be sure that the results coming from untrusted systems can be trusted.
Several Java-based web volunteer computing systems exist, such as Bayanihan [149], Charlotte [17], and Javelin [43]. Such systems have the advantage that experiment software is portable to any system running the same volunteer computing software. The disadvantage is that the software must be written in Java, which limits their usefulness for running, or even linking to, existing software written in other languages.
Condor [107, 170] started life as a volunteer computing system that allowed work to be distributed to idle workstations. Its support for checkpointing allows work to be migrated between hosts when a workstation user begins interacting with the machine. Over time Condor has been extended to support dedicated clusters and to allow desktop resources to be integrated with dedicated
resources transparently. Condor-G can be used to schedule work to be sent to Globus resources and can manage directed acyclic graph (DAG) workflows with its DAGMan component [81].
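The dependency ordering that any DAG workflow manager must respect can be sketched as follows. This is a minimal illustration using Python's standard `graphlib` module, not DAGMan's input format or scheduler; the function name `run_order` and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

def run_order(dependencies):
    """Return batches of jobs; every job in a batch has all its parents done.

    `dependencies` maps each job name to the set of jobs it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose parents have finished
        batches.append(ready)
        sorter.done(*ready)  # a real manager would do this as jobs complete
    return batches

# Hypothetical diamond-shaped workflow: B and C depend on A, D on both.
workflow = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(run_order(workflow))  # [['A'], ['B', 'C'], ['D']]
```

Jobs within one batch have no dependencies on each other, so a scheduler would be free to dispatch them to different resources at the same time.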
Volunteer systems are sometimes used at the organisational level, with forced ‘volunteers’, rather than at desktop level. In this approach large numbers of computer systems in, for example, a university computer laboratory or a commercial office are configured to act as computing nodes. The system configuration is controlled by system administrators and the applications run on the system are determined by those who have been given permission to use the resources.
While the volunteer computing systems mentioned above generally have a manager node sending work to worker nodes, it is also possible to arrange the computation in a less hierarchical fashion. Peer-to-peer computing systems transfer units of work and results between connected peers. The peers may use central discovery services or may be purely distributed.
WebCom
WebCom [120] is an implementation of the condensed graph model of computation [121]. A condensed graph represents the structure of a computation in terms of dependencies between operands, operators and destinations.
[Figure 2.7: Condensed Graph. The recursive CG fact computes n!. Nodes shown: Enter, =, −, fact, ×, if and Exit.]
Each condensed graph (CG) has an enter node and an exit node which provide the inputs and collect the output of the graph, respectively. Each node in the graph has ports for, or dependencies on, one or more operands, an operator, and one or more destinations. Connections between nodes may initially be stemmed, meaning that the source of the connection will not be evaluated until it is grafted to a destination. In Figure 2.7 the connection between the × node and the if node is stemmed. The left-hand-side of the graph, which leads to the × node, will be grafted and hence evaluated if the result of the = operation is false, otherwise the right-hand-side of the graph (with value 1) will be grafted. The configuration of the graph can allow eager, lazy or imperative scheduling of operations.
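The effect of stemming and grafting on the recursive factorial graph can be sketched in Python. This is an illustrative analogy only, not WebCom code: zero-argument callables stand in for stemmed connections, the helper name `if_node` is hypothetical, and the comparison against 0 is an assumption about what the = node in the figure tests.

```python
def if_node(condition, then_stem, else_stem):
    """Graft (evaluate) exactly one of two stemmed branches.

    Each stem is a zero-argument callable standing in for a stemmed
    connection: its source is not evaluated until it is grafted here.
    """
    return then_stem() if condition else else_stem()

def fact(n):
    """Recursive factorial in the shape described for the fact graph."""
    is_base = (n == 0)            # the '=' node (comparison with 0 is assumed)
    return if_node(
        is_base,
        lambda: 1,                # right-hand side: the constant 1
        lambda: n * fact(n - 1),  # left-hand side: the × node fed by fact(n − 1)
    )

print(fact(5))  # 120
```

If both branches were evaluated eagerly before the if node fired, the recursive branch would never terminate; deferring evaluation until a branch is grafted is what makes the recursion well founded.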
WebCom distributes evaluation of condensed graphs over a peer-to-peer network of computers. There is support for fault tolerance, load balancing, and scheduling of nodes. Nodes can be targeted at particular hosts or matched with resources that meet certain requirements by using a ClassAds matching system, originally developed for Condor [143].
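Matching a node's requirements against advertised resources can be sketched as below. This is a deliberately simplified, one-way illustration in Python; it does not use the ClassAd language or any Condor or WebCom API, and the attribute names, host names and functions are hypothetical.

```python
def matches(requirements, resource):
    """True if the resource satisfies every requirement predicate."""
    return all(
        attr in resource and predicate(resource[attr])
        for attr, predicate in requirements.items()
    )

def select_hosts(requirements, resources):
    """Names of all advertised resources that meet the requirements."""
    return [name for name, ad in resources.items() if matches(requirements, ad)]

# Hypothetical resource advertisements, keyed by host name.
resources = {
    "host-a": {"os": "linux", "memory_mb": 2048},
    "host-b": {"os": "linux", "memory_mb": 512},
    "host-c": {"os": "windows", "memory_mb": 4096},
}

# Hypothetical requirements attached to a graph node.
requirements = {
    "os": lambda v: v == "linux",
    "memory_mb": lambda v: v >= 1024,
}

print(select_hosts(requirements, resources))  # ['host-a']
```

A resource that does not advertise a required attribute is treated as a non-match here, which is a conservative design choice for this sketch.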
WebCom provides a graphical user interface which can be used to create graphs from a palette of existing operations, or from code snippets provided by the user [122].
Security
WebCom allows for secure communication between instances with authentication based on X.509 certificates and authorisation using KeyNote credentials [73]. The WebCom environment interacts with KeyNote – described below – through its API, meaning that each WebCom graph node does not need to make any security decisions internally. Synchronisation, concurrency and trust management are all handled transparently by the WebCom environment.
WebCom instances are connected in a peer-to-peer topology. Each peer connection entails mutual authentication, by verifying the X.509 public key certificates using the appropriate trusted CA root certificates. Each WebCom instance can perform authorisation when communicating with another instance. In WebCom, authorisation credentials can be granted to permit: submission of a specified graph to another instance; execution of a specified graph received from another instance; execution of a specified operand; return of results to another instance; etc. X.509 certificates and KeyNote authorisation credentials must be created and distributed to WebCom instances. WebCom does not provide user authentication, user authorisation or credential delegation.
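The kind of per-peer authorisation decision described above can be sketched as a default-deny lookup. This is a hypothetical Python illustration, not KeyNote's assertion language or API and not WebCom code; the `Credential` type, the action names and the peer identities are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Credential:
    peer: str    # identity of the authenticated peer, e.g. a certificate subject
    action: str  # e.g. "submit", "execute", "return_results"
    target: str  # name of the graph or operation the action applies to

def is_authorised(credentials, peer, action, target):
    """Permit a request only if a matching credential exists (default deny)."""
    return Credential(peer, action, target) in credentials

# Hypothetical credentials held by one instance.
credentials = {
    Credential("peer-1", "submit", "fact"),
    Credential("peer-2", "execute", "fact"),
}

print(is_authorised(credentials, "peer-1", "submit", "fact"))   # True
print(is_authorised(credentials, "peer-1", "execute", "fact"))  # False
```

The check assumes the peer identity has already been established by the mutual X.509 authentication step; the sketch covers only the authorisation decision that follows it.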
WebCom-G
The WebCom-G project [123] aims to combine WebCom with established grid middleware from the LCG project and to make WebCom more useful as a piece of grid middleware in its own right.
WebCom, and the condensed graph model it implements, is well suited to creating workflows of grid jobs that can themselves run existing software without the expense of porting it to the WebCom condensed graph format. In particular, WebCom-G aims to allow submission to multiple grids from a single workflow.
WebCom-G has encouraged research and development of a more general approach to interoperability between existing grid and workflow middleware. This approach is given the name Metagrid [140]. As part of WebCom-G, WebCom has been enhanced to address some of the security issues described above. In addition, work has been done in the course of the research for this thesis to bridge the gap in security between WebCom and existing grid middleware. Metagrid and WebCom-G security are discussed in more detail in Chapter 6.