
5.3 Architecture

5.3.3 PMWComm

Nowadays, many computational clusters are configured with a single IP-visible frontend that hides all internal hosts from the outside world. In these architectures, the hosts are difficult to exploit because they sit in a different network domain that is geographically and administratively separated, usually shielded by firewalls. In addition, some clusters use network address translation (NAT) and have private networks that are not accessible from external hosts. To leverage any hosts on the Internet and build wide-area virtual machines that serve applications, we use PMWComm daemons. These are placed on the frontend host of each domain and act as proxies that route messages and control processes (Service 5). The local host always belongs to a different domain from the resource domains, so we also place a PMWComm on the local host to route messages between it and the remote resources.

If communication is cross-domain (the sender and the receiver have different domain IDs), the sender first extracts the domain ID from the receiver's UPID. It then contacts the PMWComm associated with the receiver and asks it to route the message to the receiver's LComm, which completes the message passing (Figure 5.5).
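The routing decision above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation. The text only says that a UPID contains the domain ID, so the `UPID(domain_id, local_id)` layout, the `next_hop` function and the `pmwcomm_of` lookup table are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical UPID layout: the text only states that a UPID includes the
# domain ID; splitting it into (domain_id, local_id) is an assumption.
@dataclass(frozen=True)
class UPID:
    domain_id: int
    local_id: int


def next_hop(sender: UPID, receiver: UPID, pmwcomm_of: dict[int, str]) -> str:
    """Return where the sender's LComm should send a message (sketch).

    Same domain: deliver straight to the receiver's LComm.
    Cross-domain: extract the receiver's domain ID from its UPID and contact
    the PMWComm associated with that domain, which forwards the message to
    the receiver's LComm.
    """
    if sender.domain_id == receiver.domain_id:
        return f"LComm:{receiver.local_id}"
    return pmwcomm_of[receiver.domain_id]


pmwcomms = {0: "pmwcomm@local", 1: "pmwcomm@frontend1"}  # hypothetical addresses
print(next_hop(UPID(0, 5), UPID(1, 7), pmwcomms))  # pmwcomm@frontend1
```

The cross-domain branch never consults the sender's own PMWComm. This matches the assumption in Figure 5.5 that firewalls restrict only inbound traffic.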

Another responsibility of PMWComm is process management inside its own domain, which covers deploying new processes and killing unnecessary ones. The steps for adding a process are as follows:

- A process requests a new process by calling AddProcess(Name of executable).
- The AA library converts the request into a resource request, using the resource requirements provided in the related XML file, and passes it to the RD daemon.
- RD looks for a resource. The resource found may belong to a domain different from the requestor's, so the deployment may involve more than one PMWComm.
- Once a qualified resource is found, RD sends the resource information (including its IP address), together with the original process addition request, to the PMWComm associated with the resource.
- That PMWComm pulls the process's binary to the destination resource and starts the intended process remotely.
- Once the process has started successfully, the PMWComm assigns it a new UPID that includes the domain ID and stores the process information in its local process table.
- Finally, the PMWComm reports the new process's information to the requestor. It sends the information to the PMWComm associated with the requestor, which passes it back to the requestor when GetNotification() is called (Figure 5.6).
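The last step, in which the requestor's PMWComm holds a report until the requestor calls GetNotification(), behaves like a per-requestor queue. The sketch below models it that way. The `NotificationRelay` class, its method names and the non-blocking poll semantics are assumptions for illustration only; the real GetNotification() may block or carry a different payload.

```python
from collections import defaultdict, deque
from typing import Optional


class NotificationRelay:
    """Hypothetical buffer kept by the PMWComm associated with a requestor.

    A remote PMWComm that has started a new process calls report(); the
    requestor later drains its queue through get_notification(), modelled
    here as a non-blocking poll (an assumption).
    """

    def __init__(self) -> None:
        self._queues: dict[str, deque] = defaultdict(deque)

    def report(self, requestor: str, new_process_info: dict) -> None:
        # Step 4 in Figure 5.6: the remote PMWComm forwards the new process's info.
        self._queues[requestor].append(new_process_info)

    def get_notification(self, requestor: str) -> Optional[dict]:
        # Step 5: hand the oldest pending notification to the requestor, if any.
        queue = self._queues[requestor]
        return queue.popleft() if queue else None


relay = NotificationRelay()
relay.report("P1", {"upid": (2, 0), "host": "10.0.0.5"})  # hypothetical values
print(relay.get_notification("P1"))
print(relay.get_notification("P1"))  # None: nothing pending
```

Because reports are queued in FIFO order, a requestor that adds several processes receives their notifications in the order the remote PMWComms reported them.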

Upon receiving a process killing request, the AA library delegates it to the PMWComm associated with the destination process. That PMWComm releases the process and returns the related host to the cluster pool.
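The bookkeeping one PMWComm performs for process addition and killing can be sketched with a process table and a pool of free hosts. This is a simplified model under assumptions. `DomainPMWComm` is a hypothetical name, UPIDs are modelled as `(domain_id, sequence number)` pairs, and the actual binary transfer and remote start are left out.

```python
from dataclasses import dataclass, field


@dataclass
class DomainPMWComm:
    """Hypothetical process bookkeeping of one PMWComm (sketch).

    UPIDs are modelled as (domain_id, sequence number); the real encoding
    is not specified in the text.
    """
    domain_id: int
    free_hosts: set[str]
    process_table: dict[tuple[int, int], str] = field(default_factory=dict)
    _next_seq: int = field(default=0, init=False, repr=False)

    def start_process(self, host: str) -> tuple[int, int]:
        # Pulling the binary and starting it remotely are omitted here;
        # only the table updates that follow a successful start are shown.
        if host not in self.free_hosts:
            raise ValueError(f"host {host} is not in the cluster pool")
        self.free_hosts.remove(host)
        upid = (self.domain_id, self._next_seq)  # the UPID embeds the domain ID
        self._next_seq += 1
        self.process_table[upid] = host
        return upid

    def kill_process(self, upid: tuple[int, int]) -> None:
        # Release the process and return its host to the cluster pool.
        host = self.process_table.pop(upid)
        self.free_hosts.add(host)


pm = DomainPMWComm(domain_id=2, free_hosts={"n1", "n2"})
upid = pm.start_process("n1")
print(upid)  # (2, 0)
pm.kill_process(upid)
```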

Resources in a virtual machine are discovered dynamically, so the AA cannot know in advance which clusters it will use during execution. The RD daemon therefore spawns PMWComms on demand onto the frontend hosts of domains once it decides to use hosts from those domains. Each PMWComm is assigned a domain ID, starting from 0. When a new PMWComm is spawned, its information is added to the domain table in the NameServer component (Section 5.3.5).
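On-demand spawning with sequential domain IDs can be sketched as a lookup that assigns the next ID the first time a frontend is used. This is an illustrative sketch only. `DomainTable` and `ensure_pmwcomm` are hypothetical names, and the actual spawn step (contacting the frontend and starting the daemon) appears only as a comment.

```python
class DomainTable:
    """Hypothetical NameServer domain table mapping frontend -> domain ID.

    A PMWComm is spawned the first time the RD decides to use hosts behind
    a frontend; domain IDs are handed out from 0 in spawn order.
    """

    def __init__(self) -> None:
        self.ids: dict[str, int] = {}

    def ensure_pmwcomm(self, frontend: str) -> int:
        if frontend not in self.ids:
            # Spawning the PMWComm daemon on `frontend` would happen here
            # (not modelled); then its entry is added to the domain table.
            self.ids[frontend] = len(self.ids)
        return self.ids[frontend]


table = DomainTable()
print(table.ensure_pmwcomm("amy.cs.ucl.ac.uk"))        # 0
print(table.ensure_pmwcomm("morecambe.cs.ucl.ac.uk"))  # 1
print(table.ensure_pmwcomm("amy.cs.ucl.ac.uk"))        # 0, already spawned
```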

5.3.3.1 Multiple PMWComms and Domain Tree

Some domains contain a number of subdomains, each with its own firewall, so multiple PMWComms may be needed to reach the computing hosts. For example, the CS department of University College London has a firewall and exposes only one publicly visible frontend, amy.cs.ucl.ac.uk, for external machines to access. The department's HPC cluster uses network address translation (NAT) and has its own frontend, morecambe.cs.ucl.ac.uk. To access the cluster hosts from a machine outside the CS department, PMWComms have to be placed on the frontend of the CS department and also on the frontend of the HPC cluster (Figure 5.7).

[Diagram: Process A and Process B, each with an LComm, in Domain 1 and Domain 2; a PMWComm on each frontend; message steps 1-8.]

Figure 5.5: Cross-domain communication is delivered by PMWComms. The firewall in each cluster is assumed to restrict only inbound traffic, so the sender's LComm can talk directly to external PMWComms without contacting its local PMWComm.

[Diagram: process P1, RD, a PMWComm in domain1 and in domain2, found resource A, and the new process; steps 1-5.]

Figure 5.6: A process is added in a different domain on behalf of process P1, using two PMWComms. 1: request a resource. 2: delegate the process addition request to the PMWComm associated with the resource. 3: start the process remotely. 4: pass the new process information to the PMWComm associated with the requestor. 5: the PMWComm notifies P1 that the new process has been added.

[Diagram labels: Process, LComm, PMWComm, cs.ucl.ac.uk, 80.88.192.115, amy.cs.ucl.ac.uk, morecambe.cs.ucl.ac.uk, harpo-9-58, HPC cluster.]

Figure 5.7: Placing multiple PMWComms to connect multiple domains.

[Diagram: tree of frontends PUBLIC, amy.cs.ucl.ac.uk, gateway.ic.ac.uk, morecambe.cs.ucl.ac.uk and condor.cs.ucl.ac.uk, labelled Public Internet domain, UCL CS domain, IC domain, HPC domain and Condor domain.]

Figure 5.8: A Domain Tree.

One of our objectives is to let an application start from any machine on the Internet, with the computing environment configured automatically (Service 3). To do this, the AA must find the path to the resource domain and place PMWComms on the relevant frontends, wherever the application starts. We therefore introduced the Domain Tree (DT) (Figure 5.8) to store the topology of the domain connections.

Each node in the DT is a frontend address representing a domain. The root node is PUBLIC, which represents the public Internet domain that everyone is authorized to access. A node may have one or more child nodes, meaning that the related domain has one or more subdomains. Based on the DT, a basic algorithm (Algorithm 2) finds the route connecting hosts in any two domains.
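A DT can be represented with parent pointers and a stored depth, which is all Algorithm 2 needs. The sketch below encodes the tree of Figure 5.8 as we read it from the figure labels and the worked example in the text: amy.cs.ucl.ac.uk and gateway.ic.ac.uk are children of PUBLIC, and morecambe.cs.ucl.ac.uk and condor.cs.ucl.ac.uk are children of amy. That edge structure is an assumption, and `DTNode` is a hypothetical name.

```python
from typing import Optional


class DTNode:
    """One Domain Tree node: a frontend address standing for a domain (sketch)."""

    def __init__(self, frontend: str, parent: Optional["DTNode"] = None) -> None:
        self.frontend = frontend
        self.parent = parent
        self.children: list["DTNode"] = []
        # PUBLIC is the root at depth 0; each subdomain sits one level deeper.
        self.depth = 0 if parent is None else parent.depth + 1
        if parent is not None:
            parent.children.append(self)


# Figure 5.8 as we read it (edge structure assumed).
public = DTNode("PUBLIC")
amy = DTNode("amy.cs.ucl.ac.uk", public)       # UCL CS domain
ic = DTNode("gateway.ic.ac.uk", public)        # IC domain
hpc = DTNode("morecambe.cs.ucl.ac.uk", amy)    # HPC domain
condor = DTNode("condor.cs.ucl.ac.uk", amy)    # Condor domain

print([child.frontend for child in amy.children])
print(hpc.depth)  # 2
```

Storing the depth when a node is created is a design choice. It lets the route computation compare depths without walking back to the root each time, which works because the tree only grows by adding leaves.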

Take the DT in Figure 5.8 as an example. Suppose we start an application in the IC domain and the AA discovers resources in the HPC domain. Following the algorithm, the AA finds the path {amy.cs.ucl.ac.uk, morecambe.cs.ucl.ac.uk} to reach those hosts from the IC domain. It then places PMWComms on the two frontends in the path. Message passing from the IC domain to the HPC domain then follows the chain requestor → PMWComm on amy → PMWComm on morecambe → receiver.

Algorithm 2: Algorithm to find the connection route in a DT.

Assume: Firewalls restrict only incoming traffic.

Let: The start host belong to domain A with frontend α, and the destination host belong to domain B with frontend β.

Aim: Find the path ρ from the start host to the destination host.

Algorithm:

1. In the DT, find the lowest common ancestor γ of α and β (a node counts as its own ancestor).

2. Calculate δ = β.depth − γ.depth.

3. ρ = {β.parent[δ−1], β.parent[δ−2], ..., β.parent, β}, where β.parent[k] denotes the k-th ancestor of β.
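Algorithm 2 can be written as a short runnable sketch. We read β.parent[k] as the k-th ancestor of β, which reproduces the text's worked example, and the lowest common ancestor is found by lifting the deeper node and then climbing both nodes in lockstep. The `Node`, `ancestor`, `lowest_common_ancestor` and `route` names are hypothetical. The tree edges follow our reading of Figure 5.8.

```python
from typing import Optional


class Node:
    """Minimal DT node: frontend name, parent pointer and depth (PUBLIC = 0)."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.depth = 0 if parent is None else parent.depth + 1


def ancestor(node: Node, k: int) -> Node:
    """β.parent[k] in Algorithm 2: walk k levels up (k = 0 gives the node itself)."""
    for _ in range(k):
        node = node.parent
    return node


def lowest_common_ancestor(a: Node, b: Node) -> Node:
    # Lift the deeper node to the same depth, then climb both in lockstep.
    if a.depth > b.depth:
        a = ancestor(a, a.depth - b.depth)
    else:
        b = ancestor(b, b.depth - a.depth)
    while a is not b:
        a, b = a.parent, b.parent
    return a


def route(alpha: Node, beta: Node) -> list[str]:
    """Frontends needing a PMWComm, in the order a message passes through them."""
    gamma = lowest_common_ancestor(alpha, beta)  # step 1
    delta = beta.depth - gamma.depth             # step 2
    # Step 3: ρ = {β.parent[δ−1], ..., β.parent[1], β}
    return [ancestor(beta, k).name for k in range(delta - 1, -1, -1)]


# Figure 5.8 as we read it (edge structure assumed).
public = Node("PUBLIC")
amy = Node("amy.cs.ucl.ac.uk", public)
ic = Node("gateway.ic.ac.uk", public)
hpc = Node("morecambe.cs.ucl.ac.uk", amy)
condor = Node("condor.cs.ucl.ac.uk", amy)

print(route(ic, hpc))  # ['amy.cs.ucl.ac.uk', 'morecambe.cs.ucl.ac.uk']
print(route(hpc, ic))  # ['gateway.ic.ac.uk']
```

The first call reproduces the IC-to-HPC path given in the text. The reverse call yields a different frontend, which illustrates why traffic in the opposite direction relies on other PMWComms. When both hosts share a domain, δ is 0 and the route is empty, so no PMWComm is needed.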

Note that message passing in the reverse direction, from the HPC domain to the IC domain, involves other PMWComms, since the route depends on where the destination domain sits in the DT. Users can customize the DT and potential resource pools via the AA's console (Appendix B.4).