6.3 AppFabric Prototype: Management Plane Configurations
6.3.2 Configuring the AppFabric Application Cloud (AAC)
After the application architect has designed the application and created the AppFabric Service Workflow (ASW), it is the responsibility of the deployment administrators to deploy the application. Fig. 6.6 shows a schematic representation of the application cloud, and Listing 6.7 provides the configuration of a typical application cloud. Deployment administrators may want to specify several deployment policies.
[Figure 6.6 labels: Cloud Datacenter, ISP Network, AppFabric Service Workflow <instance #1>, AppFabric Application Cloud (AAC)]
Figure 6.6: AppFabric Application Cloud (reproduction of Fig. 1.4)
Listing 6.7: AppFabric Application Cloud Configuration
<workflow_properties>{
    "name": "ABC",
    "resource_allocation_method": "greedy_max_2_site",
    "avg_load_per_session": 5,
    "deployment_sites": ["US_E", "US_W"],
    "wf_per_proxy": 3,
    "overload_notification_level": 0.5,
    "scale_down_level": 0.2 }
</workflow_properties>
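As a rough illustration of how a configuration like Listing 6.7 could be consumed, the sketch below pulls the JSON body out of the wrapper tags and runs two sanity checks. The function name parse_workflow_properties, the underscore-separated spellings of the tag and keys, and the validation rules are assumptions made for this example; it is not the AppFabric loader.

```python
import json
import re

SAMPLE_CONFIG = """
<workflow_properties>{
    "name": "ABC",
    "resource_allocation_method": "greedy_max_2_site",
    "avg_load_per_session": 5,
    "deployment_sites": ["US_E", "US_W"],
    "wf_per_proxy": 3,
    "overload_notification_level": 0.5,
    "scale_down_level": 0.2 }
</workflow_properties>
"""


def parse_workflow_properties(text):
    """Return the JSON object wrapped in <workflow_properties> tags as a dict."""
    match = re.search(
        r"<workflow_properties>(.*?)</workflow_properties>", text, re.DOTALL
    )
    if match is None:
        raise ValueError("no <workflow_properties> block found")
    props = json.loads(match.group(1))

    # Illustrative checks only; these rules are assumptions, not AppFabric's.
    if not props.get("deployment_sites"):
        raise ValueError("at least one deployment zone must be listed")
    if props["scale_down_level"] >= props["overload_notification_level"]:
        raise ValueError(
            "scale_down_level should be below overload_notification_level"
        )
    return props


props = parse_workflow_properties(SAMPLE_CONFIG)
print(props["deployment_sites"])
```

Keeping the scale-down threshold strictly below the overload threshold is one plausible consistency rule; without it, a deployment could oscillate between spawning and shutting down instances.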
Now, let us discuss in some detail what these policies are and how they are specified.
• Policies for distributing the application across multiple geographically distributed datacenters: AppFabric has been designed to support massively distributed application use cases in which the primary motivation for distributing the application is the need to support different latency tolerances for the different services that compose the application. For example, Internet-of-Things use cases have many distributed data collection and aggregation locations that support a wide range of functions, from near-real-time control to long-term business intelligence. For such applications, the topology of the application deployment (its geographical footprint) is extremely important. Other benefits of a distributed deployment, such as fault tolerance (both in the application and in the infrastructure) and better resilience to security attacks, follow naturally.
[Figure 6.7 labels: Zone US-NE, Zone US-S, Zone US-W, Edge Site, Core Site]
Figure 6.7: Sites and Zones (reproduced from Fig. 5.3)
In AppFabric, each service is associated with a deployment site configuration specified as part of the service configuration (see Listing 6.2). This attribute is used to place the service in either an EDGE datacenter or a CORE datacenter. Let us revisit the concept of zones and sites to make the current discussion clearer. As shown in Fig. 6.7, the deployment manager may divide a large geographical region into several zones. Zones are independent of one another. For example, the whole of the United States may be divided into three zones: US-NE, US-S, and US-W. Each zone has many sites. In our current implementation, sites are classified into two types: CORE and EDGE. The EDGE sites may be small micro-datacenters attached to network POPs and operated by the ISPs. This massively distributed micro-datacenter infrastructure is no longer just a concept; it is being deployed by carriers such as AT&T and Verizon to drive Network Function Virtualization (NFV) and Internet-of-Things use cases. The CORE datacenters are relatively larger and more centrally located. Each CORE datacenter may support many EDGE datacenters. This distribution-aggregation architecture suits most of the application use cases that we can envision at present. More intermediary levels may be needed in the topology, especially when driving applications over very large zones. Our implementation presently supports only two levels and may be extended in the future to add more intermediary levels.
The deployment_sites attribute allows the administrator to list the zones in which the application needs to be deployed. The application is started simultaneously in all the zones listed in this specification. The following discussion describes how the sites within a zone are selected.
• Policies for acquiring the required resources: AppFabric is designed to create an application delivery network (ADN) from resources that are either owned (enterprise networks and datacenters) or leased from many different resource providers (cloud providers and ISPs). Although AppFabric decides dynamically at runtime which sites to deploy the application on, the list of all candidate sites from which this selection is made is provided in the configuration file ~/AppFabric/runtime/configurations/sites.cfg. Listing 6.8 shows the configuration in this file.
Listing 6.8: Sites Configuration
<site_config>

    <zone name="US_E">
        <site name="DC1">
            <site_type> CORE/EDGE </site_type>
            .... // authorization/authentication keys
            .... // billing and other information
            <site_addr> 10.10.1.0 </site_addr>
        </site>

        <site name="DC2">
            <site_type> CORE/EDGE </site_type>
            .... // authorization/authentication keys
            .... // billing and other information
            <site_addr> 10.10.2.0 </site_addr>
        </site>
    </zone>

    <zone name="US_W">
        <site name="DC3">
            <site_type> CORE/EDGE </site_type>
            .... // authorization/authentication keys
            .... // billing and other information
            <site_addr> 11.11.1.0 </site_addr>
        </site>

        <site name="DC4">
            <site_type> CORE/EDGE </site_type>
            .... // authorization/authentication keys
            .... // billing and other information
            <site_addr> 11.11.2.0 </site_addr>
        </site>
    </zone>

</site_config>
We have already discussed most of these parameters. However, the authorization/authentication keys and the billing and other information are not part of the current implementation and will need to be added in future versions when AppFabric is tested on commercial platforms such as Amazon EC2, Rackspace, etc.
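To make the structure of the sites configuration concrete, the following sketch parses a simplified copy of it with Python's standard xml.etree.ElementTree module and groups the sites by zone. The elided key and billing fields are left out, the assignment of CORE and EDGE types to DC1 through DC4 is invented for the example, and load_sites is a hypothetical helper, not part of the AppFabric code base.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for sites.cfg; the site types are assumed for illustration.
SAMPLE_SITES = """
<site_config>
    <zone name="US_E">
        <site name="DC1">
            <site_type> CORE </site_type>
            <site_addr> 10.10.1.0 </site_addr>
        </site>
        <site name="DC2">
            <site_type> EDGE </site_type>
            <site_addr> 10.10.2.0 </site_addr>
        </site>
    </zone>
    <zone name="US_W">
        <site name="DC3">
            <site_type> CORE </site_type>
            <site_addr> 11.11.1.0 </site_addr>
        </site>
        <site name="DC4">
            <site_type> EDGE </site_type>
            <site_addr> 11.11.2.0 </site_addr>
        </site>
    </zone>
</site_config>
"""


def load_sites(xml_text):
    """Return {zone_name: [site dicts]} from a sites configuration string."""
    root = ET.fromstring(xml_text.strip())
    zones = {}
    for zone in root.findall("zone"):
        sites = []
        for site in zone.findall("site"):
            sites.append({
                "name": site.get("name"),
                # Element text is padded with spaces in the listing, so strip it.
                "type": site.findtext("site_type").strip(),
                "addr": site.findtext("site_addr").strip(),
            })
        zones[zone.get("name")] = sites
    return zones


zones = load_sites(SAMPLE_SITES)
print([s["name"] for s in zones["US_E"] if s["type"] == "CORE"])
```

Grouping by zone mirrors how the deployment zones listed in the workflow properties would be matched against candidate sites before a selection algorithm runs.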
Now, let us see how the platform dynamically selects the sites from this list. The algorithm used to make this selection is specified by the parameter resource_allocation_method in the workflow properties configuration (Listing 6.7). Currently we implement a very simple greedy algorithm called greedy_max_2_site. This algorithm greedily selects two sites, one EDGE and one CORE, from among all the sites that have the required resources to run the EDGE and CORE services in the application service workflow. More complex algorithms may need to be designed based on policies for optimizing the cost of the deployment, the need to distribute the application to certain geographical regions, etc. The implementation allows these extensions to be incorporated later. Listing 6.9, which is in the file ~/AppFabric/platform/src/lighthouse/globalc/selectDeploymentSite.py, shows the code snippet that allows different methods of site selection to be attached to the platform. The implementation would be much more robust if it used the factory design pattern; this is one of the small changes that a future version of the code should incorporate.
Listing 6.9: Selecting the algorithm for choosing deployment sites
# Assumed to be a sibling module that provides select_site().
import res_allocation_greedy_max_2_site


def selectDeploymentSite(siteList, deployment_scenario):
    # Dispatch on the allocation method named in the workflow properties.
    if deployment_scenario["WF_RESOURCE_ALLOCATION_METHOD"] == "greedy_max_2_site":
        flag, selected_coreSite, selected_edgeSite = (
            res_allocation_greedy_max_2_site.select_site(
                siteList,
                deployment_scenario
            )
        )
        return flag, selected_coreSite, selected_edgeSite
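The dispatch in Listing 6.9 and the greedy selection it calls can be sketched together as follows. This is a minimal illustration that uses a registry of selection functions as a lightweight stand-in for the factory pattern mentioned above; it is not the AppFabric implementation. The site record fields (type, free_capacity), the scenario keys core_demand and edge_demand, and the first-fit reading of "greedy" are all assumptions.

```python
# Registry mapping a method name to a site-selection function.
ALLOCATION_METHODS = {}


def register_allocation_method(name):
    """Decorator that registers a selection function under `name`."""
    def decorator(func):
        ALLOCATION_METHODS[name] = func
        return func
    return decorator


@register_allocation_method("greedy_max_2_site")
def greedy_max_2_site(site_list, deployment_scenario):
    """Pick the first CORE and the first EDGE site with enough free capacity.

    The record fields and demand keys are hypothetical.
    """
    selected = {"CORE": None, "EDGE": None}
    demand = {
        "CORE": deployment_scenario["core_demand"],
        "EDGE": deployment_scenario["edge_demand"],
    }
    for site in site_list:
        kind = site["type"]
        if kind not in selected or selected[kind] is not None:
            continue  # unknown site type, or this type is already chosen
        if site["free_capacity"] >= demand[kind]:
            selected[kind] = site["name"]
    flag = selected["CORE"] is not None and selected["EDGE"] is not None
    return flag, selected["CORE"], selected["EDGE"]


def select_deployment_site(site_list, deployment_scenario):
    """Look up the configured method in the registry and run it."""
    method = deployment_scenario["WF_RESOURCE_ALLOCATION_METHOD"]
    try:
        selector = ALLOCATION_METHODS[method]
    except KeyError:
        raise ValueError(f"unknown resource allocation method: {method}") from None
    return selector(site_list, deployment_scenario)


sites = [
    {"name": "DC1", "type": "CORE", "free_capacity": 2},
    {"name": "DC2", "type": "EDGE", "free_capacity": 8},
    {"name": "DC3", "type": "CORE", "free_capacity": 16},
]
scenario = {
    "WF_RESOURCE_ALLOCATION_METHOD": "greedy_max_2_site",
    "core_demand": 4,
    "edge_demand": 2,
}
print(select_deployment_site(sites, scenario))
```

With a registry, adding a cost-aware or region-aware algorithm only requires registering another function; the dispatcher itself does not change, and an unrecognized method name fails loudly instead of silently returning nothing.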
• Policies for scaling up and scaling down: One of the policies the deployment administrators would like to specify is how and when to scale up and down. Automatic scaling frees the administrator from continuously monitoring the system and dealing with intermittent periods of high or low load. The parameters that the administrator may set for this purpose are instance_capacity, overload_notification_level, and scale_down_level.
– instance_capacity: The maximum number of active user sessions that a workflow instance can handle.
– overload_notification_level: To account for the delay between signaling an overload condition and the time taken to spawn a new instance, the administrator may specify an overload notification level at which the system starts monitoring itself for overload situations and takes pre-emptive actions to avoid them.
– scale_down_level: The deployment needs to scale down and free resources when they are no longer needed. This parameter sets a threshold on the load per workflow instance: the system scales down by shutting down workflow instances until the load per workflow instance rises above the configured scale_down_level. Note that because of intrinsic load balancing in the system, all workflow instances carry equal load at any point in time.
One of the limitations of the current implementation is that it automatically starts the application with one instance per zone (as specified in the deployment_sites parameter). There should be more control over this. For example, the application should be allowed to start with more than one live instance, and the deployment administrator should be able to explicitly control how these instances are distributed. This is one of the most urgent features to be implemented in a future release of the platform.
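The interplay of the three scaling parameters can be sketched as a small decision function. This is a simplified reading, not the platform's logic: it assumes both thresholds are fractions of instance_capacity, that sessions are spread evenly across instances (as the intrinsic load balancing implies), and that crossing overload_notification_level immediately adds an instance, whereas the platform is described as starting to monitor and act pre-emptively. The function name and signature are hypothetical.

```python
def target_instance_count(active_sessions, instances, instance_capacity,
                          overload_notification_level, scale_down_level):
    """Return how many workflow instances should be running.

    Thresholds are treated as fractions of instance_capacity (an assumption).
    """
    if instances < 1 or instance_capacity <= 0:
        raise ValueError("need at least one instance with positive capacity")
    if not 0 < scale_down_level < overload_notification_level:
        raise ValueError("expected 0 < scale_down_level < overload level")

    def utilization(n):
        # Equal load per instance, expressed as a fraction of its capacity.
        return active_sessions / (n * instance_capacity)

    n = instances
    # Scale up while the per-instance load is at or above the overload level.
    while utilization(n) >= overload_notification_level:
        n += 1
    # Scale down while the load is below the scale-down level, but never
    # to a count that would immediately trigger an overload again.
    while (n > 1
           and utilization(n) < scale_down_level
           and utilization(n - 1) < overload_notification_level):
        n -= 1
    return n


# 120 sessions on 2 instances of capacity 100 is 60% load: add an instance.
print(target_instance_count(120, 2, 100, 0.5, 0.2))
# 30 sessions on 4 instances is 7.5% load: shrink until load exceeds 20%.
print(target_instance_count(30, 4, 100, 0.5, 0.2))
```

The guard in the scale-down loop is a design choice added for this sketch: it prevents a shutdown that would push the remaining instances straight back over the overload level.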
• Load Balancing: The platform ensures that user sessions are automatically load balanced across all active workflow instances. Load balancing is managed intrinsically by the platform, and no external policy interface is exposed to control it. In future versions it may be useful to allow different load balancing policies to be specified explicitly as well.
• Fault-tolerance: The system should automatically detect service failures across workflows and automatically initiate repair efforts. This feature is partially implemented. We will see in the next section on OpenADN that the platform implements an elaborate heartbeat mechanism to detect and report failures. However, our current control plane implementation is not mature enough to act on these failures and take reparative actions.
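For readers unfamiliar with heartbeat-based failure detection, the sketch below shows the general idea in its simplest form: a service is reported as failed when no heartbeat has been seen within a timeout. This is a generic illustration with hypothetical names (HeartbeatMonitor, record_heartbeat, failed_services); it does not reproduce the OpenADN mechanism described in the next section. Timestamps are passed in explicitly so that the example is deterministic.

```python
class HeartbeatMonitor:
    """Track the last heartbeat per service and report those that timed out."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # service id -> timestamp of the latest heartbeat

    def record_heartbeat(self, service_id, now):
        """Remember that `service_id` was alive at time `now`."""
        self.last_seen[service_id] = now

    def failed_services(self, now):
        """Return the sorted ids of services silent for longer than the timeout."""
        return sorted(
            service_id
            for service_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.record_heartbeat("svc-a", now=0.0)
monitor.record_heartbeat("svc-b", now=0.0)
monitor.record_heartbeat("svc-a", now=4.0)  # svc-b stays silent
print(monitor.failed_services(now=5.0))
```

Detection of this kind only reports failures; deciding how to repair them (for example, restarting the service or redeploying the workflow) is the control plane responsibility that the text identifies as not yet mature.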