Cloud Network Infrastructure as a Service:
An Exercise in Multi-Domain Orchestration
Jeff Chase
Aydan Yumerefendi
Duke University
Ilia Baldine
Yufeng Xin
Anirban Mandal
Chris Heerman
Renaissance Computing Institute (RENCI)
David Irwin
University of Massachusetts Amherst
Abstract
Cloud computing is now a successful and well-understood example of the Infrastructure as a Service (IaaS) model. This paper explores how to extend IaaS clouds to other kinds of substrate resources beyond servers and storage, and to link these elements together in a coordinated, multi-provider “web” of cloud infrastructure services. The vision is to enable cloud applications to request virtual servers at multiple points in the network, together with bandwidth-provisioned network pipes and other network resources to interconnect them. We outline new software to orchestrate end-to-end connections over multi-layer networks, coordinated with Eucalyptus clouds and other resources at the edge. We present results from a demonstration experiment with the prototype, and discuss various architectural challenges arising in multi-domain cloud computing with dynamic circuit networks.
1
Introduction
EC2 and other server clouds follow an “Infrastructure as a Service” (IaaS) model, in which the cloud customer rents virtual servers and selects or controls the software for each virtual server instance. Cloud computing is now a successful and well-understood example of IaaS. For example, clouds are gaining acceptance as a simple and powerful vehicle to scale up computing power for science [9, 19].
This paper explores how to extend the IaaS vision to enable coordinated access to diverse resources from multiple autonomous resource providers. For example, cloud users may wish to spread their usage across multiple cloud providers to improve scaling or control dependency risk, or for geographic dispersion. We also advocate extending the cloud abstraction to other kinds of substrate resources beyond servers and storage, including cloud networks.
Several efforts are building support for dynamic circuits on national-footprint multi-layer networks (National Lambda Rail, Internet2, ESNet), including inter-domain circuits that span more than one of these networks. Technologies to virtualize networks continue to advance beyond the VPLS/VPN tunneling available to the early cloud network efforts [11, 18, 23, 25, 21, 1, 12]. Advanced multi-layer networks offer direct control of the network substrate to instantiate isolated virtual pipes, which may appear as VLANs, MPLS tunnels, or VPLS services at the network edge.
As a first step to linking these resources to clouds, we have developed a software prototype to instantiate dynamic circuits in tandem with virtual machine instances to interconnect cloud applications across multiple cloud sites and domains. The prototype includes extensions to Eucalyptus, a commercially supported open-source cloud infrastructure service. It uses the ORCA orchestration and network control software (Open Resource Control Architecture), which derives from almost a decade of research in networked clouds [7, 13, 17, 22, 6, 5, 4, 14]. It adds plug-in control modules for ORCA to interface to various substrate providers, including Eucalyptus cloud sites, NLR’s Sherpa FrameNet service, and the Breakable Experimental Network (BEN), a metro-scale optical network testbed operated by RENCI in North Carolina.
Our vision is to enable cloud applications to request virtual servers at multiple points in the network, together with bandwidth-provisioned network pipes and other network resources to interconnect them. The GENI initiative (Global Environment for Network Innovations, funded by the US National Science Foundation) is pursuing a similar vision for a specific use case: linking testbeds for research in network science and engineering. The principal goal of GENI is to enable researchers to experiment with radically different forms of networking within private isolated “slices” of shared testbed resources offered by a federation of providers. A GENI slice gives its owner control over a set of virtualized substrate resources allocated from the providers, which may include programmable network elements, mobile/wireless platforms, and other infrastructure components as well as virtual servers, storage, etc. The slices are built-to-order for the needs of each experiment through a GENI control framework. ORCA is a candidate control framework for GENI.
The GENI vision and development effort is an important direction for the future of cloud computing. Our project is one of several GENI Spiral 2 projects linking server clouds into GENI. More importantly, GENI addresses key challenges for federating clouds and deploying cloud applications across multiple infrastructure providers, and extending the infrastructure-as-a-service vision to orchestrated control of pervasive virtualization: flexible, automated configuration and control of distributed cyberinfrastructure resources.
This paper outlines some design challenges and choices of the software, and reports on a demonstration experiment. Following GENI, we refer to the virtual resources assigned to a distributed cloud application as its slice of the shared substrate [13, 8]. The resources in the slice may be obtained from multiple providers, including but not limited to cloud providers and network transit providers. The control and orchestration framework must link or “stitch” these elements into a slice with suitable end-to-end connectivity, and coordinate allocation of resources from the shared substrate across competing workloads (e.g., co-scheduling). In the experiment, ORCA services coordinate co-scheduling, dynamic instantiation, and interconnection of virtual server instances and dynamic network circuits to create a seamless end-to-end slice with a VLAN spanning multiple clouds.
Our approach uses a semantic web ontology to describe the cloud substrates and the elements of each slice. These declarative representations expose sufficient information to enable a general-purpose substrate-independent core to automate slice stitching across multiple autonomous providers. An orchestration server (an ORCA slice manager) consumes these representations and drives stitching by sequencing the flow of secure tokens among ORCA servers operated by the providers. When the slice is ready, it launches the application into the slice. The orchestration server requires no special privilege: it may be controlled by the customer or operated as a service by a third party.
2
Overview and Design
Infrastructure-as-a-service is based on virtualization technologies that offer common advantages across different kinds of substrate. They give providers rich mechanisms to control who has access to their substrate resources, and when, and on what terms, often including assurance of predictable service quality levels needed for certain mission-critical customers. The customer has full control over how it uses those virtual resources once the provider assigns them to the slice. Virtualization offers some degree of containment and isolation (safety and privacy) for slices hosted on a shared substrate.
For example, VM instances can be sized for an application and loaded with an operating system and application stack selected by the user. There is a rich set of tools to construct these packaged software stacks as virtual appliance images, and a wide range of prepackaged images are available from third parties. Once an image exists, users can deploy it at wide scale on cloud resources from multiple providers, without requiring those providers to support the specific software stack. Users may change their software stacks without involving the cloud providers or affecting other users.
Similarly, built-to-order virtual networks are suitable for provisioning flexible packet-layer overlays using IP or other protocols selected by the owner. IP overlays may be configured for secure isolation, or linked with routed connections to the public Internet through gateways and flow switches. VM instances in the cloud can plug into multi-layer networks at any layer: different layers offer different services, quality-of-service profiles, and isolation properties. Amazon’s recently introduced Virtual Private Cloud service is an example of the power of linking configurable networks to the cloud.
These common benefits of virtualization, and the opportunities to link virtual resources together, motivate us to take a comprehensive view of IaaS spanning multiple cloud providers and substrate types. This view raises many interesting new architectural issues. How should future multi-domain cloud applications be described, packaged, and deployed? User-driven tools may select the target for each component at deployment time; this late binding to the target cloud may require the tool to modify and/or register images before launch. Orchestration software is needed to federate clouds across domains and to coordinate image registration, resource allocation, stitching, launch, monitoring, and adaptation for multi-domain cloud applications. Doing this right requires solutions for identity, authorization, monitoring, and resource policy specification and enforcement. ORCA uses open interfaces and an extensible plug-in architecture to enable these solutions to evolve over time and to leverage and interoperate with software outside of ORCA.
2.1
Linking Clouds through Multi-Layer Networks
A principal goal is to link dynamic circuit networks to cloud sites. Amazon’s Elastic Compute Cloud (EC2) is a popular commercial offering for a “public” IaaS cloud. The EC2 API allows customers to request, control, and release virtual server instances on demand from Amazon-operated servers. Pay-as-you-go pricing helps EC2 users adapt nimbly to changing demands with minimal capital cost [3].
Figure 1: Elements of the slice created in the demonstration experiment, and their linkages.
Private clouds offer the same opportunity for flexible, controlled sharing and agile management of resources. Eucalyptus is a leading open-source software technology for private clouds: it offers an EC2-compatible API to provision and program groups of VMs across a range of underlying hypervisor systems.
Duke and RENCI maintain private Eucalyptus clouds linked through the BEN network. The BEN PoPs use Cisco and Juniper routers above WDM/TDM bandwidth virtualization technology from Infinera; BEN offers 10Gbps dynamic circuits as well as dedicated fiber access through dynamically configurable optical switches (Polatis). BEN has 10Gbps linkages to NLR FrameNet and is adding connectivity to other national-footprint networks through a connection to the StarLight network hub. BEN is operated by RENCI and is externally controllable through ORCA.
Each Eucalyptus site owner runs a local ORCA server endowed with rights to invoke Eucalyptus APIs on behalf of remote users whose identities might be unknown to the provider. The ORCA service incorporates policies to authorize these users and arbitrate among contending requests. The cloud site owner may select the policies that govern access to each site. These policies may be quite different from the pricing policies used in a public cloud such as EC2.
The ORCA per-site server is mostly generic, but it runs plugin scripts to invoke the Eucalyptus APIs. Co-scheduling is performed by ORCA brokering intermediaries, which have limited allocation power delegated to them by the sites using mechanisms described in previous work [17, 13].
Eucalyptus supports an EC2 abstraction called a security group that offers containment and connectivity filtering for a user’s VM instances. The group has an isolated VLAN with a per-site IP/NAT gateway with custom filtering rules for external connectivity. We have prototyped Eucalyptus 1.5.2 extensions that enable the transit provider to link these group VLANs to external network circuits provisioned from BEN. This linkage can occur at multiple layers. Our prototype makes the linkages at layer 2: the BEN PoP switch (Cisco 6509) maps the group’s VLAN tag onto the pipe, effectively joining the VM instances in the group to a private VLAN that may span multiple sites. A private cross-site VLAN has various benefits: for example, it enables migration of VM instances across sites, and allows the user to modify the network protocol stack.
Various policies and conventions are necessary to coordinate naming at each layer. For example, for layer 2 connectivity, Eucalyptus already assures uniqueness of MAC addresses assigned to VM instances, so ARP and spanning tree functionality work properly. However, ORCA must coordinate IP addresses on the slice VLAN to avoid collisions. The VM instances are created with an identity and keys held by the local ORCA server so that it may connect to the instances and configure them. When it is done, ORCA installs the user’s public key in the instance and generates a notification to transfer control to the user, or to an orchestration server running on the user’s behalf.
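The IP coordination requirement can be illustrated with a small allocator. This is only a sketch under stated assumptions, not ORCA's mechanism: the class name `SliceAddressPool`, the instance identifiers, and the subnet `10.100.0.0/29` are all hypothetical. It shows one way a single coordinator could guarantee that VM instances at different sites never receive the same address on a shared slice VLAN.

```python
import ipaddress


class SliceAddressPool:
    """Hands out unique host addresses from one subnet shared by a slice VLAN.

    Hypothetical helper for illustration; not part of ORCA or Eucalyptus.
    """

    def __init__(self, cidr):
        self.network = ipaddress.ip_network(cidr)
        # Iterator over usable host addresses (network/broadcast excluded).
        self._unused = iter(self.network.hosts())
        self.assigned = {}

    def assign(self, instance_id):
        """Return the address for instance_id, allocating one on first use."""
        if instance_id not in self.assigned:
            try:
                self.assigned[instance_id] = next(self._unused)
            except StopIteration:
                raise RuntimeError("slice subnet exhausted") from None
        return self.assigned[instance_id]


# Two instances at different sites draw from the same pool, so they
# cannot collide on the cross-site VLAN.
pool = SliceAddressPool("10.100.0.0/29")
duke_addr = pool.assign("duke-eucalyptus-vm")
vise_addr = pool.assign("vise-xen-vm")
print(duke_addr, vise_addr)
```

Keeping the pool in one place per slice is the simplest design; a multi-site deployment could instead delegate disjoint sub-ranges to each site.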
Figure 2: Instantiation schedule and completion times for elements of the demo slice. This figure is generated from timestamped lease event traces collected from the ORCA server logs at each of the providers. [Figure annotations: Request to ViSE immediately. Request NLR/Sherpa link to Starlight immediately. When NLR/Sherpa path is ready, stitch one end to ViSE through Starlight. VLAN tag through DukeNet to BEN. Start Duke Eucalyptus when DukeNet VLAN tag is known. Stand up BEN path and stitch to Sherpa path at one end, and to Duke Eucalyptus VM on the other.]
2.2
Network Control
Multiple high-speed national fabrics (NLR, I2, ESNet, others) offer resource reservation mechanisms with different levels of abstraction. Some now offer automated control planes (NLR Sherpa [24], I2 ION, ESNet OSCARS [15]) and inter-domain provisioning mechanisms (I2 DCN, GLIF Fenius [10]) with external APIs. These mechanisms make it possible to provide varying levels of quality of service to meet a range of needs. The most common abstraction offered by the national fabrics today is a VLAN: a tagged Layer 2 circuit with possible bandwidth guarantees that can be carried from one interface of the fabric to another.
Network embedding is a multi-dimensional optimization problem for which the inputs are the current state of the substrate, availability and compatibility of different interconnect technologies at the participating sites, and the requested topology of the slice and its Quality of Service profile. ORCA provides a pluggable interface for such policies. Our prototype policy grants any request for which it can identify a feasible embedding, as described below.
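A "grant if feasible" policy of this kind can be sketched for the simplest case, a single point-to-point pipe. The following is an illustrative assumption, not the prototype's code: the function names, the PoP names (`pop-a`, `pop-b`, `pop-c`), and the reduction of substrate state to free bandwidth per link are all hypothetical. It searches for any path whose links have enough free capacity and, if one exists, reserves it.

```python
from collections import deque


def find_feasible_path(links, src, dst, gbps):
    """Breadth-first search over links that still have `gbps` free.

    `links` maps an undirected (node, node) pair to free bandwidth in Gbps.
    Returns a list of nodes from src to dst, or None if no path is feasible.
    """
    adjacency = {}
    for (a, b), free in links.items():
        if free >= gbps:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def grant(links, src, dst, gbps):
    """Grant the request if any feasible path exists; reserve its bandwidth."""
    path = find_feasible_path(links, src, dst, gbps)
    if path is None:
        return None
    for a, b in zip(path, path[1:]):
        key = (a, b) if (a, b) in links else (b, a)
        links[key] -= gbps
    return path


# Hypothetical three-PoP substrate: the direct a-c link is nearly full.
substrate = {("pop-a", "pop-b"): 10, ("pop-b", "pop-c"): 10, ("pop-a", "pop-c"): 1}
first = grant(substrate, "pop-a", "pop-c", 5)
second = grant(substrate, "pop-a", "pop-c", 5)
third = grant(substrate, "pop-a", "pop-c", 5)
print(first, second, third)
```

A first-fit policy like this ignores the optimization aspects named in the text (technology compatibility, QoS profile, topology); a pluggable policy interface leaves room to swap in something stronger.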
A local ORCA domain server runs for each network transit provider. As with the ORCA server at each Eucalyptus site, these servers are generic except for plugin scripts matched to the specific domain. For example, an ORCA server directly controls the BEN network, and runs plugins that emit configuration command sets to software drivers we developed for the native TL-1 interfaces of the fiber switches (Polatis) and WDM DTNs (Infinera), and the CLI interface of the Ethernet switches (Cisco). Another ORCA server runs different plugins that invoke the Sherpa API [24] to configure layer-2 FrameNet paths through the National Lambda Rail (NLR).
In the future, the degree of automation of networks will increase. Networks will expose their capabilities and will allow attachment of cloud edge resources at different layers, including transport technologies like OTN or SONET. These layers will carry encapsulated traffic of cluster interconnects between different sites. Today, common cluster interconnects are IP over 10G Ethernet or InfiniBand. GLIF (Global Lambda Integrated Facility) is automating GOLE (GLIF Open Lightpath Exchange) operations to allow provisioning of global WDM lightpaths that are largely agnostic to the payloads they carry.
2.3
NDL-OWL
One focus of the project is to advance standards and representations for describing network cloud substrates declaratively. There is a need for a common declarative language that can represent multi-level physical network substrate, complex requests for network slices, and the virtualized network resources (e.g., linked circuits and VLANs) assigned to a slice.
Our approach builds on the Network Description Language (NDL [16]). NDL has been shown to be useful for describing heterogeneous optical network substrates and identifying candidate cross-layer paths through those networks. We extended NDL to use a more powerful ontology defined using OWL (Web Ontology Language). OWL is a core technology for the Semantic Web, and a widely used W3C standard [2]. Semantic Web ontologies are especially suitable to model graph structures such as complex network clouds.
The result is a backward-compatible extension of NDL, which we refer to as NDL-OWL. NDL-OWL represents various substrate-specific constraints for allocation, sharing, and stitching. These constraints are crucial for the resource control plug-in modules in ORCA, which are responsible for allocating and configuring substrate resources for each slice.
The ultimate goal of this process is to create a representation language that is sufficiently powerful to enable generic resource control modules to reason about substrate resources and the ways that the system might share them, partition them, and combine them. Ideally, we could specify all substrate-specific details declaratively, so that we can incorporate many diverse substrates into a network cloud based on a general-purpose control framework and resource leasing core.
For example, our prototype identifies feasible network embeddings in the BEN network by graph queries on the NDL-OWL representation of the current state of the substrate. These queries use a standard semantic web query language (SPARQL) and query engine (Jena). The result graph is processed to generate a schedule of configuration actions that instantiate the embedding on the BEN network.
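The flavor of such graph queries can be conveyed without a semantic web toolkit. The sketch below is a stand-in, not SPARQL or Jena: it stores the substrate as subject-predicate-object triples and joins a list of triple patterns, in the spirit of a SPARQL basic graph pattern. The predicate names (`hasInterface`, `connectedTo`) and node names are hypothetical and are not taken from the NDL-OWL schema.

```python
def match(triples, pattern):
    """Yield variable bindings for one (s, p, o) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join several patterns, substituting earlier bindings into later ones."""
    results = [{}]
    for pattern in patterns:
        extended = []
        for bound in results:
            concrete = tuple(bound.get(term, term) for term in pattern)
            for new in match(triples, concrete):
                extended.append({**bound, **new})
        results = extended
    return results


# Hypothetical substrate description as triples.
substrate = [
    ("switch-duke", "hasInterface", "duke:if1"),
    ("switch-renci", "hasInterface", "renci:if1"),
    ("switch-renci", "hasInterface", "renci:if2"),
    ("duke:if1", "connectedTo", "renci:if1"),
]

# Which pairs of devices are joined by a link, and over which interfaces?
links = query(substrate, [
    ("?a", "hasInterface", "?x"),
    ("?x", "connectedTo", "?y"),
    ("?b", "hasInterface", "?y"),
])
print(links)
```

The point of the declarative form is that the same query logic works for any substrate described in the shared vocabulary; only the triples change.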
2.4
Stitching
A major challenge is to stitch different elements of a slice together to establish end-to-end connectivity within the slice. In networks, various labels are used to isolate and identify logical network channels (e.g., VLAN tags) or physical channels (e.g., frequency). Other substrate elements use similar labels, such as logical unit numbers (LUN) in storage systems. Stitching involves exchanging these labels across logically neighboring elements of the slice. We generalize the stitching problem as a label producing and consuming problem based on the relationship among neighboring elements. These relationships create a dependency DAG that defines a partial order for the operations to instantiate the slice elements and stitch them together.
For example, VLAN tags generally must be mapped or translated at stitching points to establish an end-to-end VLAN across multiple network domains. Various standards have been developed to facilitate Ethernet label switching, but there are many technical obstacles to adoption [20] and they are not yet widely deployed. Some major Ethernet service providers assign an arbitrary VLAN ID from a range (e.g., NLR Sherpa). Our prototype maps VLAN tags at the BEN edge, and in an ORCA-controlled Ethernet switch at the Starlight network hub in Chicago, which links to BEN through NLR VLANs provisioned with Sherpa.
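Tag mapping at a stitching point amounts to a bidirectional lookup table. The following sketch makes that concrete under stated assumptions: the class `StitchPoint`, the domain names, and the tag values 861 and 27 are hypothetical and do not come from the demonstration. The only external fact relied on is that usable IEEE 802.1Q VLAN IDs run from 1 to 4094.

```python
class StitchPoint:
    """A switch that rewrites VLAN tags between two neighboring domains.

    Hypothetical model for illustration; not a driver for any real switch.
    """

    def __init__(self):
        # (ingress domain, ingress tag) -> (egress domain, egress tag)
        self._map = {}

    def stitch(self, domain_a, tag_a, domain_b, tag_b):
        """Join tag_a in domain_a with tag_b in domain_b, in both directions."""
        for tag in (tag_a, tag_b):
            if not 1 <= tag <= 4094:
                raise ValueError(f"invalid VLAN ID: {tag}")
        for key in ((domain_a, tag_a), (domain_b, tag_b)):
            if key in self._map:
                raise ValueError(f"{key} is already stitched")
        self._map[(domain_a, tag_a)] = (domain_b, tag_b)
        self._map[(domain_b, tag_b)] = (domain_a, tag_a)

    def forward(self, domain, tag):
        """Return (egress domain, rewritten tag), or None if unmapped."""
        return self._map.get((domain, tag))


# A dynamically assigned transit tag is joined to a static testbed tag.
hub = StitchPoint()
hub.stitch("transit", 861, "testbed", 27)
print(hub.forward("transit", 861))
print(hub.forward("testbed", 27))
```

Because the transit provider picks its tag only when the circuit is provisioned, the `stitch` call cannot be issued until that tag is known, which is what creates the ordering constraints discussed in this section.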
In our approach to stitching, a broker intermediary returns NDL-OWL descriptions of slice elements reserved to the slice, including the label produce/consume behavior of each provider domain. Each domain describes the following attributes: (1) the label type; (2) whether it is a label producer; (3) whether it has a label translation capability. An ORCA slice manager collects this information and generates a DAG encoding the flow of labels and the resulting instantiation order and stitching dependencies. The slice manager uses this DAG to sequence its interactions with the ORCA servers representing each of the provider domains. Each domain signs any labels it produces, so downstream providers can verify their authenticity using the common broker as a trust anchor. The ORCA slice manager runs on behalf of the user and is not trusted by either the broker or the providers.
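The derivation of an instantiation order from the three per-domain attributes can be sketched as follows. This is a minimal illustration, not the ORCA slice manager: the `Element` record, the rules for orienting each edge, the tie-break when both neighbors can translate, and the element names are all assumptions made for the example. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later) to group elements into stages that can be instantiated in parallel.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Element:
    name: str
    label_type: str        # attribute (1), e.g. "vlan"
    produces_label: bool   # attribute (2)
    can_translate: bool    # attribute (3)


def stitching_stages(elements, adjacencies):
    """Group slice elements into stages; each stage waits for earlier labels."""
    by_name = {e.name: e for e in elements}
    deps = {e.name: set() for e in elements}  # consumer -> producers it waits on
    for left, right in adjacencies:
        a, b = by_name[left], by_name[right]
        if a.label_type != b.label_type:
            raise ValueError(f"{left} and {right} use different label types")
        if a.produces_label != b.produces_label:
            producer, consumer = (a, b) if a.produces_label else (b, a)
        elif not a.produces_label:
            raise ValueError(f"no label source between {left} and {right}")
        elif a.can_translate or b.can_translate:
            # Both pick their own label: the translating side waits for its
            # neighbor's label and maps it. Assumed tie-break: the first
            # element listed waits if both can translate.
            consumer, producer = (a, b) if a.can_translate else (b, a)
        else:
            raise ValueError(f"labels of {left} and {right} cannot be reconciled")
        deps[consumer.name].add(producer.name)
    for name, producers in deps.items():
        if len(producers) > 1 and not by_name[name].can_translate:
            raise ValueError(f"{name} receives several labels but cannot translate")
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical slice loosely shaped like the one in this paper.
slice_elements = [
    Element("cloud-site", "vlan", produces_label=True, can_translate=False),
    Element("edge-net", "vlan", produces_label=True, can_translate=True),
    Element("transit", "vlan", produces_label=True, can_translate=False),
    Element("hub-switch", "vlan", produces_label=False, can_translate=True),
    Element("testbed", "vlan", produces_label=True, can_translate=False),
]
neighbors = [
    ("cloud-site", "edge-net"),
    ("edge-net", "transit"),
    ("transit", "hub-switch"),
    ("hub-switch", "testbed"),
]
stages = stitching_stages(slice_elements, neighbors)
print(stages)
```

Elements in the same stage have no label dependencies on one another, so an orchestrator can issue their requests concurrently and start the next stage as labels arrive.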
3
Experiment
We used the prototype described here to demonstrate an end-to-end slice linking a Eucalyptus cloud site at Duke with the ViSE testbed at U. Mass Amherst through an end-to-end VLAN spanning BEN and a dynamic NLR/Sherpa circuit through the Starlight network hub in Chicago. The ViSE testbed is linked to Starlight through a static VLAN. This VLAN is stitched to the Sherpa path through an ORCA-controlled switch maintained at Starlight by the iGENI project. Figure 1 depicts the elements of the slice and their relationships.
The experiment instantiates a VM instance at the Eucalyptus cloud site and a Xen VM from the ViSE testbed, and links them through the dynamic stitched VLAN. It then launches an Apache Web server on its Eucalyptus node, which provides an interface to process and visualize radar data fed through the circuit from the ViSE node.
We ran this demo live at the GEC7 Conference in March 2010. Figure 2 depicts the instantiation of the slice from a test run of the demo. The orchestration server initiates the “center” (NLR Sherpa) and “edges” (Eucalyptus and ViSE VMs) of the demo slice immediately. When the NLR path is ready and its VLAN tag is known, it commands the Starlight switch to stitch the path through to ViSE, and initiates a BEN circuit to connect the other end through to the Eucalyptus VM instance group VLAN at the Duke site. The end-to-end slice is ready in four minutes.
4
Conclusion
The demo serves as a proof of concept to show how cloud applications can interconnect and link to other resources through dynamic circuits provisioned along with the VMs by a cloud orchestration framework. Certain security mechanisms were disabled for the practical demands of a live demo, important and difficult policy questions are left unaddressed, the network embedding instance is relatively trivial, and the Eucalyptus integration in the prototype is at best a proof of concept. However, the important orchestration code is automated as outlined in this paper, and not bound to this specific scenario.
Acknowledgement. This work was supported by the National Science Foundation GENI Initiative, NSF awards CNS-0720829 and CNS-0910653, and an IBM Faculty Award.
References
[1] S. Adabala, V. Chadha, P. Chawla, R. Figueiredo, J. Fortes, I. Krsul, A. Matsunaga, M. Tsugawa, J. Zhang, M. Zhao, L. Zhu, and X. Zhu. From virtualized resources to virtual computing grids: the in-vigo system. Future Gener. Comput. Syst., 21(6):896–909, 2005.
[2] G. Antoniou and F. Harmelen. A Semantic Web Primer. MIT Press, 2008.
[3] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. Above the Clouds: A Berkeley View of Cloud Computing. Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley, Feb 2009.
[4] I. Baldine, Y. Xin, D. Evans, C. Heermann, J. Chase, V. Marupadi, and A. Yumerefendi. The Missing Link: Putting the Network in Networked Cloud Computing. In ICVCI: International Conference on the Virtual Computing Initiative (an IBM-sponsored workshop), 2009.
[5] J. Chase, I. Constandache, A. Demberel, L. Grit, V. Marupadi, M. Sayler, and A. Yumerefendi. Controlling Dynamic Guests in a Virtual Computing Utility. In International Conference on the Virtual Computing Initiative (an IBM-sponsored workshop), May 2008.
[6] J. Chase, L. Grit, D. Irwin, V. Marupadi, P. Shivam, and A. Yumerefendi. Beyond Virtual Data Centers: Toward an Open Resource Control Architecture. In Selected Papers from the International Conference on the Virtual Computing Initiative (ACM Digital Library), May 2007.
[7] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. E. Sprenkle. Dynamic Virtual Clusters in a Grid Site Manager. In Proceedings of the Twelfth International Symposium on High Performance Distributed Computing (HPDC), June 2003.
[8] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, and M. Bowman. Planetlab: an overlay testbed for broad-coverage services. SIGCOMM Comput. Commun. Rev., 33(3):3–12, 2003.
[9] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good. The Cost of Doing Science on the Cloud: The Montage Example. In Proceedings of SC’08, Austin, TX, 2008. IEEE.
[10] GLIF Fenius. http://code.google.com/p/fenius/.
[11] R. J. Figueiredo, P. A. Dinda, and J. A. B. Fortes. A case for grid computing on virtual machines. In ICDCS ’03: Proceedings of the 23rd International Conference on Distributed Computing Systems, page 550, Washington, DC, USA, 2003. IEEE Computer Society.
[12] I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayor, and X. Zhang. Virtual clusters for grid communities. In CCGRID ’06: Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID’06), pages 513–520, Washington, DC, USA, 2006. IEEE Computer Society.
[13] Y. Fu, J. Chase, B. Chun, S. Schwab, and A. Vahdat. SHARP: An Architecture for Secure Resource Peering. In Proceedings of the 19th ACM Symposium on Operating System Principles, October 2003.
[14] L. Grit, D. Irwin, A. Yumerefendi, and J. Chase. Virtual Machine Hosting for Networked Clusters: Building the Foundations for “Autonomic” Orchestration. In Proceedings of the First International Workshop on Virtualization Technology in Distributed Computing (VTDC), November 2006.
[15] C. Guok, D. Robertson, M. Thompson, J. Lee, B. Tierney, and W. Johnston. Intra and Interdomain Circuit Provisioning Using the OSCARS Reservation System. In Proc. GridNets, 2006.
[16] J. Ham, F. Dijkstra, P. Grosso, R. Pol, A. Toonk, and C. Laat. A distributed topology information system for optical networks based on the semantic web. Journal of Optical Switching and Networking, 5(2-3), June 2008.
[17] D. Irwin, J. S. Chase, L. Grit, A. Yumerefendi, D. Becker, and K. G. Yocum. Sharing Networked Resources with Brokered Leases. In Proceedings of the USENIX Technical Conference, June 2006.
[18] X. Jiang and D. Xu. Violin: Virtual Internetworking on Overlay Infrastructure. In Proceedings of the Third International Symposium on Parallel and Distributed Processing and Applications (ISPA), July 2003.
[19] K. Keahey and T. Freeman. Science Clouds: Early Experiences in Cloud Computing for Scientific Applications. In Cloud Computing and its Applications (CCA), 2008.
[20] D. I. S. O. Kou Kikuta, Masahiro Nishida and N. Yamanaka. Establishment of vlan tag swapped path on gmpls controlling wide area layer-2 network. In Proc. of IEEE OFC, 2009.
[21] I. Krsul, A. Ganguly, J. Zhang, J. A. B. Fortes, and R. J. Figueiredo. Vmplants: Providing and managing virtual machine execution environments for grid computing. In SC ’04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing, page 7, Washington, DC, USA, 2004. IEEE Computer Society.
[22] L. Ramakrishnan, L. Grit, A. Iamnitchi, D. Irwin, A. Yumerefendi, and J. Chase. Toward a Doctrine of Containment: Grid Hosting with Adaptive Resource Control. In Supercomputing (SC06), November 2006.
[23] P. Ruth, X. Jiang, D. Xu, and S. Goasguen. Virtual distributed environments in a shared infrastructure. Computer, 38(5):63–69, 2005.
[24] NLR Sherpa. http://noc.nlr.net/nlr/maps_documentation/nlr-framenet-documentation.html.
[25] A. Sundararaj and P. Dinda. Towards virtual networks for virtual machine grid computing, 2004.