Security and the private cloud
4.2 Rationale for a private cloud
It may be useful to step back and recall what exactly a cloud is. This is necessary in the context of the discussion of private clouds because you can lose clarity in understanding: IT vendors, knowing how attractive the moniker is, often try to cast any and every solution as a cloud solution. Many vendors with products and services that involve virtualization, storage, or data-center automation claim that they have a private cloud offering.
4.2.1 Defining a private cloud
A private cloud comprises a subset of the five main principles we’ve been using as our definition of cloud computing:
■ Pooled resources: Available to any subscribing users
■ Virtualization: High utilization of assets
■ Elasticity: Dynamic scale without CAPEX
■ Automation: Building, deploying, configuring, provisioning, and moving, all without manual intervention
■ Metered billing: Per-usage business model (pay for what you use)
For private clouds, the three principles associated with the technical attributes (virtualization, elasticity, and automation) still hold true and translate directly from public clouds to private clouds. The other two, pooled resources and metered billing, relate more specifically to the business attributes of public clouds and are less directly applicable to the private cloud. Private clouds by definition aren’t a pool of computing resources accessible to subscribing users on demand. Not just anyone with a credit card can access the resources in a private cloud; these resources are reserved for the exclusive use of the organization that owns them. Large organizations may implement metered billing for their private clouds in certain cases, but not necessarily.
PRIVATE CLOUD (OR INTERNAL CLOUD OR CORPORATE CLOUD) A computing architecture that provides hosted services to a specific group of people behind a firewall. A private cloud uses virtualization, automation, and distributed computing to provide on-demand elastic computing capacity to internal users.
As you saw earlier in chapter 3, public cloud providers are spending a great deal of money in new data centers to power their cloud initiatives, with Google investing approximately $2.3 billion in 2008 for its buildout.3 At first glance, in the face of this massive investment, it may seem like a foolhardy proposition to attempt to go it alone and build your own private cloud. But remember that large IT organizations have a long history of providing data-center services, many with longer track records of doing so than most if not all of today’s incumbent cloud providers (Amazon, Google, and Microsoft). They have tremendous amounts of resources and past investment in hardware and data center assets. They can certainly put these to good use.
3 Rich Miller, “Facebook: $20 Million a Year on Data Centers,” www.datacenterknowledge.com/
Public cloud spending in perspective
Before you get too carried away with the thought that public cloud providers have economies of scale beyond the reach of anyone else because of the amount they spend on hardware and data centers, you should ground yourself with a few facts. Let’s assume that Google, Amazon, and Microsoft each spend $10 billion next year on cloud infrastructure. That amount is only a drop in the bucket compared to the total amount of money spent annually on IT worldwide. The financial services industry alone will spend about 50 times what any one of those providers would spend on IT in 2010 (~$500 billion).
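The arithmetic behind this sidebar can be checked in a few lines. The sketch below uses only the sidebar’s own figures: the $10 billion per provider is the sidebar’s hypothetical, and the ~$500 billion is its approximate estimate for financial services. The function and variable names are illustrative, not taken from any source.

```python
# Figures from the sidebar; the per-provider spend is the sidebar's hypothetical.
PROVIDER_SPEND_BILLIONS = {"Google": 10, "Amazon": 10, "Microsoft": 10}
FINANCIAL_SERVICES_IT_BILLIONS = 500  # approximate 2010 figure quoted in the sidebar


def spend_ratios(provider_spend, industry_spend):
    """Return industry spend as a multiple of (a) the largest single
    provider's spend and (b) the combined spend of all providers."""
    largest_single = max(provider_spend.values())
    combined = sum(provider_spend.values())
    return industry_spend / largest_single, industry_spend / combined


vs_single, vs_combined = spend_ratios(
    PROVIDER_SPEND_BILLIONS, FINANCIAL_SERVICES_IT_BILLIONS
)
print(f"vs. one provider:       {vs_single:.0f}x")
print(f"vs. all three combined: {vs_combined:.1f}x")
```

Under these assumptions the multiple of about 50 holds against a single provider’s spend; against all three combined it is closer to 17. Either way, the gap supports the sidebar’s point that provider spending is small next to overall IT spending.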
Over the last decade, many large enterprises have launched virtualization projects and initiatives and have reaped the benefit of increased resource utilization and efficiencies. Those that have done so are one step closer to having a private cloud. As described earlier, virtualization is one of the three key technology principles for private cloud computing. The only incremental changes needed to have a private cloud are the addition of elasticity and cloud-automation technologies.
Four primary considerations—security, availability, user community (size), and economies of scale—drive the choice of a private cloud as a deployment strategy. You can see this in table 4.1.
The security and availability constraints of target applications and data, and the degree to which they must be under direct control, may dictate whether a public cloud option is viable or whether you should consider a private cloud solution. For a private cloud deployment to make sense, your company’s size and needs should be sufficiently large to have economies of scale when purchasing capital equipment.
Table 4.1 The four primary private cloud considerations
Consideration        Rationale
Security             Applications that require direct control and custody over data for security or privacy reasons
Availability         Applications that require certain access to a defined set of computing resources that can’t be guaranteed in a shared resource pool environment
User community       Organization with a large number of users, perhaps geographically distributed, who need access to utility computing resources
Economies of scale   Existing data center and hardware resources that can be used, and the ability to purchase capital equipment at favorable pricing levels
If your organization’s security and availability requirements are high, then at the same time the user base to be supported must be large enough, and your organization’s purchasing power strong enough, for a private cloud to be a good option.
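The rule of thumb in table 4.1 can be written down as a small check. This is only a sketch of the reasoning, not a decision tool: the four yes/no fields, the class name, and the function name are all hypothetical, and treating either a security or an availability requirement as sufficient on the control side is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class Organization:
    # Illustrative yes/no judgments, one per row of table 4.1.
    needs_direct_data_custody: bool   # Security
    needs_guaranteed_capacity: bool   # Availability
    has_large_user_community: bool    # User community
    has_purchasing_power: bool        # Economies of scale


def private_cloud_is_candidate(org: Organization) -> bool:
    """Return True when control requirements are high AND the
    organization has the scale to justify building its own cloud."""
    # Assumption: either control requirement alone counts as "high".
    control_driven = (
        org.needs_direct_data_custody or org.needs_guaranteed_capacity
    )
    scale_supported = (
        org.has_large_user_community and org.has_purchasing_power
    )
    return control_driven and scale_supported


agency = Organization(True, False, True, True)
startup = Organization(True, True, False, False)
print(private_cloud_is_candidate(agency))   # control needs plus scale
print(private_cloud_is_candidate(startup))  # control needs without scale
```

The point of the structure is that the two halves play different roles: security and availability create the need, while user community and economies of scale determine whether meeting that need privately is affordable.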
4.2.2 Security considerations
Although, as you saw earlier in this chapter, security within the public cloud can often be comparable or superior to data security in a corporate data center, public cloud computing isn’t always an option. For example, many government organizations have applications that deal with confidential or classified data that under no circumstances may be put at risk, such as those dealing with national security. Other applications in other industries have regulatory requirements that make their owners think twice before deploying to a public cloud. The main distinction that makes private clouds more secure and more appropriate for compliance concerns is simply this: they can physically and logically segregate resources more thoroughly and thereby remove more doubts among users about whether their data is safe and secure.
Public cloud providers are aware that security is a main blocking or gating factor for many enterprises and have devoted significant resources to designing and proving their ability to deal with secure data. As mentioned previously, Amazon has achieved SAS 70 Type II certification for AWS, which ensures that it has the appropriate processes and infrastructure in place to handle data securely and with high availability for customers. Amazon has also made claims that its infrastructure has been designed such that it can support the requirements of regulatory frameworks, such as HIPAA. HIPAA spells out the measures that organizations in the healthcare industry must adhere to in order to ensure the privacy of their patients’ data. Having the hooks to enable HIPAA compliance and implementing a HIPAA-compliant application are two different things. Providers must develop best practices and gain experience in supporting HIPAA-compliant applications in the public cloud before most enterprises will be comfortable with this mode of deployment.
4.2.3 Certainty of resource availability
Although you may think of the cloud as an infinite resource pool from which resources can be drawn, this isn’t always the case. For example, consider an application that requires a huge number of resources for massive processing in a short time window. As of late 2009, Amazon has advised its users that it can’t guarantee the availability of 500 XL instances (where XL instances are high-compute resources with 8 virtual CPUs) at any given time from a specific availability zone. For cases requiring resources in excess of 1,000 XL instances, Amazon requests a week’s prior notice to improve the chances of the resources being available.
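For capacity planning, the guidance above amounts to a pair of thresholds. The sketch below encodes only the late-2009 figures quoted in this section; they are a snapshot, not current provider limits. The function name and return strings are hypothetical, and the text states no advisory for requests below 500 instances, which is not the same as a guarantee.

```python
def xl_capacity_guidance(requested_instances: int) -> str:
    """Map a burst request for XL instances in one availability zone
    to the late-2009 guidance described in the text."""
    if requested_instances < 0:
        raise ValueError("requested_instances must be non-negative")
    if requested_instances > 1000:
        # Above 1,000: the provider asks for a week's prior notice.
        return "request a week ahead"
    if requested_instances >= 500:
        # At 500 or more: availability at any given time is not guaranteed.
        return "availability not guaranteed"
    # Below 500: the text states no advisory (this is not a guarantee).
    return "no advisory stated"


for size in (50, 500, 1500):
    print(size, "->", xl_capacity_guidance(size))
```

A private cloud sidesteps this uncertainty only to the extent that the organization owns enough hardware to cover its own peak, which is the trade-off this section describes.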
Resource constraints are a much more serious matter in smaller cloud providers. Rackspace, also in late 2009, imposed a limit of 50 virtual instances being run in its environment per day for any given user. Overall total capacity in these systems should improve going forward; but even so, there is still the caveat related to variations in demand caused by overlapping requirements from multiple different public cloud customers. By comparison, electric utilities, which have been running for more than a century, still have capacity issues in the heat of the summer when demand for electricity to power air-conditioning can cause brownouts due to a mismatch of available supply and demand. You can easily imagine the same thing happening in the cloud context if all e-commerce sites were public cloud consumers and witnessed 10X traffic spikes on Black Friday as the shopping season began or if there were another terrorist incident. At some point, cloud pricing will take into account this type of variability in demand, and providers will introduce variable pricing.
4.2.4 Large utility-computing community
If you have a relatively small requirement for utility computing resources, having a good virtualized infrastructure can probably suffice. But if your organization has many constituents that can take advantage of a generalized infrastructure for their needs, then the added complexity and sophistication of a cloud infrastructure may make sense. By implementing a private cloud, you’ll introduce the concept of multitenancy and, hence, the ability to segment and isolate individual groups and users.
4.2.5 Economies of scale
A public cloud provider has two potential primary advantages from an economic perspective over a company interested in running its own private cloud. The first relates to the physical resources required to run a cloud. In chapter 2, you saw how the public cloud providers’ buying power is harnessed to purchase large quantities of hardware for servers and build data centers with good network connectivity and low-cost power. In chapter 3, you saw how that translated in terms of a business case for deploying applications. These arguments are based on the ability to purchase servers and hosting resources at small scale. For a large multinational or government, the economics may be much different given the long-term relationships and volumes that are purchased annually from their sources. On top of this, consider that these organizations may already have large quantities of hardware and network bandwidth available. In addition, if a company is already in the midst of executing a virtualization strategy, the existing investments may be well positioned to be converted into a cloud.
The second aspect relates to the expertise required to run and maintain a cloud infrastructure. The public cloud providers, as they’ve been designing for scale, have been creating infrastructure where one of the primary objectives is reducing the number of people required to operate a data center. In most cases, conventional IT organizations require more engineers and technicians to run a smaller data center. By migrating to a cloud-style deployment, they may save money over their existing deployment. But this may require retraining their current staff or hiring a smaller number of more skilled people.
4.2.6 Some concerns about deploying a private cloud
Before you or anyone jumps to deploying a private cloud, let’s assess a quick set of four major concerns.
PRIVATE CLOUDS ARE SMALL SCALE
Why do most innovative cloud-computing providers have their roots in powering consumer web technology? Because that’s where the big numbers of users are. Few corporate data centers see anything close to the type of volume seen by these vendors. And, as you’ve seen, volume drives costs down through the huge economies of scale.
LEGACY APPLICATIONS DON’T CLOUDIFY EASILY
You can achieve only so much without rearchitecting legacy applications for a cloud infrastructure.
ON-PREMISES DOESN’T NECESSARILY MEAN MORE SECURE
The biggest drivers toward private clouds have been fear, uncertainty, and doubt about security. For many, it feels more secure to have your data behind your firewall in a data center that you control. But unless your company spends more money and energy thinking about security than Amazon, Google, and Salesforce, that isn’t true.
DO WHAT YOU DO BEST
Do you think there’s a simple set of tricks that an operator of a data center can borrow from Amazon or Google? No way. These companies make their living operating the world’s largest data centers. They’re constantly optimizing how they operate based on real-time performance feedback from millions of transactions. Although you can try to learn from and emulate them (hard to do because they protect their trade secrets as if national security depended on it!), your rate of innovation will never be the same: private clouds will always be many steps behind the public clouds.
4.2.7 Private cloud deployment options
If, despite these concerns, you plan to proceed down the private cloud path, you have several options available for building your private cloud. As discussed earlier, for companies and organizations that can acquire and provision hardware and data-center resources efficiently enough, a private cloud may make sense. In addition to the capital costs for hardware, an organization needs to determine its strategy with respect to the software infrastructure it’ll use to operate and manage the cloud. The costs involved vary substantially and can range from free if you adopt an open source approach to over $1 million for a full-service offering that includes proprietary software and architecture, design, and implementation services. Table 4.2 summarizes the possible private cloud implementation categories and example vendors/solutions.
Table 4.2 Private cloud deployment options by type
Provider type          Example vendors               Description
Open source            Eucalyptus, OpenNebula        Free software for creating a private cloud implementation, primarily on UNIX-based systems
Proprietary software   VMware, Enomaly, Appistry     Proprietary private cloud solutions, often with a specific strength in a core cloud technology, such as virtualization, storage, or management
Hosted offering        Savvis, OpSource, SunGard     Dedicated hardware hosted in a cloud model for a single customer, built using either open source or a proprietary solution
System integrator      Appirio, Accenture, Infosys   Specialty providers or practice areas in large firms dedicated to architecture, design, and deployment of private clouds
DO-IT-YOURSELF PRIVATE CLOUDS/OPEN SOURCE
The public cloud providers have primarily implemented their solutions with a combination of open source and homegrown software. Their user-facing APIs are publicly visible, but they haven’t released the core technologies for operating and managing their clouds. Eucalyptus and OpenNebula are two open source initiatives, both offshoots of university research projects, that have created software to replicate the homegrown software of the large public cloud providers. They provide a software capability for provisioning and managing a multiuser private cloud built on top of commodity hardware. They’ve also made their solutions compatible with the APIs provided by Amazon.
Using these software solutions allows you to create an interoperable infrastructure that can work as a hybrid cloud. Because they’re open source initiatives, there’s no specific incentive (such as having you buy more of their software) to create lock-in, unlike proprietary approaches. You get the usual flexibility of open source, along with the usual caveats around support and usability. Constructing a private cloud using open-source technologies requires a high degree of technical sophistication and probably works best in organizations that have a history of working with open source on other projects.
PROPRIETARY SOFTWARE CLOUD SOLUTIONS
Several vendors offer commercial packages to enable private clouds. Best-of-breed startups, such as Appistry, focus specifically on this problem. Like the open-source solutions described previously, they’re designed to enable the creation of a private cloud on multiple commodity hardware resources. Some providers, such as ParaScale, focus specifically on the aspects of cloud computing related to storage. Large IT vendors, such as EMC, Oracle, IBM, and Unisys, are positioning themselves as being able to provide an entire private cloud stack, including hardware systems, virtualization technology, and software applications for operating and managing the private cloud. These systems can be as small as a handful of rack-mounted appliances or as large as data centers filled with thousands of servers housed in modular container pods. Additionally, these providers offer consulting services for the architecture, design, and implementation of clouds.
PRIVATIZATION OF PUBLIC CLOUDS
Public cloud service providers also provide services that are moving closer to the concept of private clouds. The Amazon Virtual Private Cloud (VPC) offering allows you to connect to resources in the public cloud from within your firewall via an IPSec VPN. This isn’t the same as a private cloud; it’s merely the ability to connect and communicate securely with public cloud resources. You’ll read much more about the VPC concept in section 4.3.
The next logical step for hosting and public cloud providers is to deliver dedicated private cloud services. In this model, a service provider reserves or dedicates a specific portion of the cloud infrastructure to a specific customer. Some example providers of dedicated private cloud hosting include Savvis, which provides cloud services as a horizontal solution; and SunGard, which provides dedicated cloud services for financial services customers. Dedicated cloud services are much more costly than public cloud services; the difference in cost is similar to the difference in cost between shared commodity hosting, which can be had for under $100/year, and traditional dedicated hosting, which can cost as much as $1,000/month per server.
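To put the hosting analogy on a common footing, the two prices can be converted to the same annual basis. The sketch below uses only the two figures quoted above; both are bounds (“under” and “as much as”), so the result is an order-of-magnitude indication rather than a price comparison for any actual cloud service. The names are illustrative.

```python
SHARED_HOSTING_PER_YEAR = 100        # "under $100/year" (upper bound from the text)
DEDICATED_HOSTING_PER_MONTH = 1000   # "as much as $1,000/month" (upper bound from the text)


def annual_cost_multiple(shared_per_year, dedicated_per_month):
    """Annualize the dedicated price and express it as a multiple
    of the shared price."""
    dedicated_per_year = dedicated_per_month * 12
    return dedicated_per_year, dedicated_per_year / shared_per_year


dedicated_per_year, multiple = annual_cost_multiple(
    SHARED_HOSTING_PER_YEAR, DEDICATED_HOSTING_PER_MONTH
)
print(f"dedicated: ${dedicated_per_year:,}/year per server")
print(f"multiple of shared hosting: {multiple:.0f}x")
```

At these bounds the dedicated option costs on the order of a hundred times more per year, which conveys the scale of the premium the text attributes to dedicated cloud services.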
Up to this point in the chapter, you’ve read about the merits and drawbacks of pursuing a private cloud strategy and have looked at some of the options for creating a private cloud. Now, let’s switch gears and look practically at how to build a private cloud system using open source. It turns out that putting together a private cloud, at least a small-scale proof-of-concept system, is straightforward. Building such a system can help you understand the software components that make up a cloud. Because the open-source private cloud systems have been designed for interoperability with Amazon EC2, they can also provide a playpen environment for experimenting with a hybrid cloud (part private, part public).
IMPLEMENTING AN OPEN-SOURCE PRIVATE CLOUD
It’s becoming increasingly easy to put together a private cloud using open source as major Linux distributions start bundling cloud software in their standard packages. Ubuntu 9.10 Server, for example, has an option to deploy a configuration called Ubuntu Enterprise Cloud (UEC). On a clean install of the OS, UEC is provided as an option. When you choose this option, Eucalyptus cloud software is installed on the system.
The Eucalyptus system consists of several software components that run the private cloud. The first component is called the Node Controller (NC); it resides on each computer in the pool of available resources for creating virtual instances. The NC is responsible for managing the virtual instances started and stopped on an individual computer. One or more computers with NCs on them constitute a cluster, which is managed by another process called the Cluster Controller (CC). The CC is responsible for managing the NCs in its cluster and farms out the work orders it receives to the NCs to start and stop virtual instances. A single Cloud Controller (CLC) manages