Achieving high reliability at cloud scale
6.1 SOA as a precursor to the cloud
Distributed, loosely coupled systems, which formed the basis for SOA, are by now widely used by virtually every organization with an active web presence. They formed the direct precursor to cloud computing. This architecture also presents one of the best approaches to reliable (or at least fault-tolerant) systems. Let’s begin this section by examining distributed systems and loose coupling before delving into SOA more deeply and seeing how it has informed reliable cloud computing approaches.
6.1.1 Distributed systems
The most important point about distributed systems, and why they can be more reliable than nondistributed systems, is that when properly implemented, they have no single point of failure. Distributed web architectures typically fall into one of several basic categories:
■ Client-server architectures are two-tier. Smart client code contacts the server for data and then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change. The server is frequently little more than a database.
■ Three-tier architectures add a business-logic middle tier. Three-tier systems (see figure 6.1) move the client intelligence (also called business logic) to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are three-tier.
■ N-tier architectures usually refer to web applications that utilize more services. Typically (see figure 6.2), these applications forward their requests further, to other enterprise services. This type of application is the one most responsible for the success of application servers.
Figure 6.1 A three-tier architecture: Presentation layer + Business layer + Database layer. [Diagram labels: Presentation tier (HTML, CSS, JavaScript over HTTP); Business tier; Data tier (JDBC to SQL).]
■ Tightly coupled (clustered) architectures are a form of parallel processing. This typically refers to a cluster of machines that work closely together, running a shared process in parallel. The task is subdivided into parts, each machine processes its part individually, and the parts are then put back together to make the final result.
■ Peer-to-peer is clientless and has no single point of failure that can cause total failure. This type of architecture has no special machine or machines that provide a service or manage the network resources. Instead, all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and servers.
This book focuses on the multitier architectures (three-tier and N-tier) because they apply best to the web and to the cloud. This is because the browser is the definition of a thin client presentation layer where the work has to be done on an application server on its behalf. A SOA falls into the same category. The next section will drill down into the SOA style of distributed application.
6.1.2 Loose coupling
In computer science, coupling refers to the degree of direct knowledge that one component has of another. It’s the degree to which components depend on one another. What does this have to do with reliability or the cloud? Loose coupling affects reliability because each component that operates somewhat independently from all other components can be built, tested, and replaced separately from them. It’s also easier to build the other components so that they can handle this component’s failure, either by failing gracefully themselves or by accessing another instance of this component running somewhere else. Earlier, you learned about humans interacting with websites through a browser, and one machine at one site interacting with another machine at another site. Loose coupling is the only application architecture that can provide reliable web applications, because one site never knows when another may be out, slow, or have made an unannounced change in its interface.
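The idea of "accessing another instance of this component running somewhere else" can be sketched in a few lines. This is only an illustration under stated assumptions: the names `call_with_failover`, `ServiceUnavailable`, `broken_instance`, and `healthy_instance` are hypothetical, and plain functions stand in for remote instances of a component.

```python
from typing import Callable, Sequence


class ServiceUnavailable(Exception):
    """Raised when an instance of a component cannot serve a request."""


def call_with_failover(instances: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each instance in turn and return the first successful reply."""
    last_error = None
    for instance in instances:
        try:
            return instance(request)
        except ServiceUnavailable as error:
            last_error = error  # remember the failure and try the next instance
    # Every instance failed, so fail gracefully with a clear error.
    raise ServiceUnavailable("all instances failed") from last_error


def broken_instance(request: str) -> str:
    # Hypothetical instance that is currently down.
    raise ServiceUnavailable("instance A is down")


def healthy_instance(request: str) -> str:
    # Hypothetical instance running somewhere else.
    return f"handled {request}"


print(call_with_failover([broken_instance, healthy_instance], "order-42"))
```

The caller knows only that an instance accepts a request and returns a reply, so a failed instance can be skipped or replaced without changing the caller.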
Figure 6.2 An N-tier architecture. Many variations are possible, but generally an application server is involved. From the application server, many different logical layers can be accessed. At the application server level, you can begin to interact with the cloud. Any or all of these layers can operate in the cloud effectively. [Diagram labels: CSS, HTML, Frameworks, API, Web services, Web server, Application server, Database, File servers, Communication, Data services, Integration services.]
As some have pointed out, the ultimate way to make two components loosely coupled is to not connect them at all; short of that, make sure the communications between components don’t depend on the internals of either component and only access an abstract interface layer.
At the class level, strong coupling occurs when a dependent class contains a pointer directly to a concrete class that provides the required behavior. This is shown abstractly in figure 6.3. Loose coupling occurs when the dependent class contains a pointer only to an interface, which can then be implemented by one or many concrete classes. Loose coupling provides extensibility to designs (see figure 6.4). You can later add a new concrete class that implements the same interface without ever having to modify and recompile the dependent class. Strong coupling doesn’t allow this.
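A minimal sketch of the class-level distinction follows. All names here (`Storage`, `InMemoryStorage`, `AuditedStorage`, `ReportWriter`) are hypothetical and chosen only for illustration; the point is that the dependent class holds a reference to an interface, so a new concrete class can be added without touching it.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """The interface: the only thing the dependent class knows about."""

    @abstractmethod
    def save(self, key: str, value: str) -> None:
        """Store value under key."""

    @abstractmethod
    def load(self, key: str) -> str:
        """Return the value stored under key."""


class InMemoryStorage(Storage):
    """First concrete class implementing the interface."""

    def __init__(self) -> None:
        self._data = {}

    def save(self, key: str, value: str) -> None:
        self._data[key] = value

    def load(self, key: str) -> str:
        return self._data[key]


class AuditedStorage(InMemoryStorage):
    """Concrete class added later; it also records every key it saves."""

    def __init__(self) -> None:
        super().__init__()
        self.saved_keys = []

    def save(self, key: str, value: str) -> None:
        self.saved_keys.append(key)
        super().save(key, value)


class ReportWriter:
    """Dependent class: refers to the interface, not to a concrete class."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def write(self, name: str, body: str) -> str:
        self._storage.save(name, body)
        return self._storage.load(name)


# ReportWriter is unchanged whichever implementation is plugged in.
print(ReportWriter(InMemoryStorage()).write("q1", "revenue up"))
print(ReportWriter(AuditedStorage()).write("q2", "costs down"))
```

Had `ReportWriter` constructed an `InMemoryStorage` itself, introducing `AuditedStorage` would have required editing `ReportWriter`, which is the strong-coupling case.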
Tight coupling leads to a situation where a change in one module forces a ripple effect of changes in other modules. Further, assembly of modules requires more effort and time due to the increased intermodule dependencies. One module may be harder to reuse because dependent modules must be included with it.
Loosely coupled systems benefit from the negation of each of these characteristics. Tight versus loose coupling is an important concept for application reliability, SOAs, and ultimately reliable cloud applications. Table 6.1 lists important application characteristics and shows how each differs between a tightly coupled and a loosely coupled application. This table is based on ideas originally expressed by Doug Kaye on his blog called Loosely Coupled (www.looselycoupled.com/blog/).
Figure 6.3 Strong coupling. Changes in A impact B, C, and D. Changes in B impact A, C, and D.
Figure 6.4 Loose coupling. Modifications in A’s behavior don’t impact B, C, or D. Modifications in B’s behavior don’t impact A, C, or D.
Table 6.1 Critical application attributes in tightly vs. loosely coupled architectures
Attribute               Tightly coupled    Loosely coupled
Technology mix          Homogeneous        Heterogeneous
Data typing             Dependent          Independent
Interface model         API                Service
Interaction style       RPC                Document
Synchronization         Synchronous        Asynchronous
Granularity             Object             Message
Syntactic definition    By convention      Self-describing
Semantic adaptation     By recoding        Via transformation
Bindings                Fixed and early    Delayed
Software objective      Reusability        Broad applicability
Consequences            Anticipated        Unintended
You can use a list of techniques to help create and maintain loose coupling in your application components. To achieve the desired loose coupling, use the following:
■ Vendor- and platform-independent messages
■ Stateless messaging where possible and appropriate
■ Coarse-grained, self-describing, and self-contained messages
■ Constrained, well-defined, extensible, and versionable interfaces
■ Human-readable strings (URIs) for service and instance addresses
■ Asynchronous exchange patterns where possible and appropriate
■ Humans controlling clients where possible and appropriate
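Several of these techniques can be illustrated together with a small message envelope. The sketch below is an assumption-laden example, not a standard format: the field names (`type`, `version`, `body`), the message type `order.created`, and the helper functions are all hypothetical. It shows a self-describing, self-contained, versioned message in a platform-independent encoding (JSON), and a consumer that tolerates extra fields so the interface stays extensible.

```python
import json


def build_message(message_type: str, version: str, body: dict) -> str:
    """Wrap a payload in an envelope that says what it is and which version."""
    envelope = {"type": message_type, "version": version, "body": body}
    return json.dumps(envelope, sort_keys=True)


def read_order(raw: str) -> dict:
    """Read an order message, accepting any 1.x version of the format."""
    envelope = json.loads(raw)
    if envelope["type"] != "order.created":
        raise ValueError(f"unexpected message type: {envelope['type']}")
    major = int(envelope["version"].split(".")[0])
    if major != 1:
        raise ValueError(f"unsupported major version: {envelope['version']}")
    body = envelope["body"]
    # Read only the fields this consumer needs and ignore any extras,
    # so the sender can extend the message without breaking this reader.
    return {"order_id": body["order_id"], "total": body["total"]}


# A newer sender adds a "coupon" field; the older reader still works.
raw = build_message("order.created", "1.1",
                    {"order_id": "A7", "total": 30, "coupon": "X"})
print(read_order(raw))
```

Because the message carries everything the reader needs, the reader holds no per-sender state, which is the stateless messaging the list recommends.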
Web applications followed many of the attributes of loose coupling. When we moved toward machine-to-machine communication over the web, we retained the loose coupling and created the concept of SOA. Here, a remote service publishes its interface (via a WSDL), and a consuming service has to abide by that interface to consume the service. SOA was an important evolutionary step to get to the cloud. Let’s look much more closely at how SOA works.
6.1.3 SOA
Computing has several different definitions of SOA. SOA is an attempt to provide a set of principles or governing concepts used during the phases of systems development and integration. It attempts to package functionality as interoperable services in the context of the various business domains that use it. Several departments in a company or different organizations may integrate or use such services (software modules provided as a service) even if their respective client systems are substantially different.
SOA is an attempt to develop yet another means for software module integration toward a distributed application. Rather than defining an API, SOA defines the interface in terms of protocols and functionality. An endpoint is the entry point for such an SOA implementation.
SOA: A flexible set of design principles used during the phases of systems development and integration. A deployed SOA-based architecture provides a loosely coupled suite of services that can be used in multiple business domains. SOA separates functions into distinct units, or services, which developers make accessible over a network (usually the internet) in order to allow users to combine and reuse them in the production of applications. These services, and their corresponding consumers, communicate with each other by passing data in a well-defined, shared format (usually XML), or by coordinating an activity between two or more services.
SOA is about breaking an architecture down to its functional primitives, understanding its information and behaviors, and building it up again using service interfaces abstracted into a configuration layer to create business solutions. SOA naturally fits the definition of loose coupling because it treats services as black boxes of functionality with a simple internet standards-based interface between these service components.
6.1.4 SOA and loose coupling
SOA in its simplest form was aimed at allowing one computer to access a capability across the internet on another computer that previously might have been accessed by a human through a browser. For example, an early web service allowed a site selling domain names to also start selling digital certificates, where the authentication of the certificate buyer was performed at a third-party site. (Previously, you would have gone to that third-party site and, using the browser, followed the authentication process, thus breaking the stickiness of the original vendor, which lost a buyer in the middle of a transaction.)
SOA enabled a form of aggregation where a web application could be constructed out of services, some of which were yours and some of which were delivered by others. In this way, SOA aims to allow users to string together fairly large chunks of functionality to form ad hoc applications built almost entirely from existing software services. The larger the chunks, the fewer the interface points required to implement any given set of functionality. But large chunks of functionality may not prove sufficiently granular for easy reuse. Each interface brings with it some amount of processing overhead. You must consider performance in choosing the granularity of services. The great promise of SOA suggests that the marginal cost of creating the nth application is low, because all the software required already exists to satisfy the requirements of other applications. Ideally, you require only orchestration to produce a new application.
For this to work well, no interactions can be specified in advance, either between the chunks or within the chunks themselves. Instead, you have to specify the interaction of services (all of them unassociated peers) in a relatively ad hoc way with the intent driven by newly emergent requirements. This is why services must be much larger units of functionality than traditional functions or classes, lest the sheer complexity of thousands of such granular objects overwhelm the application designer. Programmers develop the services themselves using traditional languages such as Java, C, and C++.
SOA services feature loose coupling, in contrast to the functions that a linker binds together to form an executable, a dynamically linked library, or an assembly. SOA services also run in safe wrappers (such as Java or .NET) and in other programming languages that manage memory allocation and reclamation, allow ad hoc and late binding, and provide some degree of indeterminate data typing.
6.1.5 SOA and web services
Web services can implement a SOA. Web services make functional building blocks accessible over standard internet protocols (such as HTTP) independent of platforms and programming languages. These services can represent either new applications or wrappers around existing legacy systems to make them network-enabled.
Each SOA building block can play one or both of two roles: service provider or service consumer.
SERVICE PROVIDER
A service provider creates a web service and possibly publishes its interface and access information to a service registry. Each provider must decide which services to expose, how to make trade-offs between security and easy availability, and how to price the services or (if no charges apply) exploit them for other value. The provider also has to decide what category the service should be listed in for a given broker service and what sort of trading partner agreements are required to use the service. It registers what services are available within it and lists all the potential service recipients.
The implementer of the broker then decides the scope of the broker. You can find public brokers through the internet, whereas private brokers are only accessible to a limited audience—for example, users of a company intranet. Furthermore, you must decide on the amount of offered information. Some brokers specialize in many listings. Others offer high levels of trust in the listed services. Some cover a broad landscape of services, and others focus within an industry. Some brokers catalog other brokers. Depending on the business model, brokers can attempt to maximize look-up requests, number of listings, or accuracy of the listings.
The Universal Description, Discovery and Integration (UDDI) specification defines a way to publish and discover information about web services. Other service broker technologies include (for example) Electronic Business using eXtensible Markup Language (ebXML).
SERVICE CONSUMER
The service consumer or web service client locates entries in the broker registry using various find operations and then binds to the service provider in order to invoke one of its web services. Whichever service consumers need, they look it up in the broker, bind to the respective service, and then use it. They can access multiple services if the provider offers multiple services.
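The publish, find, and bind roles described above can be sketched with an in-memory broker. This is a toy model under stated assumptions: `ServiceBroker`, `ServiceEntry`, and their methods are hypothetical names, not the UDDI or ebXML API, and a Python callable stands in for a network endpoint.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ServiceEntry:
    """One listing in the broker's registry."""
    name: str
    category: str
    endpoint: Callable[[str], str]  # stands in for a network address


class ServiceBroker:
    """Hypothetical registry that providers publish to and consumers search."""

    def __init__(self) -> None:
        self._entries: List[ServiceEntry] = []

    def publish(self, entry: ServiceEntry) -> None:
        """Provider side: list a service under its category."""
        self._entries.append(entry)

    def find(self, category: str) -> List[ServiceEntry]:
        """Consumer side: look up all services listed in a category."""
        return [entry for entry in self._entries if entry.category == category]


def verify_buyer(buyer: str) -> str:
    # Hypothetical provider implementation of an authentication service.
    return f"verified {buyer}"


broker = ServiceBroker()
broker.publish(ServiceEntry("cert-check", "authentication", verify_buyer))

# The consumer finds a listing, binds to its endpoint, and invokes it.
listing = broker.find("authentication")[0]
print(listing.name, "->", listing.endpoint("alice"))
```

The consumer depends only on the broker and on the category it asks for; which provider answers is decided at lookup time, which is the delayed binding listed in table 6.1.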
Note that Amazon’s cloud services are called Amazon Web Services, and Amazon is a web service provider in the way described here.
6.1.6 SOA and cloud computing
SOA and cloud computing can be paired to gain the benefits both of service deployments and of the scale and economics of the cloud. With cloud computing, enterprises can access services hosted on third-party servers over the internet. With SOA, enterprises use integrated application services in a more lightweight fashion than traditional application platforms.
Because cloud computing is a way of creating a system in which some or all of its IT resources exist within a third-party cloud computing resource, such as Amazon EC2 or Force.com, cloud computing can involve part or all of an architecture. The core difference is that the system is extended to resources that you don’t own or host locally.
Putting this more simplistically, SOA is all about the process of defining an IT solution or architecture, whereas cloud computing is an architectural alternative. We can say that SOA can’t be replaced by cloud computing. Most cloud computing solutions are defined through SOA. They don’t compete—they’re complementary notions.
Adopting SOA can prepare an enterprise for cloud computing by showing what challenges the organization faces internally in supporting service components, challenges that using cloud services will exacerbate. The service orientation in SOA and the cloud make for similarities, such as both concepts requiring a governance layer and a strong understanding of processes.
Both the cloud and SOA determine what some of the major reusable components are and what the right technologies to run large-scale components over open networks are. An organization that has moved toward SOA in a modular fashion is in a better position to move modules to the cloud.
Further, the cloud serves as a good way to deploy services in an SOA environment. SOA and the cloud support each other but aren’t based on the same ideas. Cloud computing is a deployment architecture, not an approach to architecting your enterprise IT, whereas SOA is.
Components that reside on different computers (some or all of which are in the cloud) and must communicate over the network, potentially over the public internet, need a mechanism for communication between those components (or processes). It’s important that your understanding of interprocess communication is current. The next section delves into a typical type of interprocess communication used in the cloud.
6.1.7 Cloud-based interprocess communication
Amazon Simple Queue Service (SQS) is a way of sending messages between applications (components in a distributed application) via web services over the internet. The intent of SQS is to provide a highly scalable and hosted message queue.
SQS works in a complementary fashion with EC2. (See figure 6.5.) It’s a highly reliable, scalable message queuing service that enables asynchronous message-based communication between distributed components of an application. Those components are typically EC2 instances. You can send any number of messages to an Amazon SQS queue at any time from any component. The messages can be retrieved from the same component or a different one right away or at a later time. No message is ever lost in the interim; each message is persistently stored in highly available, highly reliable queues. Multiple processes can read from and write to an Amazon SQS queue at the same time without interfering with each other.
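The asynchronous, queue-based pattern can be tried locally with a stand-in. The sketch below does not use the SQS API: it assumes Python's standard `queue.Queue` as an in-process substitute for a hosted queue and two threads as substitutes for separate components. The function names and the `None` end-of-stream marker are illustrative choices, and a real hosted queue is reached over web services and may differ in ordering and delivery behavior.

```python
import queue
import threading


def producer(q: queue.Queue, items: list) -> None:
    """Sending component: enqueue messages without waiting for the consumer."""
    for item in items:
        q.put(item)
    q.put(None)  # illustrative end-of-stream marker


def consumer(q: queue.Queue, results: list) -> None:
    """Receiving component: process messages whenever they become available."""
    while True:
        message = q.get()
        if message is None:
            break
        results.append(message.upper())


def run_pipeline(items: list) -> list:
    """Connect the two components only through the queue."""
    q = queue.Queue()
    results = []
    receiving = threading.Thread(target=consumer, args=(q, results))
    sending = threading.Thread(target=producer, args=(q, items))
    receiving.start()
    sending.start()
    sending.join()
    receiving.join()
    return results


print(run_pipeline(["resize image", "send email"]))
```

Neither component calls the other directly; each knows only the queue, so either side can be slow, restarted, or replaced without the other changing.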
Now that you’ve delved into loose coupling, its incarnation in distributed applications, and the way in which those distributed application components may communicate across the internet, let’s look at a framework called MapReduce that handles much of this infrastructure for you and yet allows scaling to a massive level, all the while giving you a simple way to maintain high reliability.