Overview
At the core of Web services is Simple Object Access Protocol (SOAP), which provides a standard way of packaging messages. SOAP has received a lot of attention because it facilitates RPC-style communication between a client and a remote server. But plenty of protocols have been created to facilitate communication between two applications, including Sun’s RPC, DCE RPC, Java’s RMI, Microsoft’s DCOM, and CORBA’s IIOP. So why is SOAP getting so much attention?
One of the primary reasons is that SOAP has incredible industry support. SOAP is the first protocol of its kind to be accepted by practically every major software company in the world. Companies that rarely cooperate with each other are rallying around this protocol. Some of the major companies that are supporting SOAP include Microsoft, IBM, Sun Microsystems, SAP, and Ariba.
Here are some of the advantages of SOAP:
§ It is not tightly coupled to one language. Developers involved with new projects can choose to develop in today’s latest and greatest programming language. But developers who are responsible for maintaining legacy applications might not have a choice about the programming language they use. SOAP does not specify an API, so the implementation of the API is left up to the programming language (such as Java) and the platform (such as Microsoft .NET).
§ It is not tightly coupled to a particular transport protocol. The SOAP specification does describe how SOAP messages should be bound to HTTP. But a SOAP message is nothing more than an XML document, so it can be transported over any protocol that is capable of transmitting text.
§ It is not tied to any one distributed object infrastructure. Most distributed object systems can be extended (and some of them are) to support SOAP. It is important to realize that even with SOAP, middleware such as COM+ still plays an important role in the enterprise. Component middleware is still responsible for some of the more advanced object management features such as object lifetime management, transactions, object pooling, and resource pooling. SOAP enables a degree of interoperability between different systems that are running component middleware from competing vendors.
§ It leverages existing industry standards. The primary contributors to the SOAP specification intentionally avoided reinventing anything. They opted to extend existing standards to meet their needs. For example, SOAP leverages XML for encoding messages. Instead of using its own type system, SOAP leverages the type definitions already defined within the XML Schema specification. And as I have mentioned, SOAP does not define a means of transporting the message; SOAP messages can be bound to existing transport protocols such as HTTP and SMTP.
§ It enables interoperability across multiple environments. SOAP was built on top of existing industry standards, so applications running on platforms that support these standards can effectively communicate via SOAP messages with applications running on other platforms. For example, a desktop application running on a PC can effectively communicate with a back-end application running on a mainframe that is capable of sending and receiving XML over HTTP.
This chapter covers the following key aspects of the SOAP specification:
§ The SOAP envelope. This is used to encode header information about the message and the body of the message itself.
§ SOAP Encoding. This is a standard way of serializing data into the body of a SOAP message.
§ RPC-style messages. I discuss the protocol you can use to facilitate procedure-oriented communication via request/response message patterns.
§ The HTTP POST protocol binding. This is the standard method of binding SOAP messages to HTTP.
Before I go any further, I want to discuss the status of SOAP. This chapter was written against version 1.1 of the SOAP specification (http://www.w3.org/TR/SOAP). The World Wide Web Consortium (W3C) is continuing to develop SOAP. On July 9, 2001, a working draft of SOAP 1.2 was published (http://www.w3.org/TR/2001/WD-soap12-20010709) by the XML Protocol Working Group.
As an acknowledgment of the phenomenal industry support that SOAP enjoys, the XML Protocol Working Group is committed to maintaining a smooth migration path from SOAP 1.1 to SOAP 1.2. Many of the proposed modifications are fit-and-finish and do not radically alter the use of SOAP. Much of what you have learned about SOAP 1.1 will directly translate to SOAP 1.2.
In addition, the majority of the Microsoft products that incorporate SOAP will likely not adopt SOAP 1.2 until it becomes an official W3C recommendation. Therefore, I recommend that you focus on learning the SOAP 1.1 protocol with an eye on the deltas in version 1.2.
Anatomy of a SOAP Message
SOAP provides a standard way of packaging a message. A SOAP message is composed of an envelope that contains the body of the message and any header information used to describe the message. Here is an example:
The root element of the document is the Envelope element. The example contains two subelements, the Body and Header elements. A valid SOAP message can also contain other child elements within the envelope. You will see examples of this when I discuss serializing references using SOAP Encoding.
The envelope can contain an optional Header element, which contains information about the message. In the preceding example, the header contains two elements describing the individual who composed the message and the intended recipient of the message. (I describe the SOAP header in more detail later in the chapter.)
The envelope must contain one Body element. The body contains the message payload. In my example, the body contains a simple character string.
Notice that each SOAP-specific element has the soap namespace prefix. This prefix is defined within the Envelope element and points to the SOAP schema that describes the structure of a SOAP message. The prefix is prepended to the name of every element defined within the SOAP namespace, which makes these elements fully qualified. The soap prefix indicates that the Envelope element is an instance of the SOAP Envelope type. I will drill deeper into XML namespaces in the next chapter.
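A message with the shape described above can be sketched in Python with the standard library's ElementTree parser. This is only an illustration: the header element names (Composer, Recipient), their values, and the body text are assumptions chosen to match the description, not names defined by the SOAP specification.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical message: Composer, Recipient, and the body text are
# illustrative assumptions, not part of the SOAP specification.
MESSAGE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <Composer>Alice</Composer>
    <Recipient>Bob</Recipient>
  </soap:Header>
  <soap:Body>Hello from SOAP.</soap:Body>
</soap:Envelope>"""


def describe(message: str) -> dict:
    """Return the root tag, the header entries, and the body text."""
    root = ET.fromstring(message.encode("utf-8"))
    header = root.find(f"{{{SOAP_NS}}}Header")   # optional
    body = root.find(f"{{{SOAP_NS}}}Body")       # required
    return {
        "root": root.tag,
        "headers": {} if header is None else {e.tag: e.text for e in header},
        "body": (body.text or "").strip(),
    }


info = describe(MESSAGE)
print(info["root"])     # the fully qualified name of the Envelope element
print(info["headers"])
print(info["body"])
```

ElementTree reports qualified names in the form {namespace}local-name, which makes it easy to see that Envelope, Header, and Body belong to the SOAP namespace while the two header entries do not.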
SOAP Actors
Before I describe the individual parts of a SOAP message, I want to define a couple of terms I will be using. A SOAP actor is anything that acts on the content of the SOAP message. There are two types of SOAP actors, default actors and intermediaries.
The default actor is the intended final recipient of a SOAP message. An intermediary receives a SOAP message and might act on the message (including modifying it in some way) before forwarding it along the intended message path, as shown in the following diagram. Even though intermediaries might modify the data transferred from the client to the default actor, it is still considered the same message.
The Header Element
The optional Header element is used to pass data that might not be appropriate to encode in the body. For example, if the default actor receives a message in which the body is compressed, the default actor would need to know what type of compression algorithm was used in order to uncompress the message. Embedding information about the compression algorithm into the body does not make sense because the body itself will be compressed. Placing this type of information in the header of the message is more appropriate.
Other uses for the header include the following:
§ Authentication. The recipient might require the sender to authenticate himself before the message can be processed.
§ Security digest information. If the recipient needs assurance that the contents of the message have not been tampered with, the sender can digitally sign the message body and place the resulting digest into the header.
§ Routing information. If the message needs to be routed to many destinations, the destinations and their order can be included in the header.
§ Transactions. The recipient might have to perform some action in the scope of the sender’s transaction.
§ Payment information. If the recipient of the message provides services to the client based on a per-usage fee, information necessary for collecting payment can be embedded in the header.
The Header element can be added as an immediate child element within the SOAP Envelope. The header entries appear as child nodes within the SOAP Header element. Here is an example:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <Digest>B839D234A3F87</Digest>
  </soap:Header>
  <soap:Body>
    <StockReport>
      <Symbol>MSFT</Symbol>
      <Price>74.56</Price>
    </StockReport>
  </soap:Body>
</soap:Envelope>

The SOAP message contains a Digest element in the header that the remote application can use to ensure that the message has not been tampered with. If the client is doing a routine check to see what her stock closed at, she might not be concerned about validating the message. But if the price of the stock triggers an event within the financial software package, she might be more interested in validating the message. For example, it would be unfortunate if the financial software package were to automatically liquidate her portfolio as the result of receiving a bogus message sent by some 14-year-old kid.
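The text does not say how the Digest value is computed, so the following Python sketch makes its own assumptions: it uses an HMAC-SHA256 over the serialized Body element with a hypothetical shared key, and it skips XML canonicalization, which a real signing scheme would need. It is meant only to show the idea of carrying a digest in the header and checking it on receipt.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SHARED_KEY = b"demo-key"  # hypothetical secret known to sender and recipient

STOCK_REPORT = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <StockReport><Symbol>MSFT</Symbol><Price>74.56</Price></StockReport>
  </soap:Body>
</soap:Envelope>"""


def body_digest(envelope: ET.Element, key: bytes) -> str:
    """Assumed scheme: HMAC-SHA256 over the serialized Body element."""
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    payload = ET.tostring(body, encoding="utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def sign(envelope: ET.Element, key: bytes) -> None:
    """Place the digest of the body in a Digest header entry."""
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        header = ET.Element(f"{{{SOAP_NS}}}Header")
        envelope.insert(0, header)  # Header precedes Body in the envelope
    ET.SubElement(header, "Digest").text = body_digest(envelope, key)


def verify(envelope: ET.Element, key: bytes) -> bool:
    """Recompute the digest and compare it with the header entry."""
    digest = envelope.find(f"{{{SOAP_NS}}}Header/Digest")
    if digest is None or digest.text is None:
        return False
    return hmac.compare_digest(digest.text, body_digest(envelope, key))


envelope = ET.fromstring(STOCK_REPORT)
sign(envelope, SHARED_KEY)
print(verify(envelope, SHARED_KEY))   # the untouched message verifies

envelope.find(".//Price").text = "0.01"   # simulate tampering in transit
print(verify(envelope, SHARED_KEY))   # the altered body no longer matches
```

A keyed digest is used here rather than a plain hash because anyone who can alter the body could also replace an unkeyed hash in the header.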
mustUnderstand Attribute
Because headers are optional, the recipient of the message can choose to ignore them. However, some information that can be embedded in the header should not be ignored by the intended recipient. If the header is not understood or cannot be handled properly, the application might not function properly. Therefore, you need a way to distinguish between header information that is informative and header information that is critical.
You can specify whether the message recipient must understand an element in the header by setting the mustUnderstand attribute to 1 on the root element of that header entry. For example, the SOAP message might request that a remote application perform an action on the client’s behalf. The following example updates a user’s account information within the scope of a transaction:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <TransactionId soap:mustUnderstand="1">123</TransactionId>
  </soap:Header>
  <soap:Body>
    <UpdateAccountInfo>
      <email>[email protected]</email>
      <firstName>Scott</firstName>
      <lastName>Short</lastName>
    </UpdateAccountInfo>
  </soap:Body>
</soap:Envelope>

The recipient of the message must update the user’s account information within the scope of the client’s transaction. If the transaction is aborted, the remote application must roll back the requested changes to the user’s account information. Therefore, I encoded the transaction ID within the header and set the mustUnderstand attribute to 1. The remote application must either honor the transaction or not process the message.
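The rule that a recipient must either honor a mandatory header or refuse the message can be sketched as follows. The function names, the set of understood headers, and the Priority header are assumptions made for illustration; the soap:MustUnderstand fault code is the one SOAP 1.1 defines for this situation.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
MUST_UNDERSTAND = f"{{{SOAP_NS}}}mustUnderstand"

# Hypothetical request: Priority is an optional header added for contrast.
REQUEST = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <TransactionId soap:mustUnderstand="1">123</TransactionId>
    <Priority>low</Priority>
  </soap:Header>
  <soap:Body>
    <UpdateAccountInfo><firstName>Scott</firstName></UpdateAccountInfo>
  </soap:Body>
</soap:Envelope>"""


def unmet_mandatory_headers(envelope: ET.Element, understood: set) -> list:
    """List header entries flagged mustUnderstand="1" that we cannot handle."""
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        return []
    return [
        entry.tag
        for entry in header
        if entry.get(MUST_UNDERSTAND) == "1" and entry.tag not in understood
    ]


def process(message: str, understood: set) -> str:
    """Process the message only if every mandatory header is understood."""
    envelope = ET.fromstring(message)
    unmet = unmet_mandatory_headers(envelope, understood)
    if unmet:
        # Refuse the whole message instead of ignoring a critical header.
        return "soap:MustUnderstand fault: " + ", ".join(unmet)
    return "processed"


print(process(REQUEST, {"TransactionId"}))  # transaction-aware recipient
print(process(REQUEST, set()))              # recipient that knows no headers
```

The optional Priority header never triggers a fault, because it does not carry the mustUnderstand attribute and can safely be ignored.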
actor Attribute
A SOAP message can be routed through many intermediaries before it reaches its final destination. For example, the previous document might be routed through an intermediary responsible for creating a transaction context. In this case, you might want to clearly specify that the TransactionId header is intended to be processed by the transaction intermediary rather than by the default actor.
The SOAP specification provides the actor attribute for annotating SOAP headers intended for certain intermediaries. The value of this attribute is the Uniform Resource Identifier (URI) of the intermediary for which the portion of the message is intended. If a header is intended to be processed by the next intermediary to receive the SOAP message, the actor attribute can be set to http://schemas.xmlsoap.org/soap/actor/next. Otherwise the actor attribute can be set to a URI that identifies a specific intermediary. Here is an example:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <TransactionId soap:mustUnderstand="1" soap:actor="urn:TransactionCoordinator">123</TransactionId>
  </soap:Header>
  <soap:Body>
    <TransferFunds>
      <Source>804039836</Source>
      <Destination>804039836</Destination>
      <Amount>151.43</Amount>
    </TransferFunds>
  </soap:Body>
</soap:Envelope>

Because the TransactionId header element is intended for the transaction coordinator intermediary, its actor attribute is set to the intermediary’s URI. The mustUnderstand attribute has also been set so that if the transaction coordinator intermediary does not understand the TransactionId header element, it must raise an error.
If the message is passed to another recipient, any header elements designated for the intermediary must be removed before the message is forwarded. The intermediary can, however, add additional header elements before forwarding the message to the next recipient. In this example, the transaction coordinator intermediary must remove the TransactionId header element before forwarding the message to the next recipient.
One important point to note is that routing the message directly to the default actor is not considered an error. Setting the mustUnderstand attribute to 1 in combination with setting the actor attribute to urn:TransactionCoordinator does not ensure that the message will be routed through the intermediary. It means only that if the message does reach the transaction coordinator intermediary, it must comprehend the TransactionId header entry or throw an error.
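The behavior of an intermediary described above can be sketched in Python. The relay function, its parameters, and the Digest header in the sample message are assumptions made for illustration; the actor and mustUnderstand attributes are read from the SOAP envelope namespace.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
HEADER = f"{{{SOAP_NS}}}Header"
ACTOR = f"{{{SOAP_NS}}}actor"
MUST_UNDERSTAND = f"{{{SOAP_NS}}}mustUnderstand"
NEXT_ACTOR = "http://schemas.xmlsoap.org/soap/actor/next"

# Hypothetical message: Digest has no actor, so it is left for the default actor.
MESSAGE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <TransactionId soap:mustUnderstand="1"
                   soap:actor="urn:TransactionCoordinator">123</TransactionId>
    <Digest>B839D234A3F87</Digest>
  </soap:Header>
  <soap:Body>
    <TransferFunds><Amount>151.43</Amount></TransferFunds>
  </soap:Body>
</soap:Envelope>"""


def relay(envelope: ET.Element, my_uri: str, understood: set) -> ET.Element:
    """Consume the headers addressed to this intermediary, keep the rest."""
    header = envelope.find(HEADER)
    if header is None:
        return envelope
    for entry in list(header):  # copy, because entries are removed while looping
        if entry.get(ACTOR) not in (my_uri, NEXT_ACTOR):
            continue  # addressed to another actor; pass it along untouched
        if entry.get(MUST_UNDERSTAND) == "1" and entry.tag not in understood:
            raise ValueError(f"MustUnderstand fault: {entry.tag}")
        header.remove(entry)  # must not be forwarded to the next recipient
    return envelope


forwarded = relay(ET.fromstring(MESSAGE), "urn:TransactionCoordinator",
                  {"TransactionId"})
print([entry.tag for entry in forwarded.find(HEADER)])  # remaining headers
```

An intermediary with a different URI leaves both headers in place, which is exactly why the attributes alone cannot guarantee that the transaction coordinator ever sees the message.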
In the preceding example, the intermediary needs to perform a critical task before the message is routed to the default actor. Recall that if the message does reach the transaction coordinator intermediary, it must remove the TransactionId header before forwarding the message. Therefore, the default actor can check to see whether the TransactionId header exists, which would indicate that the message was not passed through its appropriate intermediaries. However, determining whether all of the headers were processed after the message reached the default actor is not always ideal. What if the SOAP request needs to be routed through the intermediaries shown here?
The request to transfer funds must pass through a router intermediary before the funds are transferred. Suppose the router charges the customer a processing fee for forwarding the request to the appropriate banking Web service. However, before funds are deducted, the message should be routed through the transaction coordinator to initiate a transaction before any data is modified. Therefore the router intermediary and the default actor should perform all work in the scope of the transaction. Because the banking Web service is the default actor, it can check the headers to see whether the message was routed through the necessary intermediaries.
But what if the banking Web service discovers that the message was never routed through the transaction coordinator intermediary? If an error occurred during the funds transfer, you might not be able to undo the work performed by the router intermediary. Worse yet, the SOAP message might have been routed through the router intermediary before being routed through the transaction coordinator. If this is the case, there might be no way to tell that the router intermediary performed its work outside the scope of the transaction.
Unfortunately, SOAP does not provide any mechanism to ensure that the message travels through all intended intermediaries in the proper order. In the “Futures” chapter, I will discuss one of the emerging protocols for addressing this problem.
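SOAP itself cannot enforce the path, but the after-the-fact check mentioned earlier, in which the default actor looks for headers that an intermediary should have removed, can be sketched as follows. The function name is hypothetical, and the sketch assumes that any header still carrying an actor attribute was meant for an intermediary. Note that it can detect a skipped intermediary but not the order in which intermediaries were visited.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ACTOR = f"{{{SOAP_NS}}}actor"

# A message that reached the default actor without visiting the coordinator.
ARRIVED = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <TransactionId soap:mustUnderstand="1"
                   soap:actor="urn:TransactionCoordinator">123</TransactionId>
  </soap:Header>
  <soap:Body>
    <TransferFunds><Amount>151.43</Amount></TransferFunds>
  </soap:Body>
</soap:Envelope>"""


def skipped_intermediaries(envelope: ET.Element) -> list:
    """Return the actor URIs of headers that no intermediary consumed."""
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        return []
    # Assumption: a surviving actor attribute means the header was intended
    # for an intermediary that never processed (and removed) it.
    return [entry.get(ACTOR) for entry in header if entry.get(ACTOR) is not None]


skipped = skipped_intermediaries(ET.fromstring(ARRIVED))
if skipped:
    print("Refusing message; not processed by:", ", ".join(skipped))
```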
The Body Element
A valid SOAP message must have one Body element. The body contains the payload of the message. There are no restrictions on how the body can be encoded. The message can be a simple string of characters, an encoded byte array, or XML. The only requirement is that the contents cannot have any characters that would invalidate the resulting XML document. The SOAP specification describes a method of encoding that can be used to serialize the data into the message’s body. It is a good idea to conform to an established encoding scheme such as this because it allows the sender to more easily interoperate with the recipient using a well-known set of serialization rules. (I describe this encoding method later in the chapter.)
SOAP messages can generally be placed into two categories: procedure-oriented messages and document-oriented messages. Procedure-oriented messages provide two-way communication and are commonly referred to as remote procedure call (RPC) messages. The body of an RPC message contains information about the requested action from the server and any input and output parameters. Document-oriented messages generally facilitate one-way communication. Business documents such as purchase orders are examples of document-oriented messages. Let’s take a closer look at each of these message types.
Two SOAP messages are paired together to facilitate an RPC method call with SOAP: the request message and the corresponding response message. Information about the targeted method along with any input parameters is passed to the server via a request message. The server then invokes some behavior on behalf of the client and returns the results and any return parameters. Most of the examples in this chapter relate to RPC method invocations, and they all follow the SOAP specification’s guidelines for encoding RPC messages.

A business document such as a purchase order or an invoice can be encoded within the body of a SOAP message and routed to its intended recipient. The recipient of the document might or might not send an acknowledgment message back to the sender. (The “SOAP Encoding” section later in this chapter describes how to use serialization rules to encode the data contained within these business documents.) Because business documents often span across multiple companies, organizations such as BizTalk.org and RosettaNet serve as facilitators and repositories for schemas that define common document exchanges. Later in the book, I will describe how to leverage the .NET platform to create and consume both RPC and document-oriented messages.
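The request/response pairing for an RPC call can be sketched in Python. Everything specific here is an assumption made for illustration: the Add method, the urn:example:calculator namespace, the parameter names, and the result element. Naming the response element after the method with a Response suffix follows a common SOAP 1.1 convention rather than a requirement.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
CALC_NS = "urn:example:calculator"      # hypothetical service namespace
ET.register_namespace("soap", SOAP_NS)  # serialize with the soap prefix

METHODS = {"Add": lambda x, y: x + y}   # hypothetical server-side methods


def rpc_envelope(element_name: str, namespace: str, values: dict) -> ET.Element:
    """Wrap a method call (or its result) in an Envelope/Body pair."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{element_name}")
    for name, value in values.items():
        ET.SubElement(call, name).text = str(value)
    return envelope


def handle(request_xml: str) -> str:
    """Server side: decode the request, invoke the method, encode the reply."""
    call = ET.fromstring(request_xml).find(f"{{{SOAP_NS}}}Body")[0]
    namespace, method = call.tag[1:].split("}")
    args = {param.tag: int(param.text) for param in call}  # integers assumed
    result = METHODS[method](**args)
    reply = rpc_envelope(method + "Response", namespace, {"result": result})
    return ET.tostring(reply, encoding="unicode")


# Client side: send the request message and read the response message.
request_xml = ET.tostring(rpc_envelope("Add", CALC_NS, {"x": 2, "y": 3}),
                          encoding="unicode")
response_xml = handle(request_xml)
print(request_xml)
print(response_xml)
```

In a real deployment the two strings would travel over a transport such as HTTP; here the handler is called directly so that the pairing of the two messages stays visible.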
Fault Element
Everything does not always go as planned. Sometimes the server will encounter an error while processing the client’s message. SOAP provides a standard way of communicating error messages back to the client.
Regardless of which encoding style was used to create the message, the SOAP specification mandates the format for error reporting. The body of the message must contain a Fault element with the following structure:
<?xml version="1.0" encoding="utf-8"?> <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">