
Implementing HTTP Caching

There are a number of open source implementations of HTTP gateway caches to consider when deploying a Ruby web service. Most gateway cache implementations are run as processes between the application servers and a load balancer such as HAProxy. However, Rack-Cache is an interesting Ruby-based alternative that has emerged recently.

Rack-Cache

Rack-Cache is a gateway cache implemented as a Rack middleware. Ryan Tomayko wrote it in order to make it as easy as possible for Ruby developers to experiment with HTTP caching. It’s by far the most convenient gateway cache implementation to use when developing locally. To use Rack-Cache, you install the gem and insert it into a service’s middleware stack:

require 'service'
require 'rack/cache'

use Rack::Cache,
  :verbose     => true,
  :metastore   => 'memcached://localhost:11211/',
  :entitystore => 'memcached://localhost:11211/'

run Service.new

Figure 8.10 A second request in validation-based HTTP caching. [Diagram: a sequence between Application, Cache, and Service in which a GET carrying If-None-Match: "feed:47:v12" yields 304 Not Modified and no JSON is generated.]

The metastore and entitystore configuration options tell Rack-Cache where to store the cache metadata and full response bodies, respectively. By default, Rack-Cache uses the heap, but this is just the sort of data that Memcached handles very well. This is all that’s necessary for Rack-Cache to start acting like a gateway cache. It supports both the expiration and validation caching models. The Rack-Cache source code is also an informative read for an interested Rubyist looking to better understand the mechanics of how gateway caches work.
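Rack-Cache can only act on the caching headers that the service emits. The following is a minimal sketch of a Rack endpoint that supports both models. The FeedEndpoint class, the hard-coded ETag value, and the 60-second max-age are illustrative assumptions, not code from the services built earlier.

```ruby
# A hypothetical Rack endpoint, for illustration only.
class FeedEndpoint
  ETAG = '"feed:47:v12"'.freeze # assumed version tag for the feed

  def call(env)
    headers = {
      "etag"          => ETAG,                 # validation model
      "cache-control" => "public, max-age=60"  # expiration model
    }

    # A conditional request whose validator still matches needs no body.
    if env["HTTP_IF_NONE_MATCH"] == ETAG
      [304, headers, []]
    else
      body = '{"feed_id": 47, "entries": []}'
      [200, headers.merge("content-type" => "application/json"), [body]]
    end
  end
end

app = FeedEndpoint.new
first_status, first_headers, _first_body = app.call({})
second_status, _headers, _second_body =
  app.call({ "HTTP_IF_NONE_MATCH" => first_headers["etag"] })

puts first_status
puts second_status
```

With a gateway cache in front of an endpoint like this, the stored response would typically be served until max-age elapses, after which the cache revalidates it with If-None-Match and the service can answer without rebuilding the JSON.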

Squid and Varnish

Out-of-process gateway caches such as Squid and Varnish are implemented in high-performance languages such as C and work with applications implemented in any language. They can be a bit more difficult to set up properly than Rack-Cache, but they also provide more performance than Rack-Cache in a production environment. By virtue of being around longer and supporting a larger number of users, they are also a bit more flexible and configurable than Rack-Cache. The details of configuring Squid and Varnish are beyond the scope of this book, but these tools are worth taking a look at for production once a service has HTTP caching logic built in and tested using Rack-Cache.

Conclusion

When it’s time to bring a service to production, latency and throughput are key concepts. Efficient load balancing enables a service to scale horizontally by adding additional servers to increase throughput (and thus capacity). Caching, both within a process and externally via HTTP headers, is a proven technique to reduce latency and increase throughput while still leaving the underlying code clean and maintainable. As you scale a service, you should regularly test your latency and capacity to ensure that you can meet the requirements of your environment and the demand you expect.


Parsing XML for Legacy Services

Sometimes an application must integrate with legacy services whose design is out of your control. It is common for these services to use XML formats such as SOAP, XML-RPC, or RESTful XML. This chapter covers some of the tools for parsing and working with XML-based services.

XML

Many older (and some newer) services use XML as their serialization format. While JSON is quickly becoming the preferred method of serialization due to its ease of use and simplicity, client libraries must occasionally work with XML. Libraries for parsing XML in Ruby include REXML, Nokogiri, Hpricot, LibXml Ruby, and SimpleXML.

The focus of this section is on parsing responses using REXML and Nokogiri. The reason for the choice of these two libraries is simple. REXML is included with Ruby, and Nokogiri currently has the best performance and is actively being developed and supported. While the other XML libraries are usable, for libraries outside the standard REXML, Nokogiri is currently the leading option.

The services written so far in this book have been JSON based, so this section explores an example outside the services created in previous chapters. The Amazon EC2 Query API provides a real-world instance where parsing and requesting XML-based services is useful. The next sections look at methods for parsing XML for one of the EC2 requests.

Amazon describes in the EC2 API documentation some common XML data types. These data types include multiple elements, which can be found in the API reference (http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/). The examples for working with XML step through parsing the call to “describe instances.” This response gives information about virtual computer instances running in Amazon’s Elastic Compute Cloud:

<DescribeInstancesResponse xmlns="http://ec2.amazonaws.com/doc/2009-08-15/">
  <reservationSet>
    <item>
      <reservationId>r-44a5402d</reservationId>
      <ownerId>UYY3TLBUXIEON5NQVUUX6OMPWBZIQNFM</ownerId>
      <groupSet>
        <item>
          <groupId>default</groupId>
        </item>
      </groupSet>
      <instancesSet>
        <item>
          <instanceId>i-28a64341</instanceId>
          <imageId>ami-6ea54007</imageId>
          <instanceState>
            <code>0</code>
            <name>running</name>
          </instanceState>
          <privateDnsName>10-251-50-132.ec2.internal</privateDnsName>
          <dnsName>ec2-72-44-33-4.compute-1.amazonaws.com</dnsName>
          <keyName>example-key-name</keyName>
          <amiLaunchIndex>23</amiLaunchIndex>
          <productCodesSet>
            <item><productCode>774F4FF8</productCode></item>
          </productCodesSet>
          <instanceType>m1.large</instanceType>
          <launchTime>2007-08-07T11:54:42.000Z</launchTime>
          <placement>
            <availabilityZone>us-east-1b</availabilityZone>
          </placement>
          <kernelId>aki-ba3adfd3</kernelId>
          <ramdiskId>ari-badbad00</ramdiskId>
        </item>
        <item>
          <instanceId>i-28a64435</instanceId>
          <imageId>ami-6ea54007</imageId>
          <instanceState>
            <code>0</code>
            <name>running</name>
          </instanceState>
          <privateDnsName>10-251-50-134.ec2.internal</privateDnsName>
          <dnsName>ec2-72-44-33-6.compute-1.amazonaws.com</dnsName>
          <keyName>example-key-name</keyName>
          <amiLaunchIndex>23</amiLaunchIndex>
          <productCodesSet>
            <item><productCode>774F4FF8</productCode></item>
          </productCodesSet>
          <instanceType>m1.large</instanceType>
          <launchTime>2007-08-07T11:54:42.000Z</launchTime>
          <placement>
            <availabilityZone>us-east-1b</availabilityZone>
          </placement>
          <kernelId>aki-ba3adfd3</kernelId>
          <ramdiskId>ari-badbad00</ramdiskId>
        </item>
      </instancesSet>
    </item>
  </reservationSet>
</DescribeInstancesResponse>

The entire response is wrapped in the DescribeInstancesResponse XML element. Within that element is a reservationSet that contains the information for instance reservations. Each instance reservation is contained within an item element. These elements map together instances and their security groups.

The client library should parse out the reservation sets, each of which contains a reservation ID, an owner ID, a security group, and the collection of instances. Each of the instances contains an instance ID, an image ID, an instance state, a private DNS name, a DNS name, a key name, an instance type, and an availability zone. The parsing code should be able to pull out each of these elements.

The following examples don’t bother to parse every single element from the XML. In fact, to maintain a clear and usable API, the client should extract and expose only the data that is necessary.

REXML

Ruby versions 1.8 and newer include REXML (http://www.germane-software.com/software/rexml/) as part of the standard library. It is a fully featured XML parsing library written in pure Ruby, with full support for XPath. XPath is a language for addressing elements within an XML document (see http://www.w3.org/TR/xpath). REXML also includes a simple API for traversing elements in a document. This example shows use of the basic REXML API.

The code for parsing the EC2 DescribeInstancesResponse can be broken down into three classes: one class for the entire response, another class for a reservation set, and another class to store information about specific EC2 instances. The following code example shows the class for parsing the response to a describe instances request:

require 'rexml/document'

class DescribeInstancesResponse
  attr_reader :reservation_sets

  def initialize(reservation_sets)
    @reservation_sets = reservation_sets
  end

  def self.parse(xml_string)
    doc = REXML::Document.new xml_string
    reservation_sets = []
    doc.elements.each("DescribeInstancesResponse/reservationSet/item") do |element|
      reservation_sets << ReservationSet.from_xml(element)
    end
    new(reservation_sets)
  end
end

The DescribeInstancesResponse class acts as the wrapper for the entire response from EC2. The data that this class contains can be determined quickly by looking at the top of the class and noticing the attribute reader for reservation_sets. Other than the initialization, this class contains only one method: a class method for parsing an XML string into an instance of this class.

The parse method first creates a REXML document. It then sets up the reservation sets by looping through the items in the reservationSet element. Each item element is passed to a new class called ReservationSet, which is covered in a moment. Once the reservation sets are built, a new instance of the DescribeInstancesResponse class is returned from the parse method.
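The lookup at the heart of parse can be tried in isolation. The sketch below assumes a pared-down response with two reservations (the second reservation ID is invented for illustration) and collects the reservation IDs with the same XPath expression. The unprefixed element names match even though the document declares a default namespace; this relies on REXML's lenient treatment of default namespaces when no namespace mapping is supplied, so it is worth confirming against the REXML version in use.

```ruby
require 'rexml/document'

# A pared-down response, just large enough to exercise the XPath.
# The second reservation ID is made up for this sketch.
SAMPLE_XML = <<~XML
  <DescribeInstancesResponse xmlns="http://ec2.amazonaws.com/doc/2009-08-15/">
    <reservationSet>
      <item><reservationId>r-44a5402d</reservationId></item>
      <item><reservationId>r-11111111</reservationId></item>
    </reservationSet>
  </DescribeInstancesResponse>
XML

# Hypothetical helper: returns the reservation IDs found in the response.
def reservation_ids(xml_string)
  doc = REXML::Document.new(xml_string)
  ids = []
  # elements.each yields every element matching the XPath expression.
  doc.elements.each("DescribeInstancesResponse/reservationSet/item") do |item|
    # elements[...] returns the first element matching the relative path.
    ids << item.elements["reservationId"].text
  end
  ids
end

p reservation_ids(SAMPLE_XML)
```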

Having a separate class for the reservation set makes the code simpler to read and keep organized. Generally, it’s a good idea to create a class for each logical grouping of data that you parse. This is usually a collection of XML elements that are the children of a common ancestor. The class for wrapping reservation sets continues the building of objects that represent the response, shown here:

class ReservationSet
  attr_reader :security_group, :instances, :reservation_id

  def initialize(attributes)
    @security_group = attributes[:security_group]
    @reservation_id = attributes[:reservation_id]
    @instances = attributes[:instances]
  end

  def self.from_xml(xml)
    elements = xml.elements
    instances = []
    elements.each("instancesSet/item") do |item|
      instances << Instance.from_xml(item)
    end
    new(
      :security_group => elements["groupSet/item/groupId"].text,
      :reservation_id => elements["reservationId"].text,
      :instances => instances)
  end
end

As with the class for parsing the describe instances response, this class shows the data that is exposed through the attribute readers and the initialization method. The security group, the reservation ID, and the instances that represent the set are accessible through this class.

The from_xml method takes an XML node. The use of the name from_xml rather than parse is intentional, as this method expects an already parsed XML node object. The method loops through the item elements in the instancesSet within this node. Each of those nodes is passed to the Instance.from_xml method to create an instance of Instance.

Finally, a new instance of ReservationSet is created. The constructor is passed the instances previously built and the extracted text from the appropriate elements for the security group and the reservation ID.

The class for each instance is the last part of code necessary to parse all of the elements required from the describe instances response:

class Instance
  attr_reader :id, :image_id, :state, :private_dns_name,
    :dns_name, :key_name, :type, :launch_time, :availability_zone

  def initialize(attributes)
    @id = attributes[:id]
    @image_id = attributes[:image_id]
    @state = attributes[:state]
    @private_dns_name = attributes[:private_dns_name]
    @dns_name = attributes[:dns_name]
    @key_name = attributes[:key_name]
    @type = attributes[:type]
    @launch_time = attributes[:launch_time]
    @availability_zone = attributes[:availability_zone]
  end

  def self.from_xml(xml)
    elements = xml.elements
    new(
      :id => elements["instanceId"].text,
      :image_id => elements["imageId"].text,
      :state => elements["instanceState/name"].text,
      :private_dns_name => elements["privateDnsName"].text,
      :dns_name => elements["dnsName"].text,
      :key_name => elements["keyName"].text,
      :type => elements["instanceType"].text,
      :launch_time => elements["launchTime"].text,
      :availability_zone => elements["placement/availabilityZone"].text)
  end
end

The Instance class holds most of the important data for the call to describe instances. The attributes of the instance are all contained within this class. The attribute readers and constructor continue the examples for the previous two classes, showing clearly what data the Instance class holds.

The from_xml method takes an XML node. It calls the constructor and pulls out the required data from the elements within the passed-in node. One thing to notice about all three of these classes is that the XML parsing logic is all contained within a single public method. This makes it easier to get an immediate sense for where parsing occurs.
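One caveat applies to lookups of this form: elements[...] returns nil when no element matches the path, so calling text directly on the result raises NoMethodError if a response omits an element. A small helper can make the lookup tolerant. This is only a sketch; the text_at name, the decision to return nil for a missing element, and the whitespace stripping are assumptions rather than part of the classes above.

```ruby
require 'rexml/document'

# Hypothetical helper: returns the stripped text of the first element
# matching path, or nil when nothing matches.
def text_at(elements, path)
  node = elements[path]
  node && node.text.to_s.strip
end

# A cut-down instance item with no keyName element.
item = REXML::Document.new(<<~XML).root
  <item>
    <instanceId>i-28a64341</instanceId>
    <placement>
      <availabilityZone>us-east-1b
      </availabilityZone>
    </placement>
  </item>
XML

p text_at(item.elements, "instanceId")
p text_at(item.elements, "placement/availabilityZone")
p text_at(item.elements, "keyName")
```

Inside from_xml, a call such as text_at(elements, "keyName") could then stand in for elements["keyName"].text wherever an element may be absent.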

The constructors of all three classes expect attribute hashes. This is useful when creating test versions of these objects later. It is much easier to pass in an attribute hash than sample XML.
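As a rough illustration of that benefit, the sketch below builds test objects straight from hashes. It uses a trimmed-down copy of Instance with only three attributes, and the running method stands in for whatever client code is under test; both are assumptions made to keep the example short.

```ruby
# A trimmed-down copy of the Instance class, keeping only three
# attributes so the sketch stays short.
class Instance
  attr_reader :id, :state, :availability_zone

  def initialize(attributes)
    @id                = attributes[:id]
    @state             = attributes[:state]
    @availability_zone = attributes[:availability_zone]
  end
end

# Hypothetical code under test: selects the running instances.
def running(instances)
  instances.select { |instance| instance.state == "running" }
end

# Test fixtures are plain hashes; no XML is involved.
instances = [
  Instance.new(:id => "i-28a64341", :state => "running",
               :availability_zone => "us-east-1b"),
  Instance.new(:id => "i-28a64435", :state => "stopped",
               :availability_zone => "us-east-1b")
]

p running(instances).map { |instance| instance.id }
```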

Nokogiri

Nokogiri is an HTML and XML parser backed by the libxml and libxslt C libraries. It is compatible with versions of Ruby including 1.8, 1.9, JRuby, and Rubinius. Because of its use of underlying C libraries, Nokogiri is very fast. The results of benchmarks have varied, but Nokogiri has been shown to be consistently faster than other parsers, particularly REXML. Hpricot and LibXml-Ruby have parsing speeds that are either on par, a little faster, or sometimes slower. Your mileage may vary with each benchmark setup.

However, speed isn’t the only reason to use Nokogiri, as it also includes support for powerful CSS3 selectors. Selectors are patterns that match against elements in an HTML or XML document tree. In addition to CSS selectors, Nokogiri has built-in support for XPath and a few other methods for traversing the document tree.

To install Nokogiri, you must have the libxml2, libxml2-dev, libxslt, and libxslt-dev packages installed. Once these prerequisites have been set up, installation is as simple as the following command:

gem install nokogiri

The following example using Nokogiri looks very similar to the REXML example:

require 'rubygems'
require 'nokogiri'

class DescribeInstancesResponse
  attr_reader :reservation_sets

  def initialize(reservation_sets)
    @reservation_sets = reservation_sets
  end

  def self.parse(xml)
    doc = Nokogiri::XML(xml)
    sets = doc.css("reservationSet > item").map do |item|
      ReservationSet.from_xml(item)
    end
    new(sets)
  end
end

Because all parsing logic is contained within a single method in each class, those methods are the only ones that must be modified. The object interface looks identical whether you use Nokogiri or REXML. The start of the DescribeInstancesResponse class is the same as for the REXML example. The attribute readers and constructor show what data this class stores. The parse method contains the real changes.

First, there is a call to create a Nokogiri document from the XML string. The reservation sets are then extracted from the document. This is an example of a CSS selector. In this case, the selector is looking for item elements that are direct children of the reservationSet element. The selector returns a NodeSet that can be iterated through. This makes for a slightly cleaner notation because of the use of map instead of each, as in the REXML example. Each of the nodes is passed to the from_xml method on the reservation set:

class ReservationSet
  attr_reader :security_group, :instances, :reservation_id

  def initialize(attributes)
    @security_group = attributes[:security_group]
    @reservation_id = attributes[:reservation_id]
    @instances = attributes[:instances]
  end

  def self.from_xml(xml)
    instances = xml.css("instancesSet > item").map do |item|
      Instance.from_xml(item)
    end
    new(
      :security_group => xml.css("groupSet > item > groupId").text,
      :instances => instances,
      :reservation_id => xml.css("reservationId").text)
  end
end

The ReservationSet class keeps the same structure as in the previous example. The from_xml method contains the updates to this class.

First, the Instance objects are created by looping through the XML nodes with a CSS selector. This selector looks for nodes named item that are direct children of the instancesSet element. These nodes are then passed to the Instance.from_xml method. When the instances have been built, the security group and reservation ID are parsed out and handed to the reservation set constructor. The Instance class follows:

class Instance
  attr_reader :id, :image_id, :state, :private_dns_name,
    :dns_name, :key_name, :type, :launch_time, :availability_zone

  def initialize(attributes)
    @id = attributes[:id]
    @image_id = attributes[:image_id]
    @state = attributes[:state]
    @private_dns_name = attributes[:private_dns_name]
    @dns_name = attributes[:dns_name]
    @key_name = attributes[:key_name]
    @type = attributes[:type]
    @launch_time = attributes[:launch_time]
    @availability_zone = attributes[:availability_zone]
  end

  def self.from_xml(xml)
    new(
      :id => xml.css("instanceId").text,
      :image_id => xml.css("imageId").text,
      :state => xml.css("instanceState > name").text,
      :private_dns_name => xml.css("privateDnsName").text,
      :dns_name => xml.css("dnsName").text,
      :key_name => xml.css("keyName").text,
      :type => xml.css("instanceType").text,
      :launch_time => xml.css("launchTime").text,
      :availability_zone => xml.css("availabilityZone").text)
  end
end

The Instance class is no different from the other two. The attribute readers and constructor look the same as before, while the changes in parsing are in the from_xml method.

The XML parsing logic consists of single calls with various CSS selectors. Each of the attributes can be parsed with this simple logic. With regard to parsing the individual attributes, the CSS selectors don’t provide any particular advantage or disadvantage over the XPath-based selectors.

When you use Nokogiri for parsing, the choice of whether to use XPath or CSS is mainly one of familiarity. While the XPath selectors have been shown to be slightly faster than CSS, both are still faster than using REXML. You should use whatever you’re most comfortable with.

SOAP

Simple Object Access Protocol (SOAP) is a specification for implementing web services using XML. Full coverage of SOAP is beyond the scope of this book. However, this section goes over a few basics and shows examples for working with SOAP-based services. There are multiple libraries for working with SOAP in Ruby. The most popular libraries are soap4r (http://dev.ctor.org/soap4r), Savon (http://github.com/rubiii/savon), and Handsoap (http://github.com/unwire/handsoap). The examples in this chapter focus on using Savon because of its popularity and ease of use. soap4r is the oldest of the bunch, but it is slower and difficult to use; Handsoap is a framework for writing SOAP clients.