
IM concepts and Jabber protocols


3.3.5 SAX parsing in Java

The JabberInputHandler XML parsing class coordinates the three main XML parsing entities: the Packet class under construction, the Java SAX parser, and the PacketQueue. Of the three, only the parser remains to be tackled.

SAX is a standard programming interface to XML parsers. It hides the details of XML parsing and instead reports interesting XML data to the programmer as XML content events. SAX programmers simply write a SAX content handler class that responds to these content events. For Jabber parsing, we only need to handle the content events that correspond to the beginning and end of an element, and to character data within the element.

XML parsing subsystem 81

Figure 3.4 A SAX parser reads XML text, and issues corresponding SAX content events for handlers to process.
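The event sequence in figure 3.4 can be reproduced with a few lines of code. The following is a minimal sketch, not the book's handler: it assumes the SAX parser bundled with the JDK (obtained through javax.xml.parsers.SAXParserFactory) rather than Xerces 1.x, and the names SaxEventDemo, EventLogger, and eventsFor are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventDemo {

    /** Hypothetical handler: records each content event as a short string. */
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            events.add("start: " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may split long text into several calls; this short
            // example simply logs each non-blank chunk it receives.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("char: " + text);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end: " + qName);
        }
    }

    /** Parses the given XML text and returns the logged events in order. */
    static List<String> eventsFor(String xml) throws Exception {
        EventLogger logger = new EventLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), logger);
        return logger.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<message><body>Hello there</body></message>";
        for (String event : eventsFor(xml)) {
            System.out.println(event);
        }
    }
}
```

The handler never sees angle brackets or raw text offsets; it only sees the three kinds of events that Jabber parsing needs.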

The JabberInputHandler is a SAX content handler class. Its primary task is coordinating the building of packet objects from the SAX events generated by the Xerces SAX parser. The JabberInputHandler accomplishes this in a fairly generic way by watching the depth of embedded elements within an XML document.

In order to build our Packet classes into a tree structure we will track the depth of a particular element within the XML document tree. The depth increases by one with every start element and decreases by one for each end element. This depth-first ordering of element tags makes tracking your current position in the tree relatively simple.
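Depth tracking on its own takes only a counter. The sketch below is an illustration under assumptions, not part of the server: it uses the JDK's bundled SAX parser, and DepthTracker and depthsOf are hypothetical names. It records the depth at which each element starts, with the root at depth 0.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DepthTracker extends DefaultHandler {
    private int depth = 0;                        // number of currently open elements
    final List<String> seen = new ArrayList<>();  // one "depth:name" entry per start tag

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        seen.add(depth + ":" + qName);  // depth of this element before it opens
        depth++;                        // every start element goes one level deeper
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;                        // every end element comes back up one level
    }

    /** Parses the XML text and returns the recorded "depth:name" entries. */
    static List<String> depthsOf(String xml) throws Exception {
        DepthTracker tracker = new DepthTracker();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), tracker);
        return tracker.seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(depthsOf(
            "<stream><message><body>Hi</body></message><presence/></stream>"));
        // prints [0:stream, 1:message, 2:body, 1:presence]
    }
}
```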

The actions for each SAX event are summarized here (see footnote 11):

startElement(): For the start of each new element, check the depth counter and then add one to it. If the depth before the increment was:

[depth == 0]: The element should be a <stream:stream> tag. Instantiate a special open stream Packet with name stream:stream. It doesn’t need an end element to be completed. Instead, it is immediately pushed onto the PacketQueue. This is the root of the Jabber stream.

11 Note: SAX gurus will notice that my parser ignores processing instructions and does not validate the Jabber XML against the Jabber DTDs. Processing instructions are not used in Jabber XML streams and validation is too resource-intensive to use in most Jabber servers.

[Figure 3.4 contents. Input XML: <message><body>Hello there</body></message>. Events issued by the SAX parser: start: message, start: body, char: Hello there, end: body, end: message.]


[depth == 1]: Create a parentless Packet object. When we complete this packet, we’ll push it onto the PacketQueue. The packet becomes the active packet under construction.

[depth > 1]: Create a Packet object and set its parent to the active packet. The new packet becomes the active packet under construction.

characters(): Add the given String as a child to the active packet.

endElement(): Subtract one from the depth counter and if:

[depth == 0]: The active packet should be a </stream:stream> tag ending the Jabber XML stream. Instantiate a special close stream Packet with element name /stream:stream and push it onto the PacketQueue.

[depth == 1]: The active Jabber Packet is complete. Push the completed Packet onto the PacketQueue.

[depth > 1]: The active packet’s parent Packet is still being built. Set the active packet to the current active packet’s parent.
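The rules above can be tried out in isolation. The following runnable sketch makes several assumptions: Node is a simplified stand-in for the Packet class, a java.util.Queue stands in for the PacketQueue, and the JDK's bundled (non-streaming) SAX parser is used in place of Xerces 1.x. It is meant only to show the depth rules at work on a complete document.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class DepthPacketBuilder extends DefaultHandler {

    /** Hypothetical stand-in for Packet: a name, a parent, and children. */
    static class Node {
        final String name;
        final Node parent;
        final List<Object> children = new ArrayList<>(); // child Nodes and text Strings

        Node(Node parent, String name) {
            this.parent = parent;
            this.name = name;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    final Queue<Node> queue = new ArrayDeque<>(); // stand-in for the PacketQueue
    private Node active;                          // packet under construction
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) throws SAXException {
        int depthBefore = depth++;
        if (depthBefore == 0) {
            if (!qName.equals("stream:stream")) {
                throw new SAXException("Root element must be <stream:stream>");
            }
            queue.add(new Node(null, qName));     // open stream: queued at once
        } else if (depthBefore == 1) {
            active = new Node(null, qName);       // new parentless packet
        } else {
            active = new Node(active, qName);     // child of the active packet
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 1) {                          // ignore text between packets
            active.children.add(new String(ch, start, length));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        if (depth == 0) {
            queue.add(new Node(null, "/stream:stream")); // close stream marker
        } else if (depth == 1) {
            queue.add(active);                    // packet complete
        } else {
            active = active.parent;               // move back up the tree
        }
    }

    /** Parses a complete stream document and returns the populated builder. */
    static DepthPacketBuilder run(String xml) throws Exception {
        DepthPacketBuilder builder = new DepthPacketBuilder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), builder);
        return builder;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<stream:stream xmlns:stream='http://etherx.jabber.org/streams'>"
                + "<message><body>Hello there</body></message><presence/>"
                + "</stream:stream>";
        for (Node n : run(xml).queue) {
            System.out.println(n.name);
        }
    }
}
```

For the sample stream, the queue should end up holding four entries in order: the open stream marker, the message, the presence, and the close stream marker.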

To process the XML stream, the JabberInputHandler needs to carry out the following configuration steps:

Create SAX parser: We need to instantiate a parser object. You can do this in a generic way using a SAX parser factory class or directly by specifying a particular parser implementation class. The advantage of using the factory class is that you can easily plug in different SAX parser implementations (e.g., replace Xerces) without changing code by simply setting system properties. I directly create the Xerces SAX parser because I must set it up for XML streaming.

Set up the parser: The parser needs a content handler, which in this case is the JabberInputHandler class. We also have to install a new reader factory into the Xerces parser so that it will incrementally parse the incoming XML.

Parse: Parsing is a simple matter of calling the parser’s parse() method, handing it an InputSource. The SAX parser will parse the entire XML stream, calling content handler methods as necessary, before returning from the parse() method. The parse() call only returns when the stream has been closed or an uncaught exception is thrown.
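For comparison, here is what the generic, factory-based route through the same three steps might look like. This is a sketch under assumptions: it uses the standard JAXP SAXParserFactory (whose implementation can be swapped with the javax.xml.parsers.SAXParserFactory system property), it does no streaming setup at all, and FactoryParserSetup, ElementCounter, and countElements are hypothetical names.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class FactoryParserSetup {

    /** Hypothetical content handler that just counts start tags. */
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            count++;
        }
    }

    static int countElements(Reader xml) throws Exception {
        // Step 1: create the parser through the factory, not a concrete class
        SAXParserFactory factory = SAXParserFactory.newInstance();
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Step 2: set up the parser by installing the content handler
        ElementCounter counter = new ElementCounter();
        reader.setContentHandler(counter);

        // Step 3: parse; this call returns only when the input ends
        reader.parse(new InputSource(xml));
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements(new StringReader("<a><b/><c/></a>")));
        // prints 3
    }
}
```

The trade-off is the one described above: the factory keeps the code independent of any parser implementation, but it offers no standard hook for the streaming setup that the Jabber server requires.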


PARSING XML STREAMS WITH SAX

Not all SAX XML parsers are created equal. Most assume you have a complete XML document, so they buffer incoming XML for greater efficiency. Unfortunately for us, these parsers refuse to parse XML stream data as it arrives, resulting in a “stuck” Jabber server.

The Xerces SAX parser allows us to override its buffering data reader with a streaming reader by creating and installing a custom reader factory. This factory produces reader objects that Xerces uses to read the XML.

NOTE If you are using another SAX parsing library you need to find out if it supports streaming (often called incremental XML parsing) and turn the feature on.

There is no standard way of telling a SAX parser you want to handle streaming data so you must consult your SAX library documentation for details. For Xerces, the following class is all you need:

public class StreamingCharFactory extends DefaultReaderFactory {

    // Character input: hand Xerces a streaming reader instead of a buffering one
    public XMLEntityHandler.EntityReader createCharReader(
            XMLEntityHandler entityHandler,
            XMLErrorReporter errorReporter,
            boolean sendCharDataAsCharArray,
            Reader reader,
            StringPool stringPool) throws Exception {
        return new StreamingCharReader(entityHandler, errorReporter,
                                       sendCharDataAsCharArray, reader,
                                       stringPool);
    }

    // Byte input: decode as UTF-8, then stream it the same way
    public XMLEntityHandler.EntityReader createUTF8Reader(
            XMLEntityHandler entityHandler,
            XMLErrorReporter errorReporter,
            boolean sendCharDataAsCharArray,
            InputStream data,
            StringPool stringPool) throws Exception {
        XMLEntityHandler.EntityReader reader;
        reader = new StreamingCharReader(entityHandler, errorReporter,
                                         sendCharDataAsCharArray,
                                         new InputStreamReader(data, "UTF8"),
                                         stringPool);
        return reader;
    }
}

Although the process of configuring the SAX parser may sound a bit complicated for a simple XML parser, it isn’t. In fact, I think the code speaks for itself. Let’s start by examining the constructor for the JabberInputHandler class.

The JabberInputHandler Constructor

public class JabberInputHandler extends DefaultHandler {

    PacketQueue packetQ;
    Session session;

    public JabberInputHandler(PacketQueue packetQueue) {
        packetQ = packetQueue;
    }

The constructor allows us to set the PacketQueue once for the handler. I did this in anticipation of reusing JabberInputHandler objects to process different XML streams. The current server doesn’t take advantage of this.

Now for creating the parser, configuring it, and parsing our XML stream.

The JabberInputHandler process method starts the parsing work

    public void process(Session session)
            throws IOException, SAXException {

        //Directly create a Xerces SAXParser
        SAXParser parser = new SAXParser();

        //Content handler for the SAX parser
        parser.setContentHandler(this);

        //Handle streaming XML
        parser.setReaderFactory(new StreamingCharFactory());

        //Save the session
        this.session = session;

        //Start the SAX parser parsing
        parser.parse(new InputSource(session.getReader()));
    }

The process() method is the launch pad for starting the parsing process. There are two nonstandard things going on here. First, we directly create a Xerces SAXParser (see footnote 12) and give it our custom reader factory to support XML streaming. Second, we use the Session object to obtain a java.io.Reader object for the XML stream. By hiding the details of how the reader is created in the Session object, we provide a lot of flexibility for future changes without having to modify the JabberInputHandler class.
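The benefit of obtaining the reader through the Session can be illustrated with a small stand-in. In the sketch below, ReaderSource is a hypothetical interface that plays the role of the one Session method the handler depends on, and the two factory methods are invented for illustration. The parsing code (here using the JDK's bundled parser) is identical whether the XML comes from a string or from a byte stream such as a socket's input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for the part of Session the handler relies on. */
interface ReaderSource {
    Reader getReader() throws IOException;
}

class ReaderSourceDemo {

    /** XML held in memory, useful for tests. */
    static ReaderSource fromString(String xml) {
        return () -> new StringReader(xml);
    }

    /** XML arriving as bytes, for example from socket.getInputStream(). */
    static ReaderSource fromBytes(InputStream in) {
        return () -> new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    /** Parsing code that neither knows nor cares where the reader came from. */
    static String rootName(ReaderSource source) throws Exception {
        final String[] root = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName;  // remember only the first element seen
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(source.getReader()), handler);
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootName(fromString("<message/>")));
        byte[] bytes = "<presence/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(rootName(fromBytes(new ByteArrayInputStream(bytes))));
    }
}
```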

SERVER OPTIMIZATION

The JabberInputHandler is not designed for efficiency. Resources such as object instances are relatively expensive to create, store in memory, and garbage-collect. If you have a large-scale server, you may need to handle thousands of XML streams at once. Rather than create a JabberInputHandler for each connection, you can share them between streams, only processing XML as it becomes available. The JabberInputHandler unfortunately creates a new SAXParser every time it calls the process method. If you do plan on reusing the JabberInputHandler in high capacity servers, you should consider methods for reusing expensive resources like SAXParser instances.
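One possible way to reuse parser instances is a small pool. The sketch below is an assumption-laden illustration rather than a recommendation for the book's server: it uses the JAXP javax.xml.parsers.SAXParser (not the Xerces 1.x class used in process()), ParserPool and its methods are hypothetical, and it relies on SAXParser.reset(), which the JDK's bundled implementation supports but which other implementations may reject with UnsupportedOperationException.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParserPool {
    private final SAXParserFactory factory = SAXParserFactory.newInstance();
    private final Deque<SAXParser> idle = new ArrayDeque<>();
    private int created = 0;

    /** Returns an idle parser if one exists, otherwise creates a new one. */
    synchronized SAXParser borrow() throws Exception {
        SAXParser parser = idle.poll();
        if (parser == null) {
            created++;
            parser = factory.newSAXParser();
        }
        return parser;
    }

    /** Restores the parser to its initial state and makes it available again. */
    synchronized void release(SAXParser parser) {
        parser.reset();
        idle.push(parser);
    }

    /** Number of parsers this pool has had to construct so far. */
    synchronized int created() {
        return created;
    }

    /** Example workload: count the elements in a document with a given parser. */
    static int countElements(SAXParser parser, String xml) throws Exception {
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        ParserPool pool = new ParserPool();
        for (String xml : new String[] {"<a><b/></a>", "<x/>"}) {
            SAXParser parser = pool.borrow();
            try {
                System.out.println(countElements(parser, xml));
            } finally {
                pool.release(parser);
            }
        }
        System.out.println("parsers created: " + pool.created());
    }
}
```

Because parse() blocks for the lifetime of a stream, a pool like this only pays off when connections come and go; it does not reduce the number of parsers needed for streams that are open at the same time.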

Now the only remaining part of the JabberInputHandler class is the event handlers. They are pretty straightforward now that you know how the PacketQueue and Packet classes work.

Listing 3.5 JabberInputHandler event handler methods

    Packet packet;   // Active packet being built
    int depth = 0;   // XML element tree depth

    public void startElement(String namespaceURI,
                             String localName,
                             String qName,
                             Attributes atts)
            throws SAXException {
        switch (depth++) {
        case 0:
            // Only a <stream:stream> packet is allowed, throw exception otherwise
            if (qName.equals("stream:stream")) {
                Packet openPacket = new Packet(null, qName, namespaceURI, atts);
                openPacket.setSession(session);
                packetQ.push(openPacket);
                return;
            }
            throw new SAXException("Root element must be <stream:stream>");
        case 1:
            // Only a new message, presence, or <iq> packet is allowed
            packet = new Packet(null, qName, namespaceURI, atts);
            packet.setSession(session);
            break;
        default:
            // Add a child Packet
            Packet child = new Packet(packet, qName, namespaceURI, atts);
            packet = child;
        }
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (depth > 1) {
            packet.getChildren().add(new String(ch, start, length));
        }
    }

    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        switch (--depth) {
        case 0:
            // We're done; this should be a closing stream packet
            Packet closePacket = new Packet("/stream:stream");
            closePacket.setSession(session);
            packetQ.push(closePacket);
            break;
        case 1:
            // Active packet done; push it onto the PacketQueue
            packetQ.push(packet);
            break;
        default:
            // Move back up the tree
            packet = packet.getParent();
        }
    }
}

12 SAXParser is imported with import org.apache.xerces.parsers.SAXParser to force the use of the Xerces parser.

As you can see, event handling is very simple with the help of the other classes. I must emphasize, however, that this is not the quickest, or most efficient, way of handling XML parsing.

Packet handling and server threads 87

For example, the majority of message packets enter the Jabber server and are immediately sent out to a Jabber client or server. There is no reason for the Jabber server to process the message contents, and no advantage for creating a Java object to represent that message. In fact, these short-lived Java objects can become a huge performance problem as we create and destroy thousands of these objects and the garbage collector struggles to keep up.

I took this approach because it is simple, and easier to understand. Once you start optimizing for performance, robustness, or scalability, you will be forced to add complexity and shortcuts that are hard to follow. I will leave these improvements to you as you experiment with and expand the Jabber server for your own uses.
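As one small illustration of the kind of shortcut mentioned above, a handler can look only at the routing information of each top-level packet and skip building a tree for its contents. The sketch below is an assumption, not part of the book's server: RoutingOnlyHandler is a hypothetical class, it runs on the JDK's bundled parser, and it shows only how the recipient address could be extracted. Forwarding the raw XML itself would need additional machinery that is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RoutingOnlyHandler extends DefaultHandler {
    private int depth = 0;
    final List<String> routes = new ArrayList<>(); // one "name -> recipient" per packet

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if (depth == 1) {
            // Top-level packet: note where it is going, allocate nothing else
            String to = atts.getValue("to");
            routes.add(qName + " -> " + (to == null ? "(server)" : to));
        }
        depth++;  // deeper elements are counted but never stored
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    /** Parses a complete stream document and returns the routing entries. */
    static List<String> routesOf(String xml) throws Exception {
        RoutingOnlyHandler handler = new RoutingOnlyHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.routes;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<stream:stream xmlns:stream='http://etherx.jabber.org/streams'>"
                + "<message to='romeo@example.net'><body>Hello there</body></message>"
                + "<presence/>"
                + "</stream:stream>";
        System.out.println(routesOf(xml));
        // prints [message -> romeo@example.net, presence -> (server)]
    }
}
```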