Chapter 5. XML Best Practices Brett McLaughlin

5.1 XML Authoring

5.3.3 Avoid Class Comparisons

If you've worked with DOM, you know that one of the most common operations is tree walking. In fact, the last best practice showed a helper method to aid in this by walking a node's children to get its textual content. This tree walking is generally accomplished through the org.w3c.dom.Node interface, because all DOM structures extend this base interface (strictly speaking, the DOM structures are themselves interfaces that extend Node, and your parser provides the implementations of those interfaces).

The problem is that there are several methods for determining a node's type, and then reacting to that type. Most Java developers familiar with polymorphism and inheritance would immediately use the methods provided around the Java Class class. Using that approach, you might end up with code such as that in Example 5-13.

O’Reilly – Java Enterprise Best Practices １２５

Example 5-13. Using Java class comparison

    NodeList children = rootNode.getChildNodes( );

    // Figure out which node type you have and work with it.
    for (int i=0; i<children.getLength( ); i++) {
        Node child = children.item(i);
        if (child.getClass( ).equals(org.w3c.dom.Element.class)) {
            Element element = (Element)child;
            // Do something with the element.
        } else if (child.getClass( ).equals(org.w3c.dom.Text.class)) {
            Text text = (Text)child;
            // Do something with the text node.
        } else if (child.getClass( ).equals(org.w3c.dom.Comment.class)) {
            Comment comment = (Comment)child;
            // Do something with the comment.
        } // etc . . .
    }

In a similar vein, I've also seen code that looks similar to Example 5-14.

Example 5-14. Using string comparisons for class names

    NodeList children = rootNode.getChildNodes( );

    // Figure out which node type you have and work with it.
    for (int i=0; i<children.getLength( ); i++) {
        Node child = children.item(i);
        if (child.getClass( ).getName( ).equals("org.w3c.dom.Element")) {
            Element element = (Element)child;
            // Do something with the element.
        } else if (child.getClass( ).getName( ).equals("org.w3c.dom.Text")) {
            Text text = (Text)child;
            // Do something with the text node.
        } else if (child.getClass( ).getName( ).equals("org.w3c.dom.Comment")) {
            Comment comment = (Comment)child;
            // Do something with the comment.
        } // etc . . .
    }

Before explaining why neither fragment works with DOM, I should warn you that the second one is a terrible idea on performance grounds alone. Comparing the contents of two String objects is far more expensive than comparing object references or integers; calling getName( ) and equals( ) like this, over and over again inside a loop, is a sure way to bog down your programs.

These might still look pretty innocuous, especially the first example. However, these code samples forget that DOM is a purely interface-based API. In other words, every concrete class in a DOM program is actually an implementation, provided by a parser project, of a DOM-standardized interface. For example, you won't find in any program a concrete class called org.w3c.dom.Element, org.w3c.dom.Comment, org.w3c.dom.Text, or any other DOM construct. Instead, you will find classes such as org.apache.xerces.dom.ElementNSImpl and org.apache.xerces.dom.CommentImpl. These classes are the actual implementations of the DOM interfaces.
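To see this for yourself, you can ask a parsed node what it really is. The following is a minimal sketch, assuming the parser that JAXP selects by default on your platform; the class name ConcreteClassCheck and the helper parseRoot( ) are hypothetical names chosen for illustration, and the concrete class name printed depends entirely on the parser in use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ConcreteClassCheck {

    // Hypothetical helper: parses a small in-memory document with whatever
    // parser JAXP selects and returns the root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<root><child/></root>");

        // The name printed here is parser-specific. It is never
        // org.w3c.dom.Element, because that type is only an interface.
        System.out.println("Concrete class: " + root.getClass().getName());
        System.out.println("Element is an interface: "
            + Element.class.isInterface());

        // The class comparison from Example 5-13, applied to a real element.
        System.out.println("getClass() equals Element.class: "
            + root.getClass().equals(Element.class));

        // The interface is still implemented, so instanceof succeeds.
        System.out.println("instanceof Element: " + (root instanceof Element));
    }
}
```

Whatever class name the first line reports, the class comparison evaluates to false while the instanceof check evaluates to true.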

The point here is that these class-based comparisons will always evaluate to false. You will inevitably be comparing a vendor's implementation class with a DOM interface (which is never a concrete class, can never be instantiated, and will never be what getClass( ) returns for an object). Instead of these class operations, you need to use the instanceof operator, as shown in Example 5-15.

Example 5-15. Using the instanceof operator

    NodeList children = rootNode.getChildNodes( );

    // Figure out which node type you have and work with it.
    for (int i=0; i<children.getLength( ); i++) {
        Node child = children.item(i);
        if (child instanceof org.w3c.dom.Element) {
            Element element = (Element)child;
            // Do something with the element.
        } else if (child instanceof org.w3c.dom.Text) {
            Text text = (Text)child;
            // Do something with the text node.
        } else if (child instanceof org.w3c.dom.Comment) {
            Comment comment = (Comment)child;
            // Do something with the comment.
        } // etc . . .
    }

Here, instanceof returns true if the object's class is the same as, is a subclass of, or is an implementation of the type on the right-hand side of the expression.

Of course, you can also use the getNodeType( ) method on the org.w3c.dom.Node interface and perform integer comparisons, as shown in Example 5-16.

Example 5-16. Using integer comparisons

    NodeList children = rootNode.getChildNodes( );

    // Figure out which node type you have and work with it.
    for (int i=0; i<children.getLength( ); i++) {
        Node child = children.item(i);
        if (child.getNodeType( ) == Node.ELEMENT_NODE) {
            Element element = (Element)child;
            // Do something with the element.
        } else if (child.getNodeType( ) == Node.TEXT_NODE) {
            Text text = (Text)child;
            // Do something with the text node.
        } else if (child.getNodeType( ) == Node.COMMENT_NODE) {
            Comment comment = (Comment)child;
            // Do something with the comment.
        } // etc . . .
    }

This turns out to be a more efficient way to do things. Comparison of numbers will always be a computer's strong suit. (You can also use a switch/case statement here to speed things up slightly.) Consider the case in which you have an implementation class, for example, com.oreilly.dom.DeferredElementImpl. That particular class extends com.oreilly.dom.NamespacedElementImpl, which extends com.oreilly.dom.ElementImpl, which finally implements org.w3c.dom.Element. Using the instanceof approach could cause the Java Virtual Machine (JVM) to perform as many as four class comparisons and chase an inheritance tree, all in lieu of comparing a numerical constant such as "4" to another numerical constant. It should be pretty obvious, then, that getClass( ) doesn't work, instanceof works but performs poorly, and getNodeType( ) is the proper way to do node type discovery.
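The switch/case variant mentioned above might look like the following. This is a sketch rather than a listing from the examples: the class name NodeTypeSwitch and the helpers describeChildren( ) and parseRoot( ) are hypothetical, and the "do something" steps are replaced with simple string building so that the fragment can run on its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

class NodeTypeSwitch {

    // Hypothetical helper: describes each direct child of the given node,
    // dispatching on the integer node type instead of on the class.
    static String describeChildren(Node parent) {
        StringBuilder out = new StringBuilder();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            switch (child.getNodeType()) {
                case Node.ELEMENT_NODE:
                    Element element = (Element) child;
                    out.append("element:").append(element.getTagName());
                    break;
                case Node.TEXT_NODE:
                    Text text = (Text) child;
                    out.append("text:").append(text.getData());
                    break;
                case Node.COMMENT_NODE:
                    Comment comment = (Comment) child;
                    out.append("comment:").append(comment.getData());
                    break;
                default:
                    // Any node type this sketch does not handle explicitly.
                    out.append("other:").append(child.getNodeType());
                    break;
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Hypothetical helper: parses an in-memory document with default settings.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<root><a/>hello<!--note--></root>");
        System.out.print(describeChildren(root));
    }
}
```

One difference from the instanceof version is worth keeping in mind: a CDATA section reports Node.CDATA_SECTION_NODE from getNodeType( ) even though org.w3c.dom.CDATASection extends Text, so in this sketch it would fall into the default branch unless you add a case for it.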

5.4 JAXP

The next API on the list is not a parser, or even an API for parsing XML, and readers should be clear about that from the start. JAXP is the Java API for XML Processing and is simply an abstraction layer, a thin shim that sits on top of the SAX and DOM APIs. JAXP performs no XML parsing itself, but instead defers this task to the parser implementation behind the SAX and DOM APIs. The same thing is true for JAXP's XML transformation processing capabilities.
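A short sketch can make the shim idea concrete. It assumes only the standard javax.xml.parsers and javax.xml.transform packages; the class name JaxpShimSketch and the helpers parse( ) and serialize( ) are hypothetical. Notice that the code never names a parser or processor class: the factories locate whichever implementation is available.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class JaxpShimSketch {

    // Hypothetical helper: obtains a DOM parser through JAXP and parses a string.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    // Hypothetical helper: writes a DOM tree back out with an identity
    // transformation, again obtained through JAXP.
    static String serialize(Document doc) {
        try {
            Transformer identity =
                TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<greeting>hello</greeting>");

        // These names reveal the vendor classes that JAXP located; they
        // vary from one runtime and classpath to another.
        System.out.println("Parser factory: "
            + DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println("Transformer factory: "
            + TransformerFactory.newInstance().getClass().getName());

        System.out.println(serialize(doc));
    }
}
```

Swapping in a different parser or processor changes the two factory names printed, but not a line of the calling code.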

You should always attempt to use JAXP in a J2EE application. With the release of JAXP 1.1[6] and support for SAX 2.0 and DOM Level 2, JAXP provides the baseline tool support required for solid Java and XML programming. You'll be able to migrate applications more easily and change parser and processor vendors at will, with minimal impact on your applications. JAXP also has shown no adverse performance effects, so there is no reason to avoid using JAXP.

[6] JAXP 1.2 offers even more, such as support for XML Schema. However, the SAX 2.0 and DOM Level 2 compliance is of much greater import, so I recommend using JAXP 1.1 as a minimum requirement rather than Version 1.2.

At times, you will decide you need to use vendor-specific extensions to a parser or processor. In these cases, JAXP will obviously not suffice. However, I still recommend using JAXP, except in the specific portions of your code that reference these vendor-specific features.
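One way to follow this advice is to confine every vendor-specific setting to a single method, so that the rest of the application sees only JAXP. The sketch below rests on several assumptions: the class name VendorFeatureIsolation and its methods are hypothetical, the feature URI is a Xerces-style identifier used purely as an illustration, and DocumentBuilderFactory.setFeature( ) requires a JAXP release newer than the 1.1 baseline discussed above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

class VendorFeatureIsolation {

    // Assumption: a Xerces-style feature identifier, shown only as an
    // example of a setting that is not part of the JAXP standard.
    static final String VENDOR_FEATURE =
        "http://apache.org/xml/features/disallow-doctype-decl";

    // The only method in this sketch that knows about vendor-specific
    // settings. Returns true if the underlying parser accepted the feature.
    static boolean applyVendorFeatures(DocumentBuilderFactory factory) {
        try {
            factory.setFeature(VENDOR_FEATURE, true);
            return true;
        } catch (ParserConfigurationException e) {
            // The parser in use does not recognize the feature; callers
            // continue with a plain JAXP configuration.
            return false;
        }
    }

    // Everything else in the application obtains its factory here and
    // never refers to a vendor class or vendor feature name.
    static DocumentBuilderFactory newFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        applyVendorFeatures(factory);
        return factory;
    }

    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("Vendor feature accepted: "
            + applyVendorFeatures(factory));
    }
}
```

If you later change parser vendors, applyVendorFeatures( ) is the one place that needs to be revisited.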

In document O Reilly Java Enterprise Best Practices (Page 124-128)