How to Use Assertions
32:         Node n = nl.item(0);
33:         System.out.println("This node name is: " + n.getNodeName());
34:         // get the NAME node of this ADDRESS node
35:         Node nameNode = n.getFirstChild();
36:         System.out.println("This node name is: "
37:             + nameNode.getNodeName());
38:       }
39:     } catch (Throwable t)
40:     {
41:       t.printStackTrace();
42:     }
43:   }
44: }
45:
Listing 8.2 BadDomLookup.java
The simple program, BadDomLookup, uses the Java API for XML Processing (JAXP) to parse the XML file into a DOM (this example was tested with both Xerces and Sun's default JAXP parser). After we get the W3C Document object, we retrieve a NodeList of ADDRESS elements (line 26) and then try to get the first NAME element by accessing the first child under ADDRESS (line 35).
Upon executing Listing 8.2, we get
e:\classes\org\javapitfalls\>java org.javapitfalls ... BadDomLookup myaddresses.xml
# of "ADDRESS" elements: 2
This node name is: ADDRESS
This node name is: #text
The result clearly shows that the program fails to accomplish its task. Instead of a NAME node, we get a text node. What happened? Unfortunately, the actual DOM is more complex than our simple conceptual model. The primary difference is that the DOM tree includes text nodes for what is called "ignorable whitespace," which is the whitespace (like a return) between tags. In our example, there is a text node between the ADDRESS and the first NAME element. The W3C XML specification states, "An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content."3 To visualize these whitespace nodes, Figure 8.1 displays all the DOM nodes in myaddresses.xml in a JTree.
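The effect is easy to reproduce without the original file. The sketch below is only an illustration: the inline XML string is a hypothetical stand-in for myaddresses.xml (only the element names are taken from the text), and the class and method names are made up for this example. It parses the same content twice, once indented and once with no whitespace between tags, and reports the name of the first child of the first ADDRESS element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceNodeDemo
{
    // Hypothetical stand-in for myaddresses.xml (assumption, not the book's file).
    static final String INDENTED =
        "<ADDRESS_BOOK>\n" +
        "  <ADDRESS>\n" +
        "    <NAME>Joe Jones</NAME>\n" +
        "  </ADDRESS>\n" +
        "</ADDRESS_BOOK>";

    // The same content with no whitespace between the tags.
    static final String COMPACT =
        "<ADDRESS_BOOK><ADDRESS><NAME>Joe Jones</NAME></ADDRESS></ADDRESS_BOOK>";

    // Returns the node name of the first child of the first ADDRESS element.
    static String firstChildName(String xml)
    {
        try
        {
            DocumentBuilder db =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            Node address = doc.getElementsByTagName("ADDRESS").item(0);
            return address.getFirstChild().getNodeName();
        }
        catch (Exception e)
        {
            // Wrapped so callers do not have to declare checked exceptions.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args)
    {
        // The line break and indentation after <ADDRESS> become a text node.
        System.out.println("Indented: " + firstChildName(INDENTED)); // #text
        // With no whitespace between tags, the first child is the element.
        System.out.println("Compact:  " + firstChildName(COMPACT));  // NAME
    }
}
```

The only difference between the two inputs is formatting, which is why code that works against a machine-generated file can break as soon as someone pretty-prints it.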
There are three solutions to this problem, and our rewrite of the program demonstrates two of them. Listing 8.3, GoodDomLookup.java, fixes the problem shown above in both of those ways.
Figure 8.1 Display of all DOM nodes in myaddresses.xml.
3 Extensible Markup Language (XML) 1.0 (Second Edition). W3C recommendation; October 6, 2000;
package org.javapitfalls.item8;

import javax.xml.parsers.*;
import java.io.*;
import org.w3c.dom.*;

class DomUtil
{
    // Returns true if buf contains only whitespace characters
    // (a null buffer is not considered blank).
    public static boolean isBlank(String buf)
    {
        if (buf == null)
            return false;

        int len = buf.length();
        for (int i = 0; i < len; i++)
        {
            char c = buf.charAt(i);
            if (!Character.isWhitespace(c))
                return false;
        }

        return true;
    }

    // Recursively removes every whitespace-only text node below n.
    public static void normalizeDocument(Node n)
    {
        if (!n.hasChildNodes())
            return;

        NodeList nl = n.getChildNodes();
        for (int i = 0; i < nl.getLength(); i++)
        {
            Node cn = nl.item(i);
            if (cn.getNodeType() == Node.TEXT_NODE &&
                isBlank(cn.getNodeValue()))
            {
                n.removeChild(cn);
                i--;   // the NodeList is live, so stay on the same index
            }
            else
                normalizeDocument(cn);
        }
    }

    // Returns the first child of elem that is an element, or null.
    public static Element getFirstChildElement(Element elem)
    {
        if (!elem.hasChildNodes())
            return null;

        for (Node cn = elem.getFirstChild(); cn != null;
             cn = cn.getNextSibling())
        {
            if (cn.getNodeType() == Node.ELEMENT_NODE)
                return (Element) cn;
        }

        return null;
    }
}

public class GoodDomLookup
{
    public static void main(String args[])
    {
        try
        {
            // ... command line check omitted for brevity ...

            DocumentBuilderFactory dbf =
                DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File(args[0]));

            // get first Name of first Address
            System.out.println("Method #1: Skip Ignorable White space...");
            NodeList nl = doc.getElementsByTagName("ADDRESS");
            int count = nl.getLength();
            System.out.println("# of \"ADDRESS\" elements: " + count);

            if (count > 0)
            {
                Node n = nl.item(0);
                System.out.println("This node name is: " + n.getNodeName());
                // get the NAME node of this ADDRESS node
                Node nameNode =
                    DomUtil.getFirstChildElement((Element)n);
                System.out.println("This node name is: " +
                    nameNode.getNodeName());
            }

            // get first Name of first Address
            System.out.println("Method #2: Normalize document...");
            DomUtil.normalizeDocument(doc.getDocumentElement());
            // Below is exact code in BadDomLookup
            nl = doc.getElementsByTagName("ADDRESS");
            count = nl.getLength();
            System.out.println("# of \"ADDRESS\" elements: " + count);

            if (count > 0)
            {
                Node n = nl.item(0);
                System.out.println("This node name is: " +
                    n.getNodeName());
                // get the NAME node of this ADDRESS node
                Node nameNode = n.getFirstChild();
                System.out.println("This node name is: " +
                    nameNode.getNodeName());
            }

        } catch (Throwable t)
        {
            t.printStackTrace();
        }
    }
}

Listing 8.3 GoodDomLookup.java
The key class in GoodDomLookup is the DomUtil class, which has three methods. Together they solve the DOM lookup problem in two ways. The first approach is to retrieve the first child element (and not the first node) when performing a lookup. The implementation of the getFirstChildElement() method skips any intermediate nodes that are not of type ELEMENT_NODE. The second approach is to eliminate all "blank" text nodes from the document, which is what normalizeDocument() does with the help of isBlank(). While both solutions will work, the second approach may remove some whitespace that is not considered ignorable.
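To make that last caveat concrete, here is a small sketch of how blanket removal of blank text nodes can discard meaningful content. Everything in it is an assumption made for illustration: the ROW and SEPARATOR element names, the class name, and the helper methods are hypothetical, and stripBlankText() only mirrors the idea behind normalizeDocument() rather than reproducing it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BlankStripCaveat
{
    // Same idea as DomUtil.normalizeDocument(): drop whitespace-only text nodes.
    static void stripBlankText(Node n)
    {
        NodeList nl = n.getChildNodes();
        // Walk backwards so removals do not shift indexes still to be visited.
        for (int i = nl.getLength() - 1; i >= 0; i--)
        {
            Node cn = nl.item(i);
            if (cn.getNodeType() == Node.TEXT_NODE
                && cn.getNodeValue().trim().isEmpty())
                n.removeChild(cn);
            else
                stripBlankText(cn);
        }
    }

    // Returns the text of the first SEPARATOR element before and after stripping.
    static String[] separatorBeforeAndAfter(String xml)
    {
        try
        {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Node sep = doc.getElementsByTagName("SEPARATOR").item(0);
            String before = sep.getTextContent();
            stripBlankText(doc.getDocumentElement());
            String after = sep.getTextContent();
            return new String[] { before, after };
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args)
    {
        // Hypothetical document in which a single space is real data.
        String[] r = separatorBeforeAndAfter("<ROW><SEPARATOR> </SEPARATOR></ROW>");
        System.out.println("before: [" + r[0] + "]"); // before: [ ]
        System.out.println("after:  [" + r[1] + "]"); // after:  []
    }
}
```

If a document can contain elements whose value is intentionally whitespace, skipping non-element nodes during lookup (the first approach) is the safer of the two choices because it leaves the tree untouched.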
A run of GoodDomLookup.java gives us the following:
e:\classes\org\javapitfalls>java org.javapitfalls.item8.GoodDomLookup myaddresses.xml
Method #1: Skip Ignorable White space...
# of "ADDRESS" elements: 2
This node name is: ADDRESS
This node name is: NAME
Method #2: Normalize document...
# of "ADDRESS" elements: 2
This node name is: ADDRESS
This node name is: NAME
A better way to access nodes in a DOM tree is to use an XPath expression. XPath is a W3C standard for addressing nodes in an XML document tree. Standard API methods for evaluating XPath expressions are part of DOM Level 3. At the time of writing, JAXP supports only DOM Level 2. To demonstrate how easy accessing nodes is via XPath, Listing 8.4 uses the DOM4J open source library (which includes XPath support) to perform the same task as GoodDomLookup.java.

package org.javapitfalls.item8;

import javax.xml.parsers.*;
import java.io.*;
import org.w3c.dom.*;
import org.dom4j.*;
import org.dom4j.io.*;

public class XpathLookup
{
    public static void main(String args[])
    {
        try
        {
            if (args.length < 1)
            {
                System.out.println("USAGE: " +
                    "org.javapitfalls.item8.XpathLookup xmlfile");
                System.exit(1);
            }

            DocumentBuilderFactory dbf =
                DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            org.w3c.dom.Document doc = db.parse(new File(args[0]));

            // Convert the W3C DOM into a DOM4J document, then query it.
            DOMReader dr = new DOMReader();
            org.dom4j.Document xpDoc = dr.read(doc);
            org.dom4j.Node node = xpDoc.selectSingleNode(
                "/ADDRESS_BOOK/ADDRESS[1]/NAME");
            System.out.println("Node name : " + node.getName());
            System.out.println("Node value: " + node.getText());
        } catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

Listing 8.4 XpathLookup.java
A run of XpathLookup.java on myaddresses.xml produces the following output:
E:\classes\org\javapitfalls>java org.javapitfalls.item8.XpathLookup myaddresses.xml
Node name : NAME
Node value: Joe Jones
The XpathLookup.java program uses the selectSingleNode() method in the DOM4J API with an XPath expression as its argument. The XPath recommendation can be viewed at http://www.w3.org/TR/xpath. It is important to understand that evaluation of XPath expressions is slated to become part of the org.w3c.dom API once JAXP implements DOM Level 3. In conclusion, when searching a DOM, remember to handle whitespace nodes, or better, use XPath to search the DOM, since its robust expression syntax allows very fine-grained access to one or more nodes.
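For readers on Java 5 or later, the same lookup can be sketched without a third-party library, because those JDKs bundle the javax.xml.xpath package. Note that this is a separate JAXP API, not the DOM Level 3 interface discussed above, and that the inline XML string, class name, and helper method below are hypothetical stand-ins chosen for illustration (only the element names, the XPath expression, and the "Joe Jones" value come from the text).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class JaxpXPathLookup
{
    // Hypothetical stand-in for myaddresses.xml (assumption, not the book's file).
    static final String XML =
        "<ADDRESS_BOOK>\n" +
        "  <ADDRESS>\n" +
        "    <NAME>Joe Jones</NAME>\n" +
        "  </ADDRESS>\n" +
        "</ADDRESS_BOOK>";

    // Parses xml and returns the first node matching expr, or null if none match.
    static Node selectSingleNode(String xml, String expr)
    {
        try
        {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args)
    {
        // The whitespace text nodes are still in the tree; XPath steps over them.
        Node node = selectSingleNode(XML, "/ADDRESS_BOOK/ADDRESS[1]/NAME");
        System.out.println("Node name : " + node.getNodeName());    // NAME
        System.out.println("Node value: " + node.getTextContent()); // Joe Jones
    }
}
```

Because the expression names the element it wants, the result does not depend on how the source file happens to be indented.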