
Chapter 5. XML Best Practices Brett McLaughlin

5.1 XML Authoring

5.1.3 Use Elements Sparingly, Attributes Excessively

After giving you two recommendations about organization, I will now make what might seem like a counterintuitive suggestion: use elements infrequently and, instead, use attributes whenever possible. To get a better idea of what I'm talking about, take a look at the XML fragment in Example 5-3.

Example 5-3. An element-heavy document fragment

<person>
  <firstName>Adam</firstName>
  <lastName>Duritz</lastName>
  <address type="home">
    <street>102 Elizabeth Lane</street>
    <street>Apartment 23</street>
    <city>Los Angeles</city>
    <state>California</state>
    <zipCode>92013</zipCode>
  </address>
</person>

To optimize this XML, you should try to convert as much as possible into attributes. The rule of thumb here is that any single-valued content can be turned into an attribute, while multivalued content must stay as elements. So, the firstName and lastName elements can be converted into attributes; each will always have only one value. Hence, the XML can be modified to look as follows:

<person firstName="Adam" lastName="Duritz">
  <address type="home">
    <street>102 Elizabeth Lane</street>
    <street>Apartment 23</street>
    <city>Los Angeles</city>
    <state>California</state>
    <zipCode>92013</zipCode>
  </address>
</person>

The address element could not be converted to an attribute. First, it has its own content, and second, there could be multiple addresses for the same person (a home address, work address, and so forth). Within that element, you can perform the same checks: street is multivalued, so it stays as an element, but city, state, and zipCode are all single-valued, and can be moved to attributes:

O’Reilly – Java Enterprise Best Practices 109

<person firstName="Adam" lastName="Duritz">
  <address type="home" city="Los Angeles" state="California" zipCode="92013">
    <street>102 Elizabeth Lane</street>
    <street>Apartment 23</street>
  </address>
</person>

To a lot of developers and content authors, this might look a bit odd. However, if you get into the habit of writing your XML in this fashion, it will soon seem completely natural. In fact, you'll soon look at XML with a wealth of elements as the odd bird.

Of course, I have yet to tell you why you should make this change; what is worth all this trouble? The reason lies in the way that SAX processes elements and attributes.

Some of you might be thinking that you don't want to use SAX, or that by using DOM or JAXP (or another API such as JAXB or SOAP), you'll get around this issue. However, it's unwise to assume that you will never need a specific API. In fact, almost all higher-level APIs such as DOM, SOAP, and JAXB use SAX at the lowest levels. So, while you might not think this practice affects your XML code, it almost certainly will.

Every time the SAX API processes an element, it invokes the startElement( ) callback, with the following signature:

public void startElement(String namespaceURI, String localName,
                         String qName, Attributes attributes)
    throws SAXException;

Typically, there is a great deal of decision-processing logic in this method, which goes something like this: if the element is named "this," perform some processing; if it is named "that," do some other processing; if it's named "something else," do something else again. Consequently, every invocation of this method tends to involve numerous string comparisons—which are not particularly fast—as well as several expression evaluations (e.g., if/then/else, etc.).
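As a rough illustration of that decision logic, here is a minimal sketch of a handler for the person fragment. The names PersonHandlerSketch, PersonHandler, seen, and handle are hypothetical and exist only for this example; the point is the chain of string comparisons that runs on every startElement( ) callback. It assumes the default, non-namespace-aware JAXP parser, so the element name arrives in qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PersonHandlerSketch {

    /** Hypothetical handler that branches on the element name. */
    static class PersonHandler extends DefaultHandler {
        // Records which branch handled each element, in document order.
        final List<String> seen = new ArrayList<>();

        @Override
        public void startElement(String namespaceURI, String localName,
                                 String qName, Attributes attributes) {
            // Each callback walks this chain of string comparisons.
            if ("person".equals(qName)) {
                seen.add("person");
            } else if ("address".equals(qName)) {
                // Attributes arrive together with the element itself.
                seen.add("address:" + attributes.getValue("type"));
            } else if ("street".equals(qName)) {
                seen.add("street");
            } else {
                seen.add("other:" + qName);
            }
        }
    }

    /** Parses the given XML string and returns the branches taken. */
    static List<String> handle(String xml) {
        try {
            PersonHandler handler = new PersonHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            // Wrapped so callers of this sketch need no checked exceptions.
            throw new IllegalStateException(e);
        }
    }
}
```

Every additional element type adds another branch to this chain, and the chain is evaluated once per element in the document.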

In addition, for every startElement( ) call, there is an accompanying endElement( ) call. So, if you processed the element-heavy fragment in Example 5-3, you would suddenly find yourself staring at the lengthy list of method calls shown in Example 5-4. And that's without even looking at invocations of characters( ) and the like within each element!


Example 5-4. Element-heavy SAX processing

startElement( )       // "person"
  startElement( )     // "firstName"
  endElement( )       // "firstName"
  startElement( )     // "lastName"
  endElement( )       // "lastName"
  startElement( )     // "address"
    startElement( )   // "street" (1st one)
    endElement( )     // "street" (1st one)
    startElement( )   // "street" (2nd one)
    endElement( )     // "street" (2nd one)
    startElement( )   // "city"
    endElement( )     // "city"
    startElement( )   // "state"
    endElement( )     // "state"
    startElement( )   // "zipCode"
    endElement( )     // "zipCode"
  endElement( )       // "address"
endElement( )         // "person"

That is a lot of processing time! However, with each invocation of startElement( ), all the attributes for the element are passed along. This means there is little difference in callback overhead between an element with several attributes and an element with just one attribute. So, as I mentioned earlier, decreasing the number of single-valued elements and instead loading them as attributes onto an element can drastically decrease the parsing time. If you revisit Example 5-3 and convert most of the elements to attributes, the long list of method calls in Example 5-4 comes out much shorter, as shown in Example 5-5.

Example 5-5. Element-light SAX processing

startElement( )       // "person"
  startElement( )     // "address"
    startElement( )   // "street" (1st one)
    endElement( )     // "street" (1st one)
    startElement( )   // "street" (2nd one)
    endElement( )     // "street" (2nd one)
  endElement( )       // "address"
endElement( )         // "person"


Eighteen method calls became eight, a reduction of over 50%.[2] Add to that the reduction in decision-processing logic in the startElement( ) method because there are fewer elements, and the reduction in characters( ) callback invocations, and this is clearly a good practice to follow.

[2] This ignores the work to parse the attributes, which may bring the savings below 50%.
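The callback counts in Examples 5-4 and 5-5 can be checked with a small program. The following is a minimal sketch, not a benchmark: the class CallbackCountSketch, its constants, and the countElementCallbacks helper are hypothetical names made up for this illustration. It counts only startElement( ) and endElement( ) invocations and says nothing about actual parsing time.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CallbackCountSketch {

    // The element-heavy fragment from Example 5-3.
    static final String ELEMENT_HEAVY =
        "<person>"
      + "<firstName>Adam</firstName><lastName>Duritz</lastName>"
      + "<address type=\"home\">"
      + "<street>102 Elizabeth Lane</street><street>Apartment 23</street>"
      + "<city>Los Angeles</city><state>California</state>"
      + "<zipCode>92013</zipCode>"
      + "</address></person>";

    // The same data with single-valued content moved into attributes.
    static final String ELEMENT_LIGHT =
        "<person firstName=\"Adam\" lastName=\"Duritz\">"
      + "<address type=\"home\" city=\"Los Angeles\""
      + " state=\"California\" zipCode=\"92013\">"
      + "<street>102 Elizabeth Lane</street><street>Apartment 23</street>"
      + "</address></person>";

    /** Counts element callbacks and does nothing else. */
    static class Counter extends DefaultHandler {
        int starts;
        int ends;

        @Override
        public void startElement(String namespaceURI, String localName,
                                 String qName, Attributes attributes) {
            starts++;
        }

        @Override
        public void endElement(String namespaceURI, String localName,
                               String qName) {
            ends++;
        }
    }

    /** Returns the total number of startElement and endElement calls. */
    static int countElementCallbacks(String xml) {
        try {
            Counter counter = new Counter();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), counter);
            return counter.starts + counter.ends;
        } catch (Exception e) {
            // Wrapped so callers of this sketch need no checked exceptions.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElementCallbacks(ELEMENT_HEAVY)); // prints 18
        System.out.println(countElementCallbacks(ELEMENT_LIGHT)); // prints 8
    }
}
```

The nine elements of the first fragment produce eighteen callbacks, and the four elements of the second produce eight, matching the counts given above.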

5.2 SAX

At the base of nearly all Java and XML APIs is SAX, the Simple API for XML. The first part of making good decisions with SAX is deciding whether to use SAX. Generally, alpha-geek types want to use SAX and nothing else, while everyone else avoids it like the plague. The mystique of using SAX and the complexity that makes it daunting are both poor reasons to decide for or against using SAX. Better criteria are presented in the following questions:

• Am I only reading and not writing or outputting XML?
• Is speed my primary concern (over usability, for example)?
• Do I need to work with only portions of the input XML?
• Are elements and attributes in the input XML independent (no one part of the document depends on or references another part of the document)?

If you can answer "yes" to all these questions, SAX is well-suited for your application. If you cannot, you might want to think about using DOM, as detailed later in this chapter.
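To make these criteria concrete, here is a minimal sketch of a task that fits all four: a read-only, single pass over a person fragment that pulls out only the street values and ignores everything else. The names StreetCollectorSketch, StreetCollector, and streets are hypothetical and exist only for this example, and the sketch assumes street elements are never nested inside one another.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StreetCollectorSketch {

    /** Hypothetical handler that keeps only the text of street elements. */
    static class StreetCollector extends DefaultHandler {
        final List<String> streets = new ArrayList<>();
        // Non-null only while the parser is inside a street element.
        private StringBuilder current;

        @Override
        public void startElement(String namespaceURI, String localName,
                                 String qName, Attributes attributes) {
            if ("street".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append rather than assign.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String namespaceURI, String localName,
                               String qName) {
            if ("street".equals(qName)) {
                streets.add(current.toString());
                current = null;
            }
        }
    }

    /** Parses the XML string and returns the street values in order. */
    static List<String> streets(String xml) {
        try {
            StreetCollector collector = new StreetCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
            return collector.streets;
        } catch (Exception e) {
            // Wrapped so callers of this sketch need no checked exceptions.
            throw new IllegalStateException(e);
        }
    }
}
```

Nothing here needs to look backward or forward in the document, which is why a streaming API is a comfortable fit; a task that had to resolve references between distant parts of the document would not be.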