
The Structure of XML Documents

XML documents must be properly structured and follow strict syntax rules in order to work correctly. If a document is lacking in either of these areas, it can’t be parsed. There are two types of structures in every XML document: logical and physical. The logical structure is the framework for the document and the physical structure is the actual data.

An XML document may consist of three logical parts: a prolog (optional), a document element, and an epilog (optional). The prolog is used to instruct the parser how to interpret the document element. The purpose of the epilog is to provide information pertaining to the preceding data. Listing 6-1 shows the basic structure of an XML document.

Listing 6-1 Basic structure of an XML document

<?xml version="1.0" ?>
<!-- Above is the prolog -->

<!-- The lines below are contained within the document element: BANDS -->
<BANDS>
  <BAND TYPE="ROCK">
    <NAME>Hootie And The Blowfish</NAME>
    <MEMBERS>
      <MEMBER>
        <FIRST_NAME>Darius</FIRST_NAME>
        <LAST_NAME>Rucker</LAST_NAME>
      </MEMBER>
      <MEMBER>
        <FIRST_NAME>Dean</FIRST_NAME>
        <LAST_NAME>Felber</LAST_NAME>
      </MEMBER>
      <MEMBER>
        <FIRST_NAME>Mark</FIRST_NAME>
        <LAST_NAME>Bryan</LAST_NAME>
      </MEMBER>
      <MEMBER>
        <FIRST_NAME>Jim</FIRST_NAME>
        <LAST_NAME>Sonefeld</LAST_NAME>
      </MEMBER>
    </MEMBERS>
    <LABEL>Atlantic Recording Corporation</LABEL>
  </BAND>
</BANDS>

<!-- epilog goes here -->

The prolog is made up of two parts: the XML declaration and an optional document type declaration, which supplies the Document Type Definition (DTD). The XML declaration identifies the document as XML and lets the parser know that it complies with the XML specification. Although the prolog, and thereby the XML declaration, is optional, we recommend that you include an XML declaration in all your XML documents. Here is an example of a simple XML declaration:

<?xml version="1.0" ?>

The XML declaration can also contain more than just the version attribute. Two other important ones are the encoding and standalone attributes.
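To make the three declaration attributes concrete, here is a minimal Python sketch that reads them back from a document using the standard library's expat parser. The helper name read_xml_declaration and the sample document are assumptions made for illustration; they are not part of the original text.

```python
import xml.parsers.expat


def read_xml_declaration(xml_bytes):
    """Return the version, encoding, and standalone values of the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no"), or -1 (not specified)
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk of input
    return found


# Hypothetical document with all three attributes present
declaration = read_xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes" ?><BANDS/>'
)
print(declaration)
```

Note that the attributes must appear in the order version, encoding, standalone; a declaration that lists them in another order is rejected by the parser.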

The document type declaration either establishes the grammar rules for the document itself or points to a document where these rules can be found. It is optional but, if included, must appear after the XML declaration.

XML documents can also reference a Schema rather than a DTD. Schemas perform essentially the same function as DTDs, but can describe more complex data types and are actually XML documents themselves. When possible, we recommend using a Schema rather than a DTD, as Schemas are quickly becoming the de facto standard for describing XML documents.

An XML document is referred to as well formed when it conforms to all XML syntax rules. A valid XML document is well formed and also follows the structural rules defined in a Document Type Definition or Schema.

All the data in an XML document is contained within the document element (in this example, <BANDS>). You can’t have more than one document element in the same document, but the document element can contain as many child elements as necessary.
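The single-document-element rule can be observed with a short Python sketch using the standard library's xml.etree.ElementTree module. The shortened sample document and the helper name parses_cleanly are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened variant of Listing 6-1 (assumed sample data)
document = """<?xml version="1.0" ?>
<BANDS>
  <BAND TYPE="ROCK"><NAME>Hootie And The Blowfish</NAME></BAND>
</BANDS>"""

root = ET.fromstring(document)      # the document element
child_tags = [child.tag for child in root]
print(root.tag, child_tags)


def parses_cleanly(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# A second top-level element makes the document unparseable
two_document_elements = "<BANDS></BANDS><BANDS></BANDS>"
print(parses_cleanly(two_document_elements))
```

The parser hands back exactly one root object, and every other element is reached by walking down from it.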

XML Syntax

The contents of an XML document must conform to a very strict syntax, which includes the following rules:

Tags are case sensitive.

All tags must be closed.

Attribute values must be enclosed in quotes.

XML elements can have attributes, which let you attach additional information to an element beyond its content. For example, in Listing 6-1, the BAND element has a TYPE attribute with a value of “ROCK”.
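As a rough check of the three rules above, the following Python sketch feeds one correct fragment and three deliberately broken ones to the standard library parser. The helper is_well_formed and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    # Obeys all three rules
    "correct": '<BAND TYPE="ROCK"><NAME>Hootie And The Blowfish</NAME></BAND>',
    # <NAME> is closed by </name>: tags are case sensitive
    "case mismatch": '<BAND TYPE="ROCK"><NAME>Hootie And The Blowfish</name></BAND>',
    # <NAME> is never closed
    "unclosed tag": '<BAND TYPE="ROCK"><NAME>Hootie And The Blowfish</BAND>',
    # The attribute value ROCK is not enclosed in quotes
    "unquoted attribute": '<BAND TYPE=ROCK><NAME>Hootie And The Blowfish</NAME></BAND>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'rejected'}")
```

Unlike a typical HTML browser, an XML parser does not try to recover from these mistakes; it stops at the first violation.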

XML tags are very similar to HTML tags. The less-than (<) and greater-than (>) symbols are used to delimit tags and the forward slash (/) is used to indicate closing tags.

Elements are the building blocks of an XML document. Every element in an XML document, with the exception of the document element, is a child element. Child elements can contain one of four content types:

Element content

Character content

Mixed content

Empty

In our example, the <BAND>, <MEMBERS>, and <MEMBER> elements contain element content. All other child elements contain character content.
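The four content types can be told apart programmatically. The Python sketch below is one way to do it with xml.etree.ElementTree, under the assumption that whitespace used only for indentation does not count as character content. The function content_type and the <NOTE> and <TOURING> elements are hypothetical additions, included only so that mixed and empty content appear in the sample.

```python
import xml.etree.ElementTree as ET


def content_type(element):
    """Classify an element as 'element', 'character', 'mixed', or 'empty'."""
    has_children = len(element) > 0
    # Text can sit before the first child (.text) or after any child (.tail)
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "character"
    return "empty"


# <NOTE> and <TOURING> are hypothetical elements added for illustration
sample = """<BAND TYPE="ROCK">
  <NAME>Hootie And The Blowfish</NAME>
  <NOTE>Signed to <LABEL>Atlantic Recording Corporation</LABEL></NOTE>
  <TOURING/>
</BAND>"""

kinds = {element.tag: content_type(element) for element in ET.fromstring(sample).iter()}
for tag, kind in kinds.items():
    print(f"{tag}: {kind}")
```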

All elements in an XML document are nested, which gives the document a hierarchical tree appearance. Notice that in the <BANDS> example, each element’s sub-elements are indented. The rules for nesting are strictly enforced: an element opened inside another element must be closed before its parent is closed.
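To see the tree that nesting produces, this Python sketch walks a parsed document and indents each tag by its depth. It also shows that overlapping tags are rejected. The function names outline and parses_cleanly and the trimmed sample data are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one line per element, indented two spaces per nesting level."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def parses_cleanly(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Trimmed-down version of the BANDS example (assumed sample data)
document = (
    "<BANDS><BAND><NAME>Hootie And The Blowfish</NAME>"
    "<MEMBERS><MEMBER><FIRST_NAME>Darius</FIRST_NAME>"
    "<LAST_NAME>Rucker</LAST_NAME></MEMBER></MEMBERS></BAND></BANDS>"
)
tree_lines = outline(ET.fromstring(document))
print("\n".join(tree_lines))

# <LAST_NAME> opens inside <FIRST_NAME> but closes outside it
overlapping = (
    "<MEMBER><FIRST_NAME>Darius<LAST_NAME></FIRST_NAME>Rucker</LAST_NAME></MEMBER>"
)
print(parses_cleanly(overlapping))
```

The indentation in the printed outline comes from the nesting itself, not from any whitespace in the source document.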

XML elements can also have attributes. For example:

<BAND TYPE="ROCK">

In the previous example, the <BAND> element has an attribute named TYPE that is used to indicate what kind of music the band plays. Notice that the attribute value is enclosed in quotes, which are required. You can create attributes to help describe your elements. You could also have used a child element called <TYPE> rather than an attribute. Either way is fine. It’s really a matter of preference.
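The two modeling choices look slightly different to code that reads the document. The following Python sketch, which assumes the standard library's xml.etree.ElementTree module and two hand-written sample fragments, retrieves the same value both ways.

```python
import xml.etree.ElementTree as ET

# The band type stored as an attribute
as_attribute = ET.fromstring(
    '<BAND TYPE="ROCK"><NAME>Hootie And The Blowfish</NAME></BAND>'
)
# The same information stored as a child element instead
as_child = ET.fromstring(
    "<BAND><TYPE>ROCK</TYPE><NAME>Hootie And The Blowfish</NAME></BAND>"
)

type_from_attribute = as_attribute.get("TYPE")   # look up the attribute by name
type_from_child = as_child.findtext("TYPE")      # text of the first <TYPE> child

print(type_from_attribute, type_from_child)
```

One practical difference is that an element can carry only one attribute with a given name, while it can hold any number of child elements with the same tag.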