The Extensible Markup Language (XML)

In document ZEND PHP 5 Certification (Page 191-195)

XML is a subset of Standard Generalized Markup Language (SGML); its design goal is to be as powerful and flexible as SGML with less complexity. If you’ve ever worked with Hypertext Markup Language (HTML), then you’re familiar with an application of SGML. If you’ve ever worked with Extensible Hypertext Markup Language (XHTML), then you’re familiar with an application of XML, since XHTML is a reformulation of HTML 4 as XML.

It is beyond the scope of this book to provide a complete primer on XML. As such, we assume that you are familiar with the XML and XPath languages and their associated concepts.

In order to understand the concepts that follow in this chapter, it is important that you know some basic principles about XML and how to create well-formed and valid XML documents. A few terms need to be defined before proceeding:

• Entity: An entity is a named unit of storage. In XML, entities can be used for a variety of purposes, such as providing convenient “variables” to hold data, or representing characters that cannot normally be part of an XML document (for example, angle brackets and ampersand characters). Entity definitions can be either embedded directly in an XML document or included from an external source.

• Element: A data object that is part of an XML document. Elements can contain other elements or raw textual data, and can carry zero or more attributes.

• Document Type Definition (DTD): A set of declarations that describes the accepted structure and content of an XML file. Like entities, DTDs can either be externally defined or embedded.

• Well-formed: An XML document is considered well-formed when it contains a single root-level element, all tags are opened and closed properly, and all special characters (<, >, &, ', ") are escaped properly. Specifically, it must conform to all “well-formedness” constraints as defined by the W3C XML recommendation.

• Valid: An XML document is valid when it is both well-formed and obeys a referenced DTD. An XML document can be well-formed and not valid, but it can never be valid and not well-formed.

A well-formed XML document can be as simple as:

<?xml version="1.0"?>
<message>Hello, World!</message>

This example conforms fully to the definition described earlier: it has at least one element, and that element is delimited by start and end tags. However, it is not valid, because it doesn’t reference a DTD. Here is an example of a valid version of the same document:

<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "message.dtd">
<message>Hello, World!</message>

In this case, an external DTD is loaded from local storage, but the declarations may also be embedded directly in the document:

<?xml version="1.0"?>
<!DOCTYPE message [
    <!ELEMENT message (#PCDATA)>
]>
<message>Hello, World!</message>
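The embedded declarations are also where entities, as defined earlier, can be declared. The following is a minimal sketch rather than part of the original example: the entity name greeting is hypothetical, and the element content combines it with the predefined entities for the less-than sign and the ampersand.

```xml
<?xml version="1.0"?>
<!DOCTYPE message [
    <!ELEMENT message (#PCDATA)>
    <!-- Hypothetical entity acting as a reusable "variable" -->
    <!ENTITY greeting "Hello, World!">
]>
<!-- The greeting entity expands to the declared text; lt and amp are predefined -->
<message>&greeting; 1 &lt; 2 &amp; 2 &lt; 3</message>
```

A parser that expands entities would read the content of message as the text "Hello, World! 1 < 2 & 2 < 3".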

In practice, most XML documents you work with will not contain a DTD and, therefore, will not be valid. In fact, the DTD is not a requirement except to validate the structure of a document, which may not even be a requirement for your particular needs. However, all XML documents must be well-formed for PHP’s XML functionality to properly parse them, as XML itself is a strict language.
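To make the distinction between well-formed and valid concrete, here is a sketch of a document that is well-formed but not valid. The emphasis element is an assumption introduced purely for illustration; it is not declared in the DTD, which allows only character data inside message.

```xml
<?xml version="1.0"?>
<!DOCTYPE message [
    <!ELEMENT message (#PCDATA)>
]>
<!-- Well-formed: a single root, with all tags properly nested and closed. -->
<!-- Not valid: emphasis is undeclared, and message may hold only text. -->
<message>Hello, <emphasis>World</emphasis>!</message>
```

A non-validating parser should accept this document, whereas a validating parser would be expected to report an error for the undeclared element.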

Creating an XML Document

Unless you are working with a DTD or XML Schema Definition (XSD), which provides an alternate method to describe a document, creating XML is a free-form process, without any rigid constraints except those that define a well-formed document. The names of tags, attributes, and the order in which they appear are all up to the creator of the XML document.

First and foremost, XML is a language that provides the means for describing data. Each tag and attribute should consist of a descriptive name for the data contained within it. For example, in XHTML, the <p> tag is used to describe paragraph data, while the <td> tag describes table data and the <em> tag describes data that is to be emphasized. In the early days of HTML and text-based Web browsers, HTML tags were intended merely to describe data, but, as Web browsers became more sophisticated, HTML was used more for layout and display than as a markup language. For this reason, HTML was reformulated as an application of XML in the form of XHTML. While many continue to use XHTML as a layout language, its main purpose is to describe types of data. Cascading style sheets (CSS) are now the preferred method for defining the layout of XHTML documents.

Since the purpose of XML is to describe data, it lends itself well to the transportation of data between disparate systems. There is no need for any of the systems that are parties to a data exchange to share the same software packages, or encoding mechanisms, or byte order. As long as both systems know how to read and parse XML, they can talk. To understand how to create an XML document, we will be discussing one such system that stores information about books. For the data, we have plucked five random books from our bookshelf. Here they are:

Title                          Author           Publisher          ISBN
The Moon Is a Harsh Mistress   R. A. Heinlein   Orb                0312863551
Fahrenheit 451                 R. Bradbury      Del Rey            0345342968
The Silmarillion               J.R.R. Tolkien   G. Allen & Unwin   0048231398
1984                           G. Orwell        Signet             0451524934
Frankenstein                   M. Shelley       Bedford            031219126X

Now, this data may be stored in any number of ways on our system. For this example, assume that it is stored in a database and that we want other systems to access it using a Web service. As we’ll see later on, PHP will do most of the legwork for us.

From the table, it is clear what types of data need to be described. There are the title, author, publisher, and ISBN columns, which together make up a book. So, these will form the basis of the names of the elements and attributes of the XML document. Although you are free to choose the names of the elements and attributes in your XML data model, there are a few commonly accepted XML data design guidelines to keep in mind.

One of the most frequently asked questions regarding the creation of an XML data model is when to use elements and when to use attributes. In truth, this doesn’t matter. There is no rule in the W3C recommendation for what kinds of data should be encapsulated in elements or attributes. However, as a general design principle, it is best to use elements to express essential information intended for communication, while attributes can express information that is peripheral or helpful only to process the main communication. In short, elements contain data, while attributes contain metadata. Some refer to this as the “principle of core content.”
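As a rough illustration of this choice, the same piece of information can be modelled either way. In the sketch below, the wrapper element comparison, the price records, and their values are all hypothetical; they exist only to place both forms side by side in one well-formed document.

```xml
<?xml version="1.0"?>
<!-- "comparison" is a hypothetical wrapper so that both forms share one root -->
<comparison>
    <!-- Metadata as an attribute: the currency qualifies the amount -->
    <price currency="USD">9.99</price>

    <!-- The same information expressed with elements only -->
    <price>
        <amount>9.99</amount>
        <currency>USD</currency>
    </price>
</comparison>
```

Both forms are well-formed. The first follows the principle described above by treating the currency as metadata about the amount, which is the core content.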

For representing the book data in XML, this design principle means that the author, title, and publisher data form elements of the same name, while the ISBN, which we’ll consider peripheral data for the sake of this example, will be stored in an attribute. Thus, our elements are as follows: book, title, author, and publisher. The sole attribute of the book element is isbn. The XML representation of the book data is shown in the following listing:

<?xml version="1.0"?>
<library>
    <book isbn="0345342968">
        <title>Fahrenheit 451</title>
        <author>R. Bradbury</author>
        <publisher>Del Rey</publisher>
    </book>
    <book isbn="0048231398">
        <title>The Silmarillion</title>
        <author>J.R.R. Tolkien</author>
        <publisher>G. Allen &amp; Unwin</publisher>
    </book>
    <book isbn="0451524934">
        <title>1984</title>
        <author>G. Orwell</author>
        <publisher>Signet</publisher>
    </book>
    <book isbn="031219126X">
        <title>Frankenstein</title>
        <author>M. Shelley</author>
        <publisher>Bedford</publisher>
    </book>
    <book isbn="0312863551">
        <title>The Moon Is a Harsh Mistress</title>
        <author>R. A. Heinlein</author>
        <publisher>Orb</publisher>
    </book>
</library>

You’ll notice that library is the root element, but this might just as easily have been books. What’s important is that it is the main container; all well-formed XML documents must have a root element. The library element contains all the book elements. The list could contain any number of book elements, simply by repeating the same structure; this sample, however, contains all the data from the table presented earlier.
