BY SHARON L. HOFFMAN — AUGUST 2005


XML is a key technology for sharing data between business entities because it bridges different ways of storing and referencing data. Although XML can be described as a language, the extensible nature of XML means that it’s more correctly classified as a standard.

Many interrelated standards (for a list, see “Essential XML Standards” on page 4) complement XML and expand its capabilities. XML is also a fundamental building block for other standards. For example, many Web-services standards, such as Simple Object Access Protocol (SOAP) and Web Services Description Language (WSDL), are based on XML. To give you a sense of how you might use XML in your own applications, let’s start with a quick look at XML syntax and how XML compares with languages used for related tasks.

XML in Context


An XML document is made up of XML elements. Each element contains a starting tag, an ending tag, and (usually) data nested between the two tags. By choosing descriptive names for elements, you can make your XML documents more human-readable and therefore self-documenting. In Figure 1, a line such as <product_code>12345</product_code> is a single element called product_code. If a document contains more than one element of the same type, the tags will be repeated for each element, as shown for the product_code and requested_qty elements in Figure 1. For more information about XML syntax, see “Essential XML Syntax and Terminology” on page 3.
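As a rough illustration of how repeated elements look to a program, the following Python sketch parses an abbreviated copy of the Figure 1 document (the contact details are trimmed) with the standard-library ElementTree module. The function name requested_items is hypothetical, and pairing each product_code with the requested_qty that follows it is an assumption about how the document is meant to be read.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Figure 1 document (contact details trimmed).
INQUIRY = b"""<?xml version="1.0" encoding="UTF-8"?>
<inventory_inquiry>
  <customer_reference>bike component availability</customer_reference>
  <date_required>9/1/2005</date_required>
  <customer>
    <customer_name>Acme Company</customer_name>
  </customer>
  <requested_products>
    <product_code>12345</product_code>
    <requested_qty>5</requested_qty>
    <product_code>67892</product_code>
    <requested_qty>25</requested_qty>
  </requested_products>
</inventory_inquiry>"""


def requested_items(xml_bytes):
    """Return (product_code, requested_qty) pairs in document order."""
    root = ET.fromstring(xml_bytes)
    products = root.find("requested_products")
    # The same tag names repeat once per requested product.
    codes = [e.text for e in products.findall("product_code")]
    qtys = [int(e.text) for e in products.findall("requested_qty")]
    return list(zip(codes, qtys))


items = requested_items(INQUIRY)
print(items)
```

Because every value travels with its own tags, the program needs no external record layout to find the product codes and quantities.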

Repeating the data description for every element means that XML documents are entirely self-contained: you won’t need to refer to a database layout, for example. However, the overhead of repeating all the element-description information quickly becomes unwieldy. As a result, most developers prefer using data-description languages (e.g., SQL, DDS) to define databases. However, XML shines in data-transfer applications that involve relatively small amounts of data (these are typically single transactions such as an inventory inquiry or a purchase order).

Data transfer is by far the most common XML application in iSeries environments. However, you can also use XML to add

Figure 1: Sample XML document

<?xml version="1.0" encoding="UTF-8"?>
<inventory_inquiry>
  <customer_reference>bike component availability</customer_reference>
  <date_required>9/1/2005</date_required>
  <customer>
    <customer_name>Acme Company</customer_name>
    <contact_name>Sharon Hoffman</contact_name>
    <contact_email>[email protected]</contact_email>
  </customer>
  <requested_products>
    <product_code>12345</product_code>
    <requested_qty>5</requested_qty>
    <product_code>67892</product_code>
    <requested_qty>25</requested_qty>
  </requested_products>
</inventory_inquiry>

ESSENTIAL XML SYNTAX AND TERMINOLOGY

1. XML is case sensitive.

2. Generally, white space (e.g., indents, blank lines) in an XML document is ignored.

3. You can choose any element names you like as long as they conform to a few basic rules:
• Element names cannot contain spaces.
• Element names must begin with a letter or an underscore.
• After the first character, element names can contain numbers, hyphens, periods, colons, letters, and underscores. (Colons are usually avoided in element names because they have special meaning within XML.)
• Element names should not begin with the letters xml, regardless of case (i.e., xml, XML, xMl, and Xml); such names are reserved by the XML specification.

4. Elements can contain one or more attributes. In many cases, the XML designer may choose whether to use elements or attributes to define a particular structure. As a rule of thumb, attributes should be used for information that is not integral to the element.

5. An element cannot contain more than one attribute with the same name.

6. Both starting and ending tags are required for all elements except empty elements. Empty elements occur most often when an element is completely defined by its attributes.

7. Elements must be properly nested (i.e., once an inner element tag is opened, it must be closed before any outer tags).

The following nesting is correct:

<customer_name>
  <first_name>Sharon</first_name>
  <last_name>Hoffman</last_name>
</customer_name>

The following nesting is syntactically correct, although it doesn’t make much sense:

<customer_name>
  <first_name>Sharon

The following nesting is syntactically incorrect:

<customer_name>
  <first_name>Sharon
  <last_name>Hoffman
  </first_name>
  </last_name>
</customer_name>

8. The outermost element in any XML document is referred to as the root element.

9. The root element may be preceded by an XML declaration, a document type declaration, and processing instructions.

10. Built-in XML entities are used to include a character that has special meaning in XML (e.g., a greater-than sign) within XML content. You can also define additional entities as shorthand for text and structures that you use repeatedly.

11. An XML document that has correct syntax is well formed.

12. An XML document that conforms to the structure defined by its Document Type Definition (DTD) or schema is valid. It is possible for an XML document to be well formed but invalid, but the reverse is not possible.
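Several of these rules can be checked mechanically. The Python sketch below uses the standard-library ElementTree parser as a well-formedness test and saxutils.escape to apply the built-in entities. The sample strings and the helper name is_well_formed are illustrative assumptions; in particular, ODD_BUT_LEGAL is one possible example of nesting that parses cleanly but makes little sense, not a reproduction of the sidebar's own example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Inner elements closed before the outer element: properly nested.
PROPER = ("<customer_name><first_name>Sharon</first_name>"
          "<last_name>Hoffman</last_name></customer_name>")
# last_name sits inside first_name: odd, but still properly nested.
ODD_BUT_LEGAL = ("<customer_name><first_name>Sharon"
                 "<last_name>Hoffman</last_name></first_name></customer_name>")
# first_name is closed while last_name is still open: overlapping tags.
OVERLAPPING = ("<customer_name><first_name>Sharon"
               "<last_name>Hoffman</first_name></last_name></customer_name>")
# Tag names are case sensitive, so these two tags do not match.
CASE_MISMATCH = "<Product_Code>12345</product_code>"

# escape() replaces &, <, and > with the built-in entities.
note = "<note>" + escape("qty > 5 & qty < 25") + "</note>"

for sample in (PROPER, ODD_BUT_LEGAL, OVERLAPPING, CASE_MISMATCH, note):
    print(is_well_formed(sample))
```

A check like this only establishes that a document is well formed; whether it is valid depends on the DTD or schema it is supposed to follow.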

tool for organizing information and improving search capabilities. To understand the benefits of an XML-encoded document, you should consider the differences between XML and HTML.

Although the two languages are syntactically similar because they have the same antecedents (see “Essential XML History” on page 5 for information), they have different strengths. HTML is best used to format information for display, while the descriptive information in XML tags makes it easier to deal with document content. For example, suppose you have a document containing a list of PC printers that contains information about the features of each printer model. If the document is stored in HTML, it’s difficult to create a search that finds all printers that support color printing, duplex printing, and can print at least 10 pages per minute. Conversely, if you store the same document using XML, you would probably create separate elements for each important feature (e.g., maximum_print_speed) and could easily develop an application that searches for all printers that meet your criteria. Of course, a database is ideal for such a search, but XML provides database-like search capabilities for information that is stored in documents such as user manuals or marketing brochures. As you’ll see in the following section, the XML data can easily be converted into HTML for display purposes.
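The printer search described above might look something like the following Python sketch. Only the maximum_print_speed element name comes from the text; the printers, printer, model, color, and duplex element names, the "yes"/"no" values, and the model numbers are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical printer list; only maximum_print_speed is named in the article.
PRINTERS = """<printers>
  <printer>
    <model>A100</model>
    <color>yes</color>
    <duplex>yes</duplex>
    <maximum_print_speed>12</maximum_print_speed>
  </printer>
  <printer>
    <model>B200</model>
    <color>yes</color>
    <duplex>no</duplex>
    <maximum_print_speed>20</maximum_print_speed>
  </printer>
  <printer>
    <model>C300</model>
    <color>yes</color>
    <duplex>yes</duplex>
    <maximum_print_speed>8</maximum_print_speed>
  </printer>
</printers>"""


def matching_models(xml_text, min_ppm=10):
    """Return models that print in color, duplex, and at least min_ppm pages per minute."""
    root = ET.fromstring(xml_text)
    matches = []
    for printer in root.findall("printer"):
        if (printer.findtext("color") == "yes"
                and printer.findtext("duplex") == "yes"
                and int(printer.findtext("maximum_print_speed")) >= min_ppm):
            matches.append(printer.findtext("model"))
    return matches


print(matching_models(PRINTERS))
```

The search works because each feature has its own element; the same information buried in HTML formatting tags would have to be recovered by guessing at the page layout.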

Because XML documents are plain text, you can write XML using any text editor (e.g., Notepad). However, as you begin working with XML, you’ll quickly find that an XML-aware editor is a big time-saver. An XML editor should help you write XML by providing syntax-checking and document-generation capabilities. For example, if you begin to create a new element, some editors will automatically generate the ending tag for you.

An XML document can stand entirely on its own, without any related documents. More often, though, an XML document is part of a larger application architecture that includes components that define the structure required for a particular type of XML document, solutions that reformat XML data (e.g., create an HTML document for display using data from an XML document), and applications that process XML documents. Understanding how these pieces work together is vital to understanding XML.

ESSENTIAL XML STANDARDS

XML itself is a standard, but it also involves many related standards. Here are some of the most widely used XML standards.

XLink is a standard for defining hyperlinks in XML.

XML Namespaces make it possible to create unique element names.

XML Schemas define the rules for the specialized XML documents used to define the structure of other XML documents.

XPath addresses each part of an XML document via a hierarchical structure (e.g., first_name within customer_name within quote_request).

XQuery is a relatively new standard that provides SQL-like query capabilities for XML documents.

Extensible Stylesheet Language (XSL) formats XML documents for display. There are two components of the XSL standard: XSL Transformations (XSLT) and XSL Formatting Objects (XSL FO).
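To give a feel for hierarchical addressing of the kind XPath provides, here is a small Python sketch that locates first_name within customer_name within quote_request. It relies on the standard-library ElementTree module, which implements only a limited subset of XPath, and the sample document is an assumption built from the element names mentioned in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using the element names from the XPath example.
QUOTE = """<quote_request>
  <customer_name>
    <first_name>Sharon</first_name>
    <last_name>Hoffman</last_name>
  </customer_name>
</quote_request>"""

root = ET.fromstring(QUOTE)  # root is the quote_request element

# A relative path, evaluated from the root element downward.
first = root.find("customer_name/first_name")

# ".//" searches all descendants, however deeply they are nested.
last_names = root.findall(".//last_name")

print(first.text, [e.text for e in last_names])
```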

The Big Picture

An XML document is almost always associated with a second document that defines the valid structure for a particular type of document. For example, an XML document might contain a particular inventory inquiry from XYZ Company, but the structural-definition document would define the format for all inventory inquiry documents. There are two standards for these structural-definition documents: DTD is the older and simpler standard, whereas XML Schema is the newer standard. DTDs and schemas serve the same purpose, but their complexity and capabilities vary significantly.

Figure 2 contains a DTD that you could use to define the XML document in Figure 1, and Figure 3 contains the schema for the same document. Both the DTD and the schema were generated using an XML editor (WebSphere Development Studio Client for iSeries, or WDSc, in this case). You’ll find that creating a sample document (e.g., an inventory inquiry) and using it to generate an initial version of the DTD or schema is often the simplest way to create a structural-definition document. While you may need to clean up the generated code, it will give you a good starting point for developing the DTD or schema.
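To show what a structural definition actually constrains, the Python sketch below hand-codes the content model from Figure 2 as ordinary checks. This is not DTD or schema validation (the Python standard library does not include a validating parser); it is a simplified stand-in, and the function name, the sample documents, and the contact_email value are assumptions.

```python
import xml.etree.ElementTree as ET

# Child order required by the inventory_inquiry declaration in Figure 2.
EXPECTED_CHILDREN = ["customer_reference", "date_required",
                     "customer", "requested_products"]
EXPECTED_CUSTOMER = ["customer_name", "contact_name", "contact_email"]


def follows_inquiry_structure(xml_text):
    """Hand-written check of the Figure 2 content model (not real DTD validation)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well formed
    if root.tag != "inventory_inquiry":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    if [child.tag for child in root.find("customer")] != EXPECTED_CUSTOMER:
        return False
    # (product_code,requested_qty)+ means one or more complete pairs.
    tags = [child.tag for child in root.find("requested_products")]
    pairs = len(tags) // 2
    return pairs >= 1 and tags == ["product_code", "requested_qty"] * pairs


HEADER = ("<inventory_inquiry><customer_reference>ref</customer_reference>"
          "<date_required>9/1/2005</date_required>"
          "<customer><customer_name>Acme Company</customer_name>"
          "<contact_name>Sharon Hoffman</contact_name>"
          "<contact_email>contact@example.com</contact_email></customer>")
VALID = (HEADER + "<requested_products><product_code>12345</product_code>"
         "<requested_qty>5</requested_qty></requested_products></inventory_inquiry>")
# Well formed, but the product code has no quantity, so the structure is wrong.
INVALID = (HEADER + "<requested_products><product_code>12345</product_code>"
           "</requested_products></inventory_inquiry>")

print(follows_inquiry_structure(VALID), follows_inquiry_structure(INVALID))
```

In practice you would hand the DTD or schema to a validating parser rather than write checks like these; the point is only that the second document parses cleanly yet still breaks the agreed structure.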

Whether you use a DTD or a schema, there is typically a one-to-many relationship between the DTD or schema and the XML documents. For example, you could publish a DTD or a schema (or both) specifying the format for incoming inventory inquiries and, hopefully, many of your customers would then begin to send you inventory inquiries in XML format. DTDs and schemas for external documents (versus documents that are internal to a particular company) are usually published online so that they can be shared more easily.

Ideally, everybody would use the same structure for the same type of document (e.g., inventory inquiries), but that’s not always the case — not even within a single industry. Fortunately, many industry groups are working on standards that should help alleviate some of the Tower-of-Babel aspects of XML. You’ll find the latest information on industry-specific XML structures online at xml.org.

In addition to DTDs and schemas, other components can be associated with XML documents. For example, if you plan to display an XML document in a Web page, you’ll probably want to first convert the XML document into an HTML document. Similarly, you often might need to create multiple XML documents that contain the same general information but use slightly different structures.

If you need to convert lots of documents between the same two structures, it makes sense to automate the process. The simplest way to do this is via an Extensible Stylesheet Language Transformations (XSLT) document that defines how input elements should be formatted in the output (XML or HTML) document. For example, if several of your vendors accept inventory inquiries in XML, but each uses a slightly different schema, you could develop a generic XML inventory inquiry, then create the variations using XSLT. As with DTDs and schemas, your XML editor should include tools to help you create XSLT documents.
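The kind of mapping an XSLT document declares can be pictured with a few lines of ordinary code. The Python sketch below is not XSLT and does not use an XSLT processor; it is a hand-written stand-in that renames elements from a generic inquiry into one vendor's vocabulary. The vendor element names (item_number, quantity) and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the generic element names to one vendor's names.
VENDOR_A_NAMES = {"product_code": "item_number", "requested_qty": "quantity"}

GENERIC = ("<inventory_inquiry><requested_products>"
           "<product_code>12345</product_code><requested_qty>5</requested_qty>"
           "</requested_products></inventory_inquiry>")


def to_vendor_format(xml_text, rename):
    """Return a copy of the document with element names replaced per the mapping."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Elements not listed in the mapping keep their original names.
        element.tag = rename.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


vendor_a = to_vendor_format(GENERIC, VENDOR_A_NAMES)
print(vendor_a)
```

An XSLT document expresses rules like these declaratively, so supporting another vendor means writing another stylesheet rather than changing application code.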

An XSLT document works in conjunction with an XSLT processor: software that applies the rules defined in the XSLT document to an incoming XML document and produces an output document in HTML, XML, or text format. An XSLT processor is typically bundled into a Web application server such as WebSphere Application Server (WAS) and can be accessed by calling APIs in an application. Most XML editors also include an XSLT processor for testing purposes.

Figure 2: A DTD generated by WDSc for the XML document in Figure 1

<?xml version='1.0' encoding="UTF-8"?>
<!ELEMENT contact_email (#PCDATA)>
<!ELEMENT contact_name (#PCDATA)>
<!ELEMENT customer (customer_name,contact_name,contact_email)>
<!ELEMENT customer_name (#PCDATA)>
<!ELEMENT customer_reference (#PCDATA)>
<!ELEMENT date_required (#PCDATA)>
<!ELEMENT inventory_inquiry (customer_reference,date_required,customer,requested_products)>
<!ELEMENT product_code (#PCDATA)>
<!ELEMENT requested_products ((product_code,requested_qty)+)>
<!ELEMENT requested_qty (#PCDATA)>

ESSENTIAL XML PARSER CONCEPTS

Although most XML editors include an XML parser, you’ll also need an XML parser for production applications. XML parsers may be part of a Web application server, or they may be available as separate software options. There are two general standards for XML parsers: Document Object Model (DOM) and Simple API for XML (SAX).

The main functional difference between DOM parsers and SAX parsers is that DOM parsers can modify an XML document, while SAX parsers are read-only (of course, an application that uses a SAX parser can always write out a new XML document in a different format than the incoming XML document). The other differences between DOM and SAX parsers don’t affect their capabilities, but they can have an impact on ease-of-use, and in some cases, performance.

SAX parsers are event-driven and are best suited for applications that need to choose specific elements from a larger XML document. You’ll find the SAX parsers more intuitive if your programming background includes languages that have event-driven capabilities (e.g., Visual Basic, Java).

DOM parsers read an entire XML document into an application where the elements can be referenced, much as an RPG program might reference fields in a record format. Therefore, DOM parsers have an advantage over SAX parsers when you need to process a high percentage of the elements in an XML document. In addition, DOM parsers generally feel more natural than SAX parsers if your programming background includes procedural languages such as RPG and Cobol.

Essential XML History

The histories of individual computer languages are mostly just curiosities, but XML’s history provides a glimpse into its syntax as well. XML is part of the same family of languages as HTML and is based on Standard Generalized Markup Language (SGML). SGML is a direct descendant of Generalized Markup Language, which was developed by IBM researchers in the 1960s.

The concept behind markup languages is to separate document content from document structure and display. Thus in both XML and HTML, the tags contain information about data: formatting information in HTML, and context information in XML.

SGML became an ISO standard in 1986. HTML, which evolved somewhat independently but incorporates many SGML concepts, is slowly being brought back into compliance with the larger SGML standard.

In 1996, developers began working on a simplified version of SGML that focuses on document structure rather than document format. That project is the basis for XML, which became a World Wide Web Consortium standard in 1998.

The Essential XML Resources

Charles F. Goldfarb’s All the XML Books in Print
Goldfarb, one of the developers of SGML, attempted to list all the XML books in print. Although the list was last updated in early 2004, it’s still a useful resource.
xmlbooks.com

The CoverPages
The XML CoverPages include XML news, background material, and technical tips.
xml.coverpages.org

DevX.com
XML FAQs, articles, discussion groups and more.
devx.com/xml

World Wide Web Consortium XML page
w3.org/XML

XML.com
O’Reilly Media, Inc., a premier technical book publisher, maintains this XML information site.
xml.com

IBM RESOURCES

developerWorks XML site
www-106.ibm.com/developerworks/xml

iSeries XML information home page
www-1.ibm.com/servers/enable/site/xml/iseries/index.html

Two IBM white papers illustrate how to process XML documents using RPG or Cobol:

“Parsing XML documents using the new V5R3 ILE COBOL syntax”
www-1.ibm.com/servers/enable/site/education/abstracts/3db2_abs.html

“XML Interface for RPG maps XML into DB2 UDB for iSeries”

From XML to the Database and Vice-Versa

In an iSeries environment, XML projects almost invariably involve extracting data from DB2 UDB for iSeries or moving data from XML documents into the database. While it’s possible to store entire XML documents in iSeries files, more often you’ll need to separate the data for one or more elements from its tags and store the data itself as a field or fields within existing iSeries database records. You’ll also find lots of requirements for the opposite task: creating XML documents using data from one or more database records.
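As a sketch of the first direction (XML into database records), the Python example below separates element data from its tags and stores it as rows. An in-memory SQLite database stands in for DB2 UDB for iSeries purely so the example is self-contained; the inquiry_lines table, its columns, and the store_inquiry function are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Abbreviated inquiry in the style of Figure 1.
INQUIRY = """<inventory_inquiry>
  <customer><customer_name>Acme Company</customer_name></customer>
  <requested_products>
    <product_code>12345</product_code>
    <requested_qty>5</requested_qty>
    <product_code>67892</product_code>
    <requested_qty>25</requested_qty>
  </requested_products>
</inventory_inquiry>"""


def store_inquiry(conn, xml_text):
    """Write one row per requested product into the hypothetical inquiry_lines table."""
    root = ET.fromstring(xml_text)
    customer = root.findtext("customer/customer_name")
    products = root.find("requested_products")
    codes = [e.text for e in products.findall("product_code")]
    qtys = [int(e.text) for e in products.findall("requested_qty")]
    conn.executemany(
        "INSERT INTO inquiry_lines (customer_name, product_code, requested_qty) "
        "VALUES (?, ?, ?)",
        [(customer, code, qty) for code, qty in zip(codes, qtys)],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE inquiry_lines "
             "(customer_name TEXT, product_code TEXT, requested_qty INTEGER)")
store_inquiry(conn, INQUIRY)
rows = conn.execute("SELECT customer_name, product_code, requested_qty "
                    "FROM inquiry_lines ORDER BY product_code").fetchall()
print(rows)
```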

The underlying software that is used to separate an XML document into data and data-description components is an XML parser. An XML parser understands the rules of XML syntax, just as the parser that is part of the RPG compiler understands RPG syntax. For more about XML parsers, see “Essential XML Parser Concepts” on page 5. As you begin developing in XML, you might not even realize that you’re using an XML parser. For example, when an XML editor validates an XML document against its associated DTD or schema, an XML parser is invoked to perform the validation. XML parsers, including those for iSeries, are typically free. The iSeries-specific XML parser support is packaged in the no-charge licensed program product, XML Toolkit for iSeries (5733-XT1).

If you’re working with very low document volumes, it may be possible to assemble and disassemble XML documents using the tools built into an XML editor. However, for production processing of XML documents, you’ll usually need to develop code that moves data back and forth between a particular type of XML document (e.g., an inventory inquiry) and the associated database records.

You can create an XML document using a variety of techniques. At one end of the spectrum, you could write an RPG program that creates an XML document as an iSeries database file by hand-coding the tags and their contents. Then, you could convert the database file to a stream file using the CPYTOSTMF (Copy to Stream File) CL command. Other options include using APIs to output a stream file from an RPG program, generating an XML document using the results of an SQL query, or writing a Java application that builds an XML document.
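The option of generating an XML document from the results of an SQL query might be sketched as follows. Again, an in-memory SQLite database stands in for the real database, and the inquiry_lines table, its columns, and the build_inquiry function are hypothetical. The output contains only part of the Figure 1 structure, so it is an illustration rather than a complete inventory inquiry.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_inquiry(conn, customer_name):
    """Build a partial inventory_inquiry document from database rows."""
    root = ET.Element("inventory_inquiry")
    customer = ET.SubElement(root, "customer")
    ET.SubElement(customer, "customer_name").text = customer_name
    products = ET.SubElement(root, "requested_products")
    query = ("SELECT product_code, requested_qty FROM inquiry_lines "
             "WHERE customer_name = ? ORDER BY product_code")
    for code, qty in conn.execute(query, (customer_name,)):
        # One product_code/requested_qty pair per database row.
        ET.SubElement(products, "product_code").text = code
        ET.SubElement(products, "requested_qty").text = str(qty)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE inquiry_lines "
             "(customer_name TEXT, product_code TEXT, requested_qty INTEGER)")
conn.executemany("INSERT INTO inquiry_lines VALUES (?, ?, ?)",
                 [("Acme Company", "12345", 5), ("Acme Company", "67892", 25)])
document = build_inquiry(conn, "Acme Company")
print(document)
```

Letting a library build the elements, rather than concatenating tag strings by hand, means special characters in the data are escaped automatically.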

Although you can write custom code to extract data from an XML document, it’s simpler to leverage the capabilities of an XML parser. For example, you might write code that invokes specific parser functions such as reading the data for a particular type of element (e.g., product_code).
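Reading the data for one type of element, such as product_code, looks quite different depending on the parser style. The Python sketch below does the same job twice with standard-library modules: once with an event-driven SAX handler and once with a DOM tree. The class and function names are hypothetical, and Python is used here only for illustration; the same ideas apply to parser APIs called from Java, RPG, or Cobol.

```python
import xml.sax
from xml.dom import minidom

INQUIRY = b"""<inventory_inquiry>
  <requested_products>
    <product_code>12345</product_code>
    <requested_qty>5</requested_qty>
    <product_code>67892</product_code>
    <requested_qty>25</requested_qty>
  </requested_products>
</inventory_inquiry>"""


class ProductCodeHandler(xml.sax.ContentHandler):
    """SAX style: react to events and keep only the product_code text."""

    def __init__(self):
        super().__init__()
        self.codes = []
        self._buffer = None  # collects text while inside a product_code element

    def startElement(self, name, attrs):
        if name == "product_code":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "product_code":
            self.codes.append("".join(self._buffer))
            self._buffer = None


def codes_via_sax(xml_bytes):
    handler = ProductCodeHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.codes


def codes_via_dom(xml_bytes):
    """DOM style: load the whole document, then look elements up by name."""
    document = minidom.parseString(xml_bytes)
    return [node.firstChild.data
            for node in document.getElementsByTagName("product_code")]


print(codes_via_sax(INQUIRY), codes_via_dom(INQUIRY))
```

The SAX version never holds the whole document in memory, while the DOM version is shorter and lets you revisit any element after parsing.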

Java is the language of choice for working with XML because it includes extensive support for accessing parser APIs. However, you can also invoke parser APIs using RPG or Cobol, and products are available that will automate part of the process of assembling or disassembling XML documents.

Explore XML

XML is a powerful tool for communicating data between applications using different databases and running on different platforms, and it is rapidly becoming the medium of choice for transaction-level data transfer. XML can also organize information within a document, thus making it easier to modify and search large amounts of text. For all its strengths, XML is still a relatively new technology with a maze of confusing, and sometimes competing, standards. To take advantage of XML, it helps to have a clearly defined goal and the flexibility to experiment with various tools and techniques. It’s also useful to understand how other businesses are using XML. To explore the opportunities XML offers, visit the Web sites listed in “Essential XML Resources” on page 5.

Sharon L. Hoffman is a senior technical editor for iSeries NEWS.

Figure 3: An XML schema generated by WDSc for the XML document in Figure 1

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="contact_email" type="xsd:string"/>
  <xsd:element name="contact_name" type="xsd:string"/>
  <xsd:element name="customer">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="customer_name"/>
        <xsd:element ref="contact_name"/>
        <xsd:element ref="contact_email"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="customer_name" type="xsd:string"/>
  <xsd:element name="customer_reference" type="xsd:string"/>
  <xsd:element name="date_required" type="xsd:string"/>
  <xsd:element name="inventory_inquiry">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="customer_reference"/>
        <xsd:element ref="date_required"/>
        <xsd:element ref="customer"/>
        <xsd:element ref="requested_products"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="product_code" type="xsd:string"/>
  <xsd:element name="requested_products">
    <xsd:complexType>
      <xsd:sequence maxOccurs="unbounded" minOccurs="1">
        <xsd:element ref="product_code"/>
        <xsd:element ref="requested_qty"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="requested_qty" type="xsd:string"/>
</xsd:schema>
