Written by Dept. of Computer Science @ Dr. BVRICE Bhimavaram W.G. Dist.,
9. XML: Defining Data for Web Applications
1. Define Markup Language.
“Markup is a set of instructions, also known as tags, which can be added to text
files.”
Ex: Microsoft Rich Text Format (RTF), Adobe Portable Document Format (PDF), HTML,
etc.
2. What is XML? Explain.
XML stands for eXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to carry data, not to display data
• XML tags are not predefined. We must define our own tags
• XML is designed to be self-descriptive
• XML is a W3C Recommendation
Background for XML:
• An Extensible Markup Language (XML) document describes the structure of data
• XML and HTML have a similar syntax. Both derived from SGML (Standard
Generalized Markup Language)
• XML has no mechanism to specify the format for presenting data to the user
• An XML document resides in its own file with an ‘.xml’ extension
The Basic Rules (XML Syntax Rules):
• XML is case sensitive
• All start tags must have end tags
• Elements must be properly nested
• XML declaration is the first statement
• Every document must contain a root element
• Attribute values must have quotation marks
• Certain characters are reserved for parsing
Some characters have a special meaning in XML. There are 5 predefined entity
references in XML:
&lt;     <    less than
&gt;     >    greater than
&amp;    &    ampersand
&apos;   '    apostrophe
&quot;   "    quotation mark
If we place a character like "<" inside an XML element, it will generate an error
because the parser interprets it as the start of a new element. This will generate an XML
error:
<message>if salary < 1000 then</message>
To avoid this error, replace the "<" character with an entity reference:
<message>if salary &lt; 1000 then</message>
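The effect of a reserved character can be checked with a short program. The following is a minimal sketch in Python (chosen here only for illustration; the notes themselves do not prescribe a language). It assumes the standard xml.etree.ElementTree parser and the escape() helper from xml.sax.saxutils, and the variable names are our own.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if salary < 1000 then"

# An unescaped "<" inside element content makes the document ill-formed,
# so the parser raises ParseError.
try:
    ET.fromstring("<message>" + raw_text + "</message>")
    parsed_unescaped = True
except ET.ParseError:
    parsed_unescaped = False

# escape() replaces &, < and > with their entity references.
safe_text = escape(raw_text)

# The escaped form parses, and the parser hands back the original text.
message = ET.fromstring("<message>" + safe_text + "</message>")
print(safe_text)
print(message.text)
```

The parser converts the entity reference back to the "<" character, so the application sees the original text.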
The Difference Between XML and HTML:
XML is not a replacement for HTML. HTML is about displaying information, while
XML is about carrying information. XML and HTML were designed with different goals:
• XML was designed to transport and store data, with focus on what data is
• HTML was designed to display data, with focus on how data looks
The general format (Syntax) of XML document is
<root>
<child>
<subchild> . . . </subchild>
</child>
</root>
Ex:1 XML documents use a self-describing and simple syntax:
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line is the XML declaration. It defines the XML version (1.0).
The next line describes the root element of the document (like saying: "this document
is a note").
The next 4 lines describe 4 child elements of the root (to, from, heading, and body).
And finally the last line defines the end of the root element.
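The structure described above can be inspected with a parser. The following is a small sketch, assuming Python's standard xml.etree.ElementTree module; it reads the note document and lists the root element and its child elements.

```python
import xml.etree.ElementTree as ET

note_xml = """<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

# fromstring() returns the root element of the document.
root = ET.fromstring(note_xml)

# Iterating over an element yields its child elements in document order.
children = [(child.tag, child.text) for child in root]

print("root:", root.tag)
for tag, text in children:
    print(tag, "=", text)
```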
Control Information:
There are three control structures. They are:
• Comments
• Processing Instructions
• Document type declarations
Comments in XML:
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
Processing Instructions:
Processing Instructions (PI) are used to control applications. For example,
<?xml version="1.0"?>
The above instruction tells the application that the data in the file follows the rules of XML version 1.0.
Document Types Declarations:
Each XML document can have an associated Document Type Definition. The DTD is usually
held in a separate file and can be used with many documents.
Ex:
<!DOCTYPE Recipes SYSTEM "recipe.dtd">
This declaration tells the parser that the XML file is of type Recipes and that it uses a
DTD held in the file recipe.dtd. Any DTD which we develop ourselves or have developed for us is denoted by the keyword
SYSTEM.
White-space is Preserved in XML:
HTML truncates multiple white-space characters to one single white-space:
HTML: Hello           Tove
Output: Hello Tove
With XML, the white-space in a document is not truncated.
XML Stores New Line as LF:
In Windows applications, a new line is normally stored as a pair of characters:
carriage return (CR) and line feed (LF). In UNIX applications, a new line is normally stored
as an LF character. Older Macintosh applications use a CR character, while current macOS uses an LF. XML stores a
new line as LF: the parser converts CR LF pairs and single CR characters to LF.
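Both points, the preserved white-space and the LF new line, can be observed with a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the element names greeting and lines are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Eleven spaces between the two words.
spaced_text = "Hello" + " " * 11 + "Tove"
spaced = ET.fromstring("<greeting>" + spaced_text + "</greeting>")

# A Windows-style new line (CR followed by LF) inside element content.
windows_style = ET.fromstring("<lines>first\r\nsecond</lines>")

# repr() makes the spaces and the new line character visible.
print(repr(spaced.text))
print(repr(windows_style.text))
```

The text of the first element keeps all of its spaces, and the text of the second element contains a single LF where the file had CR LF.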
XML Elements:
An XML document contains XML Elements. An XML element is everything from
(including) the element's start tag to (including) the element's end tag. An element can
contain:
• other elements
• text
• attributes
• or a mix of all of the above
3. What is DTD? Explain.
“The Document Type Definition (DTD) describes a model of the structure of the
content of an XML document.”
DTD Elements:
In a DTD, elements are declared with an ELEMENT declaration. The syntax is
<!ELEMENT element-name category>
(or)
<!ELEMENT element-name (element-content)>
The purpose of a DTD is to define the legal building blocks of an XML document. A
DTD can be supplied in two forms:
• An external DTD subset
• An internal DTD subset
An external DTD subset is a DTD that exists outside the content of
the document. An internal DTD subset is a DTD that is included within the XML document.
A document can contain either one or both types of subsets. If a document contains both types
of subsets, the internal subset is processed first and then the external subset is
processed.
An Internal DTD subset example:
Open a new file in Notepad and type the following code:
<?xml version="1.0" ?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Save the above file with .xml extension (For example, 4.xml) & Open it in a browser.
An external DTD subset example:
Open a new file in Notepad and type the following code:
<!ELEMENT university (college*)>
<!ELEMENT college (name,dept*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT dept (mpc,mpcs,mecs,mscs,bcom)>
<!ELEMENT mpc (#PCDATA)>
<!ELEMENT mpcs (#PCDATA)>
<!ELEMENT mecs (#PCDATA)>
<!ELEMENT mscs (#PCDATA)>
<!ELEMENT bcom (#PCDATA)>
Save the above file as one.dtd.
Open a new file in Notepad and type the following code:
<?xml version="1.0" ?>
<!DOCTYPE university SYSTEM "one.dtd">
<university>
<college>
<name>bvrice</name>
<dept>
<mpc>Phy_Che</mpc>
<mpcs>Phy_CSC</mpcs>
<mecs>Ele_CSC</mecs>
<mscs>Stat_CSC</mscs>
<bcom>Com_CSC</bcom>
</dept>
</college>
</university>
Save the above file with .xml extension (For example, index.xml).
Explanation:
The DOCTYPE statement is the document type declaration.
In an internal subset, the square brackets [ ] enclose the DTD and define the rules of the document.
<!ELEMENT university (college*)> The symbol * indicates that it can contain zero
or more of the college elements.
An element suffixed with the symbol ? is optional: it may appear once or not at all.
#PCDATA specifies Parsed Character Data. The reserved character # indicates that
#PCDATA is a reserved word.
Structure Symbols:
XML uses a set of symbols for specifying the structure of an element declaration.
These symbols are also known as control characters. These are explained in the following
table:
Symbol        Example                  Meaning
Asterisk      item*                    The item appears zero or more times.
Comma         (item1, item2, item3)    Separates items in a sequence in the order in which they appear.
None          item                     Item appears exactly once.
Parentheses   (item1, item2)           Enclose a group of items.
Pipe          (item1 | item2)          Separates a set of alternatives. Only one may appear.
Plus          item+                    Item appears at least once.
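The structure symbols behave much like the operators of a regular expression. The following is only an illustrative sketch in Python, not how a real validating parser is built: the helper names content_model_to_regex and children_match are hypothetical, and the sketch handles element names and the symbols , | * + ? ( ) only (it does not handle #PCDATA, EMPTY or ANY).

```python
import re

def content_model_to_regex(model):
    """Turn a DTD content model such as "(name,dept*)" into a regex string."""
    model = re.sub(r"\s+", "", model)
    # Wrap every element name in a group that matches "name " so that a
    # following *, + or ? applies to the whole name.
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        model,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    return pattern.replace(",", "")

def children_match(model, child_names):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in child_names)
    return re.fullmatch(content_model_to_regex(model), sequence) is not None

# Content models taken from the university DTD and from the table above.
print(children_match("(college*)", []))
print(children_match("(name,dept*)", ["name", "dept", "dept"]))
print(children_match("(name,dept*)", ["dept"]))
print(children_match("(item1 | item2)", ["item1", "item2"]))
```

Each element name is matched together with a trailing space, so a name such as mpc is never confused with a longer name such as mpcs.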
Attributes:
In a DTD, attributes are declared with an ATTLIST declaration. The syntax is
<!ATTLIST element-name attribute-name attribute-type attribute-value>
Ex:
DTD:        <!ATTLIST payment type CDATA "check">
Valid XML:  <payment type="check" />
The attribute-type can be one of the following:
Type Description
CDATA The value is character data
(en1|en2|..) The value must be one from an enumerated list
ID The value is a unique id
IDREF The value is the id of another element
IDREFS The value is a list of other ids
NMTOKEN The value is a valid XML name
NMTOKENS The value is a list of valid XML names
ENTITY The value is an entity
ENTITIES The value is a list of entities
NOTATION The value is a name of a notation
xml: The value is a predefined xml value
The attribute-value can be one of the following:
Value Explanation
value The default value of the attribute
#REQUIRED The attribute is required
#IMPLIED The attribute is not required
#FIXED value The attribute value is fixed
Entities:
Entities are variables used to define shortcuts to standard text or special characters.
Entity references are references to entities. Entities can be declared internal or external.
Internal Entities:
These are used to create small pieces of data which we want to use repeatedly
throughout our schema. The syntax is
<!ENTITY entity-name "entity-value">
In XML, an entity reference has three parts: an ampersand (&), an entity name, and a
semicolon (;).
Ex:
DTD:
<!ENTITY writer "KBR.">
<!ENTITY copyright "Copyright WT_ Notes.">
Valid XML:
<author>&writer;&copyright;</author>
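Internal entities are expanded by the parser before the application sees the text. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, with the two entity declarations placed in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE author [
<!ENTITY writer "KBR.">
<!ENTITY copyright "Copyright WT_ Notes.">
]>
<author>&writer;&copyright;</author>"""

# The parser replaces each entity reference with the declared entity value.
author = ET.fromstring(doc)
print(author.text)
```

The text of the author element is the two entity values joined together, with no trace of the references themselves.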
External Entities:
Almost anything which is data can be included in our XML as an external entity. The syntax is
<!ENTITY entity-name SYSTEM "URI/URL">
Ex:
DTD:
<!ENTITY writer SYSTEM "http://www.w3schools.com/entities.dtd">
<!ENTITY copyright SYSTEM "http://www.w3schools.com/entities.dtd">
Valid XML:
<author>&writer;&copyright;</author>
Namespaces:
A namespace is a way of keeping the names used by applications separate from each
other. Within a particular namespace no duplication of names can exist. The purpose of XML
Namespaces is to distinguish between duplicate elements and attribute names.
In the following example there is no conflict because the two <table>
elements carry different prefixes:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
XML developers can specify their own namespaces which can be used in many
applications. A namespace is declared with an xmlns:prefix attribute, usually on the root
element, and is identified by a URI.
Ex:
<?xml version="1.0" ?>
<!DOCTYPE recipes SYSTEM "recipes.dtd">
<recipes xmlns:bread="http://URL/namespaces/breads"
xmlns:lamb="http://URL/namespaces/meats">
<category>
<bread:name>Basic Loaf</bread:name>
</category>
<category>
<lamb:name>Roast Lamb</lamb:name>
</category>
</recipes>
In the above example, each category of recipe has a name element and there is no
confusion because the namespaces have been declared.
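A namespace-aware parser keeps such elements apart by combining the namespace URI with the local name. The following is a small sketch, assuming Python's standard xml.etree.ElementTree module; the wrapper element root and the two example.com URIs are placeholders invented for this illustration.

```python
import xml.etree.ElementTree as ET

doc = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name><f:width>80</f:width></f:table>
</root>"""

root = ET.fromstring(doc)

# ElementTree stores each tag as "{namespace-URI}local-name".
table_tags = [child.tag for child in root]

# A prefix map lets us use the short prefixes in search paths.
ns = {"h": "http://example.com/html", "f": "http://example.com/furniture"}
html_cells = [td.text for td in root.findall("h:table/h:tr/h:td", ns)]
furniture_name = root.find("f:table/f:name", ns).text

print(table_tags)
print(html_cells, furniture_name)
```

The two table elements end up with different full names, so a search in one namespace never returns elements from the other.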
4. Write about XML Schema.
An XML Schema describes the structure of an XML document. XML Schema is
an XML-based alternative to DTD. The XML Schema language is also referred to as XML
Schema Definition (XSD).
The purpose of an XML Schema is to define the legal building blocks of an XML
document, just like a DTD.
An XML Schema:
• defines elements that can appear in a document
• defines attributes that can appear in a document
• defines which elements are child elements
• defines the order of child elements
• defines the number of child elements
• defines whether an element is empty or can include text
• defines data types for elements and attributes
• defines default and fixed values for elements and attributes
XML Schema Data Types:
XML Schema data types can be generally categorized as "simple type" (including
embedded simple type) and "complex type."
Simple Type
A simple type is a type that only contains text data when expressed according
to XML 1.0. Users can also define their own simple types; this is done by
placing a restriction on an embedded simple type to create and use a new
type.
Ex: <xsd:element name="Department" type="xsd:string" />
Here, "xsd:string" is an embedded simple type. It expresses
the definition that the data type for the element called "Department" is a text
string.
Complex Type
A complex type is a type that has child elements or attributes
when expressed according to XML 1.0. Users can define their own complex types. This
type is used when the element has a child element or an attribute.
Ex: <xsd:complexType name="EmployeeType">
<xsd:sequence maxOccurs="unbounded">
<xsd:element ref="Name" />
<xsd:element ref="Department" />
</xsd:sequence>
</xsd:complexType>
<xsd:element name="Name" type="xsd:string" />
<xsd:element name="Department" type="xsd:string" />
In this case the type name "EmployeeType" is designated by the
name attribute of the complexType element.
Ex: An XML Schema Document (XSD file)
Open a new file in Notepad and type the following code:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="product" type="productType"/>
<xsd:complexType name="productType">
<xsd:sequence>
<xsd:element name="number" type="xsd:integer"/>
<xsd:element name="date" type="xsd:date"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Save the above program as “productType.xsd”.
An XML schema instance (XML file)
Open a new file in Notepad and type the following code:
<?xml version="1.0"?>
<product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="productType.xsd">
<number>557</number>
<date>2004-05-25</date>
</product>
Save the above program as “product.xml” in the same folder where we saved
“productType.xsd”.
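Open “product.xml” in the browser. If there are no errors, the document is
displayed. Note that a browser checks only that the file is well-formed; checking it
against the schema requires a validating parser.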
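The kind of check that a schema validator performs for the two data types can be imitated by hand. The following is only a rough sketch in Python and is not real XML Schema validation: the function check_product is hypothetical, and Python's int() and date.fromisoformat() are used as approximate stand-ins for xsd:integer and xsd:date.

```python
import xml.etree.ElementTree as ET
from datetime import date

def check_product(xml_text):
    """Return a list of problems found in a <product> document."""
    product = ET.fromstring(xml_text)
    problems = []

    # Stand-in for type="xsd:integer".
    number = product.findtext("number", default="")
    try:
        int(number)
    except ValueError:
        problems.append("number is not an integer: " + repr(number))

    # Stand-in for type="xsd:date" (year-month-day).
    day = product.findtext("date", default="")
    try:
        date.fromisoformat(day)
    except ValueError:
        problems.append("date is not a valid date: " + repr(day))

    return problems

good = "<product><number>557</number><date>2004-05-25</date></product>"
bad = "<product><number>rama</number><date>2004-05-25</date></product>"
print(check_product(good))
print(check_product(bad))
```

A non-numeric value such as rama in the number element is reported as a problem, because the schema declares that element as an integer.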
4. Write about DOM and SAX.
Parsers:
A parser is a program that analyses the grammatical structure of an input with respect to
a given formal grammar. (OR) An XML parser is a software component that
can read and validate any XML document.
The parser determines how a sentence can be constructed from the grammar of
the language by describing the atomic elements of the input and the
relationship among them.
SAX (Simple API for XML):
a. The SAX API provides a serial mechanism for accessing XML documents.
b. The SAX model allows for simple parsers by allowing parsers to read
through a document in a linear way, and then to call an event handler
every time a markup event occurs.
c. When a parsing event happens, the parser invokes the corresponding
method of the corresponding handler.
d. The handlers are the programmer’s implementations of the standard Java API (i.e.,
interfaces and classes).
e. Similar to an I/O stream, parsing goes in one direction.
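The event model described in points a to e can be sketched in a few lines. The notes describe the Java API; the sketch below instead assumes Python's standard xml.sax module, which follows the same handler idea. The class name EventRecorder is our own.

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    """Collects the parsing events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Text may arrive in several pieces; whitespace-only pieces are skipped.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

recorder = EventRecorder()
# The parser reads the document once, front to back, and calls the handler.
xml.sax.parseString(b"<note><to>Tove</to><from>Jani</from></note>", recorder)

tag_events = [event for event in recorder.events if event[0] != "text"]
for event in recorder.events:
    print(event)
```

Nothing is kept in memory except what the handler chooses to store, which is why this approach suits large documents.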
Structure of SAX Parser:
DOM (Document Object Model):
a. In Sun's implementation of the DOM model, the parser will read in an entire
XML data source and construct a treelike representation of it in memory.
b. Under DOM, a pointer to the entire document is returned to the calling
application.
c. The application can then manipulate the document, rearranging nodes,
adding and deleting content as needed by using the DOM API.
d. While DOM is generally easier to work with, it is slower and more
resource intensive than SAX.
e. DOM can be used effectively with smaller XML data structures.
Using the DOM API:
DOM views XML documents as trees, but this is
very much a logical view of the document.
There is no requirement that parsers include a tree as a data structure. Each node of
the tree represents an XML element and is modeled as an object.
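The tree view can be tried out with any DOM implementation. The following is a minimal sketch, assuming Python's standard xml.dom.minidom module rather than the Java API discussed in these notes; it loads a small document, adds one node and deletes another.

```python
from xml.dom import minidom

# The whole document is read into memory as a tree of node objects.
doc = minidom.parseString("<note><to>Tove</to><from>Jani</from></note>")
root = doc.documentElement

# Add a new <heading> element with a text node inside it.
heading = doc.createElement("heading")
heading.appendChild(doc.createTextNode("Reminder"))
root.appendChild(heading)

# Delete the <from> element.
root.removeChild(root.getElementsByTagName("from")[0])

child_names = [node.tagName for node in root.childNodes]
result = root.toxml()
print(child_names)
print(result)
```

Because the full tree is available, nodes can be visited, inserted and removed in any order, at the cost of holding the document in memory.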
Differences between DOM and SAX parser:
Both SAX and DOM are used to parse the XML document. Both have advantages and
disadvantages and can be used in our programming depending on the situation.
SAX                                   DOM
Simple API for XML                    Document Object Model (DOM) API
Parses node by node                   Stores the entire XML document into memory before processing
Doesn’t store the XML in memory       Occupies more memory
We can’t insert or delete a node      We can insert or delete nodes
Top to bottom traversing              Traverse in any direction
import javax.xml.parsers.*;           import javax.xml.parsers.*;
import org.xml.sax.*;                 import org.w3c.dom.*;
import org.xml.sax.helpers.*;
Doesn’t preserve comments             Preserves comments
Generally runs a little faster        Generally runs a little slower
than DOM                              than SAX
If we only need to find a node and don’t need to insert or delete, we can go with SAX;
otherwise we can use DOM, provided we have enough memory.
5. How to work with XML Stylesheets (Presenting XML)?
Presenting XML:
The following program explains the presentation of XML.
Open a new file in Notepad and type the following code:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Save the above program as “pstyle.xsl”.
Open a new file in Notepad and type the following code:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="pstyle.xsl"?>
<catalog>
<cd>
<title>Windows 95</title>
<artist>MS Team</artist>
<country>USA</country>
<company>Microsoft</company>
<price>5000.89</price>
<year>1995</year>
</cd>
<cd>
<title>MS-Office</title>
<artist>MS Team</artist>
<country>US</country>
<company>Microsoft</company>
<price>300.50</price>
<year>2007</year>
</cd>
<cd>
<title>Ilaya Raja Hits</title>
<artist>Veturi</artist>
<country>India</country>
<company>Aditya Music</company>
<price>15.75</price>
<year>2009</year>
</cd>
</catalog>
Save the above file as “pxml.xml” in the same folder i.e., where we saved the
“pstyle.xsl” file.
Open the file “pxml.xml” in the browser and the output will be displayed as a table
with the columns Title and Artist and one row for each cd element.
Note:
If there is any error in the XSL file, the output will not be displayed.
Explanation:
XSL stands for eXtensible Stylesheet Language, and is a style sheet language for XML documents.
XSLT is a language for transforming XML documents into XHTML documents or to
other XML documents.
What is XSLT?
• XSLT stands for XSL Transformations
• XSLT is the most important part of XSL
• XSLT transforms an XML document into another XML document
• XSLT uses XPath to navigate in XML documents
• XSLT is a W3C Recommendation
<xsl:stylesheet> and <xsl:transform> are completely synonymous and either can be
used. The correct way to declare an XSL style sheet according to the W3C XSLT
Recommendation is:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
(OR)
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
To get access to the XSLT elements, attributes and features we must declare the
XSLT namespace at the top of the document.
The <xsl:template> element contains the rules to apply when a specified node is matched.
Its match attribute can also be used to
define a template for the entire XML document. The value of the match attribute is an
XPath expression (i.e. match="/" defines the whole document).
The <xsl:value-of> element can be used to extract the value of an XML element and
add it to the output stream of the transformation.
The XSL <xsl:for-each> element can be used to select every XML element of a
specified node-set.
The value of the select attribute is an XPath expression. An XPath expression works
like navigating a file system; where a forward slash (/) selects subdirectories.
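The roles of <xsl:for-each> and <xsl:value-of> can be imitated by hand to see what the transformation does. The following is only a sketch in Python and is not an XSLT processor: it assumes the standard xml.etree.ElementTree module, which supports a small subset of XPath, and a shortened copy of the catalog data.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

catalog_xml = """<catalog>
  <cd><title>Windows 95</title><artist>MS Team</artist></cd>
  <cd><title>MS-Office</title><artist>MS Team</artist></cd>
  <cd><title>Ilaya Raja Hits</title><artist>Veturi</artist></cd>
</catalog>"""

catalog = ET.fromstring(catalog_xml)

rows = []
# Plays the role of <xsl:for-each select="catalog/cd">; the path is just "cd"
# here because `catalog` is already the <catalog> element.
for cd in catalog.findall("cd"):
    # Plays the role of <xsl:value-of select="title"/> and select="artist".
    rows.append((cd.findtext("title"), cd.findtext("artist")))

# Build one table row per cd, as the template body does.
html_rows = "".join(
    "<tr><td>{}</td><td>{}</td></tr>".format(escape(title), escape(artist))
    for title, artist in rows
)
print(html_rows)
```

Each step of the path narrows the selection in the same way that each part of a file path narrows a folder search.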
To link an XSL file to an XML document, the following processing instruction is required in the XML file:
<?xml-stylesheet type="text/xsl" href="pstyle.xsl"?>
Where “pstyle.xsl” is the XSL stylesheet.
6. Explain XSL elements.
The general format of an XSL element is
<xsl:element-name select="value"/>
The following table describes the list of XSL elements:
Element                  Description
apply-imports            Applies a template rule from an imported style sheet
apply-templates          Applies a template rule to the current element or to the current element's child nodes
attribute                Adds an attribute
attribute-set            Defines a named set of attributes
call-template            Calls a named template
choose                   Used in conjunction with <when> and <otherwise> to express multiple conditional tests
comment                  Creates a comment node in the result tree
copy                     Creates a copy of the current node (without child nodes and attributes)
copy-of                  Creates a copy of the current node (with child nodes and attributes)
decimal-format           Defines the characters and symbols to be used when converting numbers into strings, with the format-number() function
element                  Creates an element node in the output document
fallback                 Specifies an alternate code to run if the processor does not support an XSLT element
for-each                 Loops through each node in a specified node set
if                       Contains a template that will be applied only if a specified condition is true
import                   Imports the contents of one style sheet into another. Note: An imported style sheet has lower precedence than the importing style sheet
include                  Includes the contents of one style sheet into another. Note: An included style sheet has the same precedence as the including style sheet
key                      Declares a named key that can be used in the style sheet with the key() function
message                  Writes a message to the output (used to report errors)
namespace-alias          Replaces a namespace in the style sheet to a different namespace in the output
number                   Determines the integer position of the current node and formats a number
otherwise                Specifies a default action for the <choose> element
output                   Defines the format of the output document
param                    Declares a local or global parameter
preserve-space           Defines the elements for which white space should be preserved
processing-instruction   Writes a processing instruction to the output
sort                     Sorts the output
strip-space              Defines the elements for which white space should be removed
stylesheet               Defines the root element of a style sheet
template                 Rules to apply when a specified node is matched
text                     Writes literal text to the output
transform                Defines the root element of a style sheet
value-of                 Extracts the value of a selected node
variable                 Declares a local or global variable
when                     Specifies an action for the <choose> element
with-param               Defines the value of a parameter to be passed into a template