Traditional Internet Applications

4.26 Extensible Representations (XML)

The traditional application protocols covered in this chapter each employ a fixed representation. That is, the application protocol specifies an exact set of messages that a client and server can exchange as well as the exact form of data that accompanies the message. The chief disadvantage of a fixed approach arises from the difficulty involved in making changes. For example, because email standards restrict message content to text, a major change was needed to add MIME extensions.

The alternative to a fixed representation is an extensible system that allows a sender to specify the format of data. One standard for extensible representation has become widely accepted: the Extensible Markup Language (XML). XML resembles HTML in the sense that both languages embed tags into a text document. Unlike HTML, the tags in XML are not specified a priori and do not correspond to formatting commands. Instead, XML describes the structure of data and provides names for each field. Tags in XML are well-balanced: each occurrence of a tag <X> must be followed by an occurrence of </X>. Furthermore, because XML does not assign any meaning to tags, tag names can be created as needed. In particular, tag names can be selected to make data easy to parse or access. For example, if two companies agree to exchange corporate telephone directories, they can define an XML format that has data items such as an employee’s name, phone number, and office. The companies can choose to further divide a name into a last name and a first name. Figure 4.18 contains an example.

<ADDRESS>
    <NAME>
        <LAST> Public </LAST>
    </NAME>
</ADDRESS>

Figure 4.18 An example of XML for a corporate phone book.
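
As a rough sketch of how a receiving program might take advantage of named fields, the following Python fragment parses one phone book entry with the standard xml.etree.ElementTree module. Only the ADDRESS, NAME, and LAST tags and the value Public come from Figure 4.18; the FIRST, OFFICE, and PHONE tags, their values, and the function name parse_entry are hypothetical and were chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: only ADDRESS, NAME, and LAST (with the value
# "Public") appear in the figure; every other tag and value is assumed.
RECORD = """
<ADDRESS>
  <NAME>
    <FIRST> Pat </FIRST>
    <LAST> Public </LAST>
  </NAME>
  <OFFICE> Room 100 </OFFICE>
  <PHONE> 555-0100 </PHONE>
</ADDRESS>
"""

def parse_entry(text):
    """Return a dictionary holding the fields of one phone book entry."""
    root = ET.fromstring(text.strip())
    return {
        "first": root.findtext("NAME/FIRST").strip(),
        "last": root.findtext("NAME/LAST").strip(),
        "office": root.findtext("OFFICE").strip(),
        "phone": root.findtext("PHONE").strip(),
    }

def is_well_balanced(text):
    """Report whether every opening tag has a matching closing tag."""
    try:
        ET.fromstring(text.strip())
        return True
    except ET.ParseError:
        return False

entry = parse_entry(RECORD)
print(entry["last"], entry["phone"])
```

Because the tag names identify each field, the program can extract a phone number without knowing the order in which fields appear. The helper is_well_balanced shows that a parser rejects a document in which a tag is opened but never closed.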

4.27 Summary

Application-layer protocols, required for standardized services, define data representation and data transfer aspects of communication. Representation protocols used with the World Wide Web include HyperText Markup Language (HTML) and the URL standard. The web transfer protocol, which is known as the HyperText Transfer Protocol (HTTP), specifies how a browser communicates with a web server to download or upload contents. To speed downloads, a browser caches page content and uses an HTTP HEAD command to request status information about the page. If the cached version remains current, the browser uses the cached version; otherwise, the browser issues a GET request to download a fresh copy.
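
The caching decision can be sketched in a few lines of Python. The sketch makes no network calls: two dictionaries stand in for the header saved with the cached copy and the header returned by a HEAD request. Comparing the Last-Modified field is one plausible way to judge whether a copy is current and is an assumption here, as are the function names.

```python
from email.utils import parsedate_to_datetime

def cached_copy_is_current(cached_headers, head_headers):
    """Decide whether a cached page can be reused.

    cached_headers: header fields saved when the page was cached.
    head_headers:   header fields returned by a HEAD request.
    """
    cached = cached_headers.get("Last-Modified")
    latest = head_headers.get("Last-Modified")
    if cached is None or latest is None:
        return False  # without a timestamp, treat the copy as stale
    return parsedate_to_datetime(latest) <= parsedate_to_datetime(cached)

def choose_action(cached_headers, head_headers):
    """Return "use cache" or "GET", following the steps in the text."""
    if cached_copy_is_current(cached_headers, head_headers):
        return "use cache"
    return "GET"

saved = {"Last-Modified": "Tue, 15 Nov 1994 12:45:26 GMT"}
print(choose_action(saved, {"Last-Modified": "Wed, 16 Nov 1994 08:00:00 GMT"}))
```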

HTTP uses textual messages. Each response from a server begins with a header that describes the response. The first line of the header contains a numeric value, represented as ASCII digits, that tells the status (e.g., whether a request is in error). Data that follows the header can contain arbitrary binary values.
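
To make the division between textual header and binary data concrete, the following Python sketch splits a raw response at the blank line that ends the header and extracts the numeric status code from the first header line. The sample response and the function name split_response are invented for illustration, and the code assumes a well-formed response.

```python
def split_response(raw):
    """Split raw response bytes into (status code, header fields, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")      # blank line ends the header
    lines = head.decode("ascii").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)  # e.g., HTTP/1.1 200 OK
    fields = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        fields[name.strip().lower()] = value.strip()
    return int(code), fields, body

# Invented sample: a textual header followed by four arbitrary bytes.
SAMPLE = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: application/octet-stream\r\n"
          b"Content-Length: 4\r\n"
          b"\r\n"
          b"\x00\xff\x10\x80")

status, fields, body = split_response(SAMPLE)
print(status, fields["content-length"], len(body))
```

Only the header is decoded as ASCII text; the body is kept as bytes because it may contain values that are not printable characters.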

The File Transfer Protocol (FTP) is frequently used for file download. FTP requires a client to log into the server’s system; FTP supports a login of anonymous and password guest for public file access. The most interesting aspect of FTP arises from its unusual use of connections. A client establishes a control connection that is used to send a series of commands. Whenever a server needs to send data (e.g., a file download or the listing of a directory), the server acts as a client and the client acts as a server. That is, the server initiates a new data connection to the client. Once a single file has been sent, the data connection is closed.
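
The reversal of roles can be sketched in Python on a single computer. In classic (active-mode) FTP, the client tells the server where to connect by sending a PORT command whose argument encodes an IPv4 address and port number as six decimal values, h1,h2,h3,h4,p1,p2. The function names below are hypothetical, the control connection is omitted, and a thread stands in for the remote server.

```python
import socket
import threading

def port_argument(ip, port):
    """Encode an IPv4 address and TCP port as h1,h2,h3,h4,p1,p2."""
    return ",".join(ip.split(".") + [str(port // 256), str(port % 256)])

def parse_port_argument(arg):
    """Decode h1,h2,h3,h4,p1,p2 back into (address, port)."""
    parts = [int(p) for p in arg.split(",")]
    return ".".join(str(p) for p in parts[:4]), parts[4] * 256 + parts[5]

def server_send(arg, data):
    """Server side: act as a client and open the data connection."""
    with socket.create_connection(parse_port_argument(arg)) as conn:
        conn.sendall(data)
    # Leaving the with block closes the connection, which marks end of file.

def client_receive(data_to_serve):
    """Client side: act as a server and wait for the data connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))        # let the system pick a free port
        listener.listen(1)
        arg = port_argument(*listener.getsockname())
        # In FTP, arg would travel to the server over the control connection.
        worker = threading.Thread(target=server_send, args=(arg, data_to_serve))
        worker.start()
        conn, _ = listener.accept()
        chunks = []
        with conn:
            while True:
                chunk = conn.recv(4096)
                if not chunk:                  # connection closed: file complete
                    break
                chunks.append(chunk)
        worker.join()
    return arg, b"".join(chunks)
```

Calling client_receive(b"file contents") returns the PORT argument that was used together with the bytes received over the data connection.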

Three types of application-layer protocols are used with electronic mail: transfer, representation, and access. The Simple Mail Transfer Protocol (SMTP) serves as the key transfer standard; SMTP can only transfer a textual message. There are two representation standards for email. RFC 2822 defines the mail message format to be a header and body separated by a blank line. The Multipurpose Internet Mail Extensions (MIME) standard defines a mechanism to send binary files as attachments to an email message. MIME inserts extra header lines that tell the receiver how to interpret the message. MIME requires a sender to encode a file as printable text.
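
As an illustration, the following Python sketch uses the standard email package to attach a small binary payload to a text message. The addresses, subject, file name, and payload are hypothetical. Printing the result shows the extra header lines that MIME adds and the attachment encoded as printable text.

```python
from email.message import EmailMessage

# Hypothetical addresses and contents, used only for illustration.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "receiver@example.com"
msg["Subject"] = "Binary attachment"
msg.set_content("The file is attached.")

# Four arbitrary byte values that cannot be sent as plain text.
payload = bytes([0, 255, 16, 128])
msg.add_attachment(payload, maintype="application",
                   subtype="octet-stream", filename="data.bin")

# The printed message contains a header, a blank line, and a body in
# which the attachment appears in an encoded, printable form.
print(msg.as_string())
```

The library chooses the encoding for the attachment; the receiver uses the header lines of each part to reverse the encoding and recover the original bytes.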

Email access protocols, such as POP3 and IMAP, permit a user to access a mailbox. Access has become popular because a subscriber can allow an ISP to run an email server and maintain the user’s mailbox.

The Domain Name System (DNS) provides automated mapping from human-readable names to computer addresses. DNS consists of many servers that each control one part of the namespace. Servers are arranged in a hierarchy, and each server knows the locations of other servers in the hierarchy.

The DNS uses caching to maintain efficiency; when an authoritative server provides an answer, each server that transfers the answer also places a copy in its cache. To prevent cached copies from becoming stale, the authority for a name specifies how long the name can be cached.
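
The idea of a cache with a lifetime set by the authority can be sketched as a small Python class. The class and method names are hypothetical, and the sketch stores a single address per name rather than a full DNS record.

```python
import time

class NameCache:
    """Cache of name-to-address bindings that honors a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # replaceable clock makes testing easy
        self._entries = {}       # name -> (address, expiration time)

    def store(self, name, address, ttl):
        """Save an answer; ttl is the lifetime in seconds set by the authority."""
        self._entries[name] = (address, self._clock() + ttl)

    def lookup(self, name):
        """Return the cached address, or None if absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[name]   # stale: a new query is required
            return None
        return address

cache = NameCache()
cache.store("www.example.com", "192.0.2.1", ttl=60)
print(cache.lookup("www.example.com"))
```

A lookup that returns None corresponds to the case in which a server must send a new request toward the authoritative server instead of answering from its cache.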

EXERCISES

4.1 What details does an application protocol specify?

4.2 Why is a protocol for a standardized service documented independent of an implementation?

4.3 What are the two key aspects of application protocols, and what does each include?

4.4 Give examples of web protocols that illustrate each of the two aspects of an application protocol.

4.5 Summarize the characteristics of HTML.

4.6 What are the four parts of a URL, and what punctuation is used to separate the parts?

4.7 What are the four HTTP request types, and when is each used?

4.8 How does a browser know whether an HTTP request is syntactically incorrect or whether the referenced item does not exist?

4.9 What does a browser cache, and why is caching used?

4.10 Describe the steps a browser takes to determine whether to use an item from its cache.

4.11 Can a browser use transfer protocols other than HTTP? Explain.

4.12 When a user requests an FTP directory listing, how many TCP connections are formed? Explain.

4.13 True or false: when a user runs an FTP application, the application acts as both a client and server. Explain your answer.

4.14 How does an FTP server know the port number to use for a data connection?

4.15 According to the original email paradigm, could a user receive email if the user’s computer did not run an email server? Explain.

4.16 List the three types of protocols used with email, and describe each.

4.17 What are the characteristics of SMTP?

4.18 Can SMTP transfer an email message that contains a period on a line by itself? Why or why not?

4.19 Where is an email access protocol used?

4.20 What are the two main email access protocols?

4.21 Why was MIME invented?

4.22 What is the overall purpose of the Domain Name System?

4.23 Assuming ISO has assigned N country codes, how many top-level domains exist?

4.24 True or false: a web server must have a domain name that begins with www. Explain.

4.25 True or false: a multi-national company can choose to divide its domain name hierarchy in such a way that the company has a domain name server in Europe, one in Asia, and one in North America.

4.26 When does a domain name server send a request to an authoritative server and when does it answer the request without sending to the authoritative server?

4.27 True or false: a DNS server can return a different IP address for a given name, depending on whether the lookup specifies email or web service. Explain.

4.28 Does the IDNA standard require changes in DNS servers? in DNS clients? Explain.

4.29 Search the web to find out about iterative DNS lookup. Under what circumstances is iterative lookup used?

4.30 How does XML allow an application to specify fields such as a name and address?

PART II

Data Communications

The basics of media, encoding,