Chapter 3.10: Application Protocols
Evolution of the WWW
World Wide Web (WWW)
Access to linked documents, which are distributed over several computers in the Internet
History of the WWW
• Origin: 1989 at the nuclear research laboratory CERN in Switzerland.
• Developed to exchange data, figures, etc. between a large number of geographically distributed project partners via the Internet.
• First text-based version in 1990.
• First widely used graphical interface (Mosaic) in February 1993, followed by Netscape, Internet Explorer…
• Standardization by the WWW Consortium (http://www.w3.org).
The client/server model is used:
Client (a browser)
• Presents the currently loaded WWW page.
• Permits navigating in the network (e.g. by clicking on a hyperlink).
• Offers a number of additional functions (e.g. external viewers or helper applications).
• Usually, a browser can also be used for other services (e.g. FTP, e-mail, news, …).
Server
• Process which manages WWW pages.
• Is addressed by the client, e.g. by specifying a URL (Uniform Resource Locator = logical address of a web page). The server sends the requested page (or file) back to the client.
Communication in the WWW
Lehrstuhl für Informatik 4
Kommunikation und verteilte Systeme
WWW, HTML, URL and HTTP
• WWW stands for World Wide Web and means the world-wide cross-linking of information and documents.
• The standard protocol used between a web server and a web client is the HyperText Transfer Protocol (HTTP):
– uses TCP port 80
– defines the allowed requests and responses
– is an ASCII protocol
• Each web page is addressed by a unique URL (Uniform Resource Locator) (e.g. http://www-i4.informatik.rwth-aachen.de/education/tcpip).
• The standard language for web documents is the HyperText Markup Language (HTML).
HTTP - Message Format
GET http://server.name/path/file.type
(command, then the URL consisting of protocol (HTTP), server domain name, path name and file name)
Example: GET http://www.informatik.rwth-aachen.de/info/general.html
Instructions on a URL are:
• GET: load a web page
• HEAD: load only the header of a web page
• PUT: store a web page on the server
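The split of a URL into protocol, server domain name, path name and file name can be reproduced with Python's standard `urllib.parse.urlsplit`. The helper `split_web_url` is a hypothetical name for this sketch, and it only handles simple URLs like the example above.

```python
import posixpath
from urllib.parse import urlsplit


def split_web_url(url: str) -> dict:
    """Split a URL into the parts named above (hypothetical helper)."""
    parts = urlsplit(url)
    # The path holds the directory part and, as its last segment, the file name.
    path_name, file_name = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,    # e.g. "http"
        "server": parts.hostname,    # server domain name
        "path": path_name,           # e.g. "/info"
        "file": file_name,           # e.g. "general.html"
    }


print(split_web_url("http://www.informatik.rwth-aachen.de/info/general.html"))
```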
Loading of Web Pages
[Figure: PC browser, DNS server and WWW server on a TCP/IP network. The browser asks the DNS for the IP address of the server; the DNS answers; the browser opens a TCP connection to port 80 of the computer and sends the command GET /info/general.html; the WWW server sends back the file general.html; the connection is terminated.]
Example: Call of the URL http://www.informatik.rwth-aachen.de/material/general.html
1. The browser determines the URL (which was clicked or typed).
2. The browser asks the DNS for the IP address of the server www.informatik.rwth-aachen.de.
3. The DNS answers with 137.226.116.241.
4. The browser opens a TCP connection to port 80 of the computer 137.226.116.241.
5. Afterwards, the browser sends the command GET /material/general.html.
6. The WWW server sends back the file general.html.
7. The connection is terminated.
8. The browser analyzes the WWW page general.html and presents the text.
9. If necessary, each picture is loaded over a new connection to the server (the address is included in the page general.html in the form of a URL).
Note!
Step 9 applies only to HTTP/1.0! With the newer version HTTP/1.1, the connection can be kept open (persistent connection), so all referenced pictures are loaded over it before the connection is terminated (more efficient for pages with many pictures).
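Steps 2 to 7 can be sketched with Python's standard `socket` module. This is a minimal sketch assuming a plain HTTP/1.0 server on port 80; the function names `build_request` and `load_page` are hypothetical. The `Host:` header line is not required by HTTP/1.0, but many servers that host several domains on one IP address expect it.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def build_request(url: str) -> Tuple[str, int, bytes]:
    """Steps 1 and 5: derive host, port and the GET command from the URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80                 # HTTP uses TCP port 80 by default
    path = parts.path or "/"
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    return host, port, request.encode("ascii")


def load_page(url: str) -> bytes:
    """Steps 2-7 for one HTTP/1.0 request (needs network access)."""
    host, port, request = build_request(url)
    ip = socket.gethostbyname(host)          # steps 2-3: ask the DNS
    with socket.create_connection((ip, port), timeout=10) as conn:  # step 4
        conn.sendall(request)                # step 5: send the GET command
        chunks = []
        while True:                          # step 6: receive the file ...
            data = conn.recv(4096)
            if not data:                     # step 7: ... until the server closes
                break
            chunks.append(data)
    return b"".join(chunks)
```

`load_page(...)` returns the raw response (status line, header lines, empty line, page). Step 8 (analyzing the HTML) and step 9 (one further call per picture under HTTP/1.0) are left out.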
Usual URLs
The main application is web pages, but URLs can also be used for other types of documents:
Name      Used for…          Example
ftp       FTP                FTP://FTP.cs.vu.nl/pub/minix/README
http      Hypertext (HTML)   http://www.cs.vu.nl/~ast
file      Local file         file:///usr/suzanne/prog.c
news      Newsgroup          news:comp.os.minix
news      News article       news:AA0134223112cs.utah.edu
gopher    Gopher             gopher://gopher.tc.umn.edu/11/Libraries
telnet    Remote login       telnet://www.w3.org:80
mailto    E-mail             mailto:[email protected]
method sp URL sp version cr lf
header field name: value cr lf
…
header field name: value cr lf
cr lf
Data
Request line: mandatory part, e.g. GET /path/file.type HTTP/1.1
Header lines: optional; further information about the host/document, e.g. Accept-Language: fr
Entity body: optional; further data if the client transmits data (POST method)
HTTP Request Header
sp: space, cr lf: carriage return + line feed
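The layout above translates directly into a small Python function that assembles a request as bytes. `format_request` is a hypothetical helper; the `Content-Length` header it adds for a body is an assumption beyond the slide (it tells the receiver where the entity body ends), and the host name is only an example.

```python
from typing import Dict, Optional

CRLF = "\r\n"


def format_request(method: str, url: str, version: str = "HTTP/1.1",
                   headers: Optional[Dict[str, str]] = None,
                   body: bytes = b"") -> bytes:
    """Assemble request line, header lines, empty line and entity body."""
    lines = [f"{method} {url} {version}"]          # method sp URL sp version
    for name, value in (headers or {}).items():    # optional header lines
        lines.append(f"{name}: {value}")
    if body:
        # Assumption: announce the body length so the receiver knows where it ends.
        lines.append(f"Content-Length: {len(body)}")
    head = CRLF.join(lines) + CRLF + CRLF          # the empty line ends the header
    return head.encode("ascii") + body             # optional entity body (e.g. POST)


request = format_request("GET", "/info/general.html",
                         headers={"Host": "www.informatik.rwth-aachen.de",
                                  "Accept-Language": "fr"})
print(request.decode("ascii"))
```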
version sp status code sp phrase cr lf
header field name: value cr lf
…
header field name: value cr lf
cr lf
Data
Status line: the status code and phrase indicate the result of a request together with an associated message, e.g.
200 OK
400 Bad Request
404 Not Found
Groups of status messages:
1xx: information only
2xx: successful request
3xx: further activities are necessary
4xx: client error (e.g. syntax)
5xx: server error
Entity body: requested data
HTTP Response Header
HEAD method: the server answers, but does not transmit the requested data (e.g. for debugging)
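Because the first digit of the status code determines its group, a client can classify any status line without knowing every individual code. A minimal Python sketch with hypothetical names:

```python
from typing import Tuple

STATUS_GROUPS = {
    1: "information only",
    2: "successful request",
    3: "further activities necessary",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str) -> Tuple[str, int, str, str]:
    """Split 'version sp status code sp phrase' and name the status group."""
    fields = line.rstrip("\r\n").split(" ", 2)
    version, code = fields[0], int(fields[1])
    phrase = fields[2] if len(fields) > 2 else ""   # the phrase may be empty
    return version, code, phrase, STATUS_GROUPS.get(code // 100, "unknown")


print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
# ('HTTP/1.1', 404, 'Not Found', 'client error')
```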
Proxy Server
Caching of WWW pages
• A proxy temporarily stores the pages loaded by browsers. If a browser requests a page that is already in the cache, the proxy checks whether the page has changed since it was stored. If not, the page can be passed back from the cache. If it has, the page is normally loaded from the server and stored in the cache again, replacing the old version.
Support when using additional protocols
• A browser also enables access to FTP, News, Gopher or telnet servers etc.
• Instead of implementing all protocols in the browser, this can be realized in the proxy. The proxy then “speaks” HTTP with the browser and e.g. FTP with an FTP server.
Integration into a Firewall
• The proxy can deny access to certain web pages (e.g. inside companies).
[Figure: Browsers connect via HTTP to the proxy, which accesses the servers in the Internet (e.g. via HTTP).]
A proxy is an intermediate entity used by several browsers. It takes over tasks from the browsers (reducing their complexity) and from the servers, for more efficient page loading!
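The caching behaviour can be sketched in Python. In HTTP, the check "has the page changed?" is typically a conditional GET with an `If-Modified-Since` header, which the server answers with `304 Not Modified` if the page is unchanged. The `fetch` interface below is a hypothetical stand-in for that request, and the class names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interface to the origin server: fetch(url, since) returns
# (status, page, last_modified). Status 304 means "unchanged since `since`".
Fetch = Callable[[str, Optional[str]], Tuple[int, Optional[bytes], Optional[str]]]


@dataclass
class CacheEntry:
    page: bytes
    last_modified: str


class CachingProxy:
    def __init__(self, fetch: Fetch) -> None:
        self.fetch = fetch
        self.cache: Dict[str, CacheEntry] = {}

    def get(self, url: str) -> bytes:
        entry = self.cache.get(url)
        since = entry.last_modified if entry else None
        # Ask the server whether the page changed since it was stored.
        status, page, modified = self.fetch(url, since)
        if status == 304 and entry is not None:
            return entry.page                          # unchanged: answer from cache
        self.cache[url] = CacheEntry(page, modified)   # new or changed: replace
        return page
```

A real proxy would also handle error statuses, expiry times and a size limit for the cache; the sketch only shows the check described above.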
• FTP is the Internet standard for the transmission of files.
• FTP is used to copy a complete file from one computer to another.
• FTP offers different possibilities apart from the pure file transfer:
– Interactive access by the user (e.g. change of directories)
– Format specification (binary or text files, ASCII or EBCDIC code)
– Authentication (login name and password)
• Structure:
FTP - File Transfer Protocol
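[Figure: FTP client and FTP server, each running a control process and a data transfer process on top of the operating system. The control connection runs between client port A and server port 21, the data connection between client port B and server port 20, over the TCP/IP network.]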
[Figure: Sequence between FTP client A and FTP server B: TCP connection setup to port 21, FTP connect to the server, login / login OK, password / password OK, user logged in, GET file, TCP connection setup on port 20, data exchange.]
Course of an FTP Session
1. The FTP client selects a random port number A for itself.
2. It contacts the master control process of the FTP server on port 21 (for FTP control connections). The login name and password are queried.
3. The FTP server provides a slave control process for the control connection between client port A and server port 21.
4. Over the control connection, the FTP client can send commands (e.g. change directory, list directory contents, transfer a file).
5. If the FTP client requests a file, it first selects a random port B, sets up a data transfer process and tells the server the port number over the control connection.
6. The FTP server sets up a data transfer process with local port 20 (standard port for FTP data connections), which connects only to port B of the FTP client (this ensures that the transfer is made to the correct process; no further authentication is necessary).
7. A connection between server port 20 and client port B is established and the data are transferred.
8. Afterwards, the connection is terminated and both data transfer processes terminate as well (i.e. each transfer needs new processes and a new connection).
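Step 5 leaves open how the port number reaches the server. In classic (active) FTP as specified in RFC 959, the client sends the PORT command over the control connection; its argument lists the four bytes of the client's IPv4 address and the two bytes of the port, with port = p1 · 256 + p2. A sketch with hypothetical function names; the address 192.0.2.10 and port 50123 are made-up examples.

```python
from typing import Tuple


def port_command(ip: str, port: int) -> str:
    """Client side: announce data port B over the control connection."""
    octets = ip.split(".")
    if len(octets) != 4 or not 0 < port < 65536:
        raise ValueError("expected an IPv4 address and a port number")
    p1, p2 = divmod(port, 256)          # high byte and low byte of the port
    return f"PORT {','.join(octets)},{p1},{p2}\r\n"


def parse_port_argument(arg: str) -> Tuple[str, int]:
    """Server side: recover the client address and port B."""
    numbers = [int(x) for x in arg.split(",")]
    return ".".join(str(n) for n in numbers[:4]), numbers[4] * 256 + numbers[5]


print(port_command("192.0.2.10", 50123), end="")   # PORT 192,0,2,10,195,203
```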
Command     Effect
open        Connect to the FTP server
disconnect  Terminate the FTP session
user        Send user information after connecting
cd          Change directory on the remote computer
lcd         Change directory on the local computer
pwd         Show the path of the current directory on the remote computer
get/mget    The client receives one (or several) documents
put/mput    The client sends one (or several) documents
binary      Set the transmission mode to binary
ascii       Set the transmission mode to ASCII
dir/ls      List the contents of the remote directory
help        Help for commands
delete      Delete a remote file
bye         Terminate the FTP session, abort
FTP - Commands
Response  Effect
1yz       Tentative positive response: the action was started, but the client must wait for another response.
2yz       Positive completion response: the request was completely processed.
3yz       Positive intermediate response: the command was accepted, but a further command is expected.
4yz       Temporary negative response: the request was not processed, but the cause of the error is only temporary; the request can be repeated later.
5yz       Permanent negative response: the command was not accepted and should not be repeated.
x0z       Syntax error
x1z       Information
x2z       The message refers to the connection
x3z       Answers to login commands
x4z       No fixed usage
x5z       Status of the file system
FTP - Responses
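Since the meaning is spread over the digits, a client can interpret any reply from its first two digits, as in this Python sketch. The names are hypothetical; the example reply texts are typical RFC 959 wordings, and real servers may phrase them differently.

```python
from typing import Tuple

FIRST_DIGIT = {
    "1": "tentative positive response",
    "2": "positive completion response",
    "3": "positive intermediate response",
    "4": "temporary negative response",
    "5": "permanent negative response",
}
SECOND_DIGIT = {
    "0": "syntax",
    "1": "information",
    "2": "connection",
    "3": "login",
    "4": "no fixed usage",
    "5": "file system",
}


def explain_reply(line: str) -> Tuple[str, str]:
    """Interpret the first two digits of the three-digit FTP reply code."""
    code = line[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an FTP reply: {line!r}")
    return FIRST_DIGIT.get(code[0], "unknown"), SECOND_DIGIT.get(code[1], "unknown")


print(explain_reply("230 User logged in, proceed."))
# ('positive completion response', 'login')
```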
TFTP - Trivial File Transfer Protocol
• TFTP is a very simple protocol for file transfer.
• The client sends its request to port 69; communication uses UDP, not TCP.
• TFTP does not have authentication.
• TFTP transfers data in blocks of 512 bytes; a block shorter than 512 bytes marks the end of the file.
[Figure: TFTP transfer between TFTP client A and the server; every message is a UDP datagram (IP header, UDP header, payload). The client sends GET Path/file.name; the server answers with Data (512 bytes) blocks, each acknowledged by an ACK; after a timeout, a data block is sent again; the last block carries only 350 bytes.]
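The block mechanism can be sketched with Python's `struct` module. The packet layouts follow RFC 1350 (TFTP revision 2), which the figure only abbreviates as "GET Path/file.name", "Data" and "ACK": a read request carries opcode 1, the file name and a transfer mode, each string ending in a zero byte; a data packet carries opcode 3, a 2-byte block number and up to 512 bytes; an ACK carries opcode 4 and the block number. Function names are hypothetical, and timeouts, retransmission and block-number wrap-around are left out.

```python
import struct

BLOCK_SIZE = 512
RRQ, DATA, ACK = 1, 3, 4          # TFTP opcodes (RFC 1350)


def read_request(filename: str, mode: str = "octet") -> bytes:
    """Client request corresponding to 'GET Path/file.name'."""
    return struct.pack("!H", RRQ) + filename.encode() + b"\0" + mode.encode() + b"\0"


def data_packets(content: bytes):
    """Yield the DATA packets the server sends, one 512-byte block each."""
    block, offset = 1, 0
    while True:
        chunk = content[offset:offset + BLOCK_SIZE]
        yield struct.pack("!HH", DATA, block) + chunk
        if len(chunk) < BLOCK_SIZE:       # short (possibly empty) block = end of file
            return
        block += 1
        offset += BLOCK_SIZE


def ack(block: int) -> bytes:
    """Acknowledgement the client returns for each DATA packet."""
    return struct.pack("!HH", ACK, block)
```

A file of 1374 bytes, as in the figure, yields blocks of 512, 512 and 350 bytes. A file whose length is an exact multiple of 512 needs a final empty block so that the receiver can detect the end.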
Early systems
A simple file transmission took place, with the convention that the first line contains the address of the receiver of the file.
Problems
E-Mail to groups, structuring of the e-mail, delegation of the administration to a secretary, file editor as user interface, no mixed media
Solution
X.400 as the standard for e-mail transfer. This specification was, however, too complex and badly designed. Only a simpler system, cobbled together “by a handful of computer science students”, became generally accepted: the Simple Mail Transfer Protocol (SMTP).
Electronic Mail: E-Mail
[Figure: User Agents connected to Message Transfer Agents in the Internet.]
An e-mail system generally consists of two subsystems:
• User Agent (UA, normal e-mail program)
– Usually runs on the computer of the user and helps with processing e-mails
– Writing new e-mails and answering received ones
– Receiving and presenting e-mails
– Managing received e-mails
• Message Transfer Agent (MTA, e-mail server)
– Usually runs in the background (around the clock)
– Delivers e-mails sent by User Agents
– Stores messages temporarily for users or other Message Transfer Agents
Structure of an E-Mail
For sending an e-mail, the following information is needed from the user:
• Message (usually plain text + attachments, e.g. Word file, GIF image…)
• Destination address (in general in the form mailbox@location, e.g. [email protected])
• Possibly additional parameters concerning e.g. priority or security
E-mail formats: two standards are used
• RFC 822
• MIME (Multipurpose Internet Mail Extensions)
With RFC 822, an e-mail consists of
• a simple “envelope” (created by the Message Transfer Agent based on the data in the e-mail header),
• a set of header fields (each one line of ASCII text),
• a blank line, and
• the actual message (message body).
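The structure "header fields, blank line, body" can be checked with a few lines of Python. The message below is a made-up example (addresses from the reserved example.org/example.net domains); `split_message` is a hypothetical helper that ignores folded (continued) header lines, and the standard library's `email.message_from_string` is shown for comparison. The envelope is not part of this text; it is handled by the Message Transfer Agents.

```python
from email import message_from_string

# Made-up message: header fields, one empty line, then the body.
raw = (
    "From: alice@example.org\r\n"
    "To: bob@example.net\r\n"
    "Subject: Meeting\r\n"
    "\r\n"
    "See you at 10.\r\n"
)


def split_message(text: str):
    """Separate the header fields from the body at the first empty line."""
    head, _, body = text.partition("\r\n\r\n")
    headers = {}
    for line in head.split("\r\n"):
        name, _, value = line.partition(":")   # each header field is 'name: value'
        headers[name.strip()] = value.strip()
    return headers, body


headers, body = split_message(raw)
msg = message_from_string(raw)   # standard-library parser for comparison
print(headers["Subject"], "|", msg["Subject"])
```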
E-Mail Header
Header        Meaning
To:           Address of the main receiver (possibly several receivers or also a mailing list)
Cc:           Carbon copy, e-mail addresses of less important receivers
Bcc:          Blind carbon copy, a receiver who is not shown to the other receivers
From:         Person who wrote the message
Sender:       Address of the actual sender of the message (possibly different from the “From” person)
Received:     One entry per Message Transfer Agent on the path to the receiver
Return-Path:  Path back to the sender (usually only the e-mail address of the sender)
Date:         Transmission date and time
Reply-To:     E-mail address to which answers are to be addressed
Message-Id:   Unique identification number of the e-mail (for later references)
In-Reply-To:  Message-Id of the message to which this answer refers
References:   Other relevant Message-Ids
E-Mail Header
RFC 822 is only suitable for messages of pure ASCII text without special characters. Nowadays, the following is additionally demanded:
• E-mail in languages with special characters (e.g. French or German)
• E-mail in languages not using the Latin alphabet (e.g. Russian)
• E-mail in languages not using an alphabet at all (e.g. Japanese)
• E-mail not consisting entirely of text (e.g. audio or video)
MIME keeps the RFC 822 format, but additionally defines
• a structure in the message body (by using additional headers), and
• coding rules for non-ASCII characters.
Header                      Meaning
MIME-Version:               Marks the MIME version used
Content-Description:        String which describes the contents of the message
Content-Id:                 Unique identifier for the contents
Content-Transfer-Encoding:  Coding selected for the contents of the e-mail (some networks understand e.g. only ASCII characters). Examples: base64, quoted-printable
Content-Type:               Type/subtype according to RFC 1521, e.g. text/plain, image/jpeg, multipart/mixed
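The two transfer encodings named for Content-Transfer-Encoding can be compared with Python's standard `base64` and `quopri` modules. The German word is an arbitrary example of text with special characters, encoded as UTF-8 (an assumption; MIME also allows other character sets).

```python
import base64
import quopri

text = "Grüße".encode("utf-8")      # 7 bytes, two of the characters are non-ASCII

# quoted-printable: ASCII characters stay readable, other bytes become =XX
qp = quopri.encodestring(text)
# base64: every 3 bytes become 4 characters from a 64-character ASCII alphabet
b64 = base64.b64encode(text)

print(qp)    # b'Gr=C3=BC=C3=9Fe'
print(b64)   # b'R3LDvMOfZQ=='
```

Quoted-printable keeps mostly-ASCII text readable, while base64 has a fixed overhead of 4 characters per 3 bytes and therefore suits binary data such as images.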
MIME
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="8323328-2120168431-824156555=:325"

--8323328-2120168431-824156555=:325
Content-Type: TEXT/PLAIN; charset=US-ASCII

A picture is in the appendix
--8323328-2120168431-824156555=:325
Content-Type: IMAGE/JPEG; name="picture.jpg"
Content-Transfer-Encoding: BASE64
Content-ID: <PINE.LNX.3.91.960212212235.325B@localhost>
Content-Description:

/9j/4AAQSkZJRgABAQEAlgCWAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQIBAQEBA
QIBAQECAgICAgICAgIDAwQDAwMDAwICAwQDAwQEBAQEAgMFBQQEBQQEBAT/
2wBDAQEBAQEBAQIBAQIEAwIDBAQEBA
[…]
KKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAoooo
AKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiig
AooooAD//Z
--8323328-2120168431-824156555=:325--
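A message with the same structure can be assembled with Python's standard `email` package. `picture_mail` is a hypothetical helper and the JPEG bytes passed to it are a placeholder; the package chooses its own boundary string and writes the image part base64-encoded.

```python
from email.message import EmailMessage


def picture_mail(jpeg_bytes: bytes) -> EmailMessage:
    """Build a multipart/mixed message: one text part, one JPEG attachment."""
    msg = EmailMessage()
    msg.set_content("A picture is in the appendix")    # text/plain; adds MIME-Version: 1.0
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(jpeg_bytes, maintype="image", subtype="jpeg",
                       filename="picture.jpg")         # encoded as base64
    return msg


mail = picture_mail(b"\xff\xd8\xff\xe0 placeholder, not a real image")
print(mail.as_string()[:400])   # headers, boundary lines and the start of the parts
```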
E-Mail over POP3 and SMTP
Simple Mail Transfer Protocol (SMTP)
– Sending e-mails over a TCP connection (port 25)
– SMTP is a simple ASCII protocol
– Without checksums, without encryption
– The receiving machine is the server; after the client has opened the TCP connection, the server begins the communication
– If the server is ready to receive, it signals this to the client. The client then sends the information from whom the e-mail comes and who the receiver is. If the receiver is known to the server, the client sends the message and the server confirms its receipt.
Post Office Protocol version 3 (POP3)
– Get e-mails from the server over a TCP connection, port 110
– Commands for logging in and out, downloading messages and deleting messages on the server (possibly without transferring them to the client)
– Only copies e-mails from the remote server to the local system
[Figures: User Agents and Message Transfer Agents in the Internet.]
E-Mail over POP3 and SMTP
• User 1: writes an e-mail
• Client 1 (UA 1): formats the e-mail, produces the receiver list, and sends the e-mail to its mail server (MTA 1)
• Server 1 (MTA 1): sets up a connection to the SMTP server (MTA 2) of the receiver and sends a copy of the e-mail
• Server 2 (MTA 2): produces the header of the e-mail and places the e-mail into the appropriate mailbox
• Client 2 (UA 2): sets up a connection to the mail server and authenticates itself with username and password (unencrypted!)
• Server 2 (MTA 2): sends the e-mail to the client
• Client 2 (UA 2): formats the e-mail
• User 2: reads the e-mail
SMTP - Command Sequence
Communication between partners (from abc.com to beta.edu) in text form of the following kind:
S: 220 <beta.edu> Service Ready              /* Receiver is ready */
C: HELO <abc.com>                            /* Identification of the sender */
S: 250 <beta.edu> OK                         /* Server announces itself */
C: MAIL FROM:<[email protected]>            /* Sender of the e-mail */
S: 250 OK                                    /* Sending is permitted */
C: RCPT TO:<[email protected]>              /* Receiver of the e-mail */
S: 250 OK                                    /* Receiver known */
C: DATA                                      /* The data are following */
S: 354 Start mail input; end with “<crlf>.<crlf>” on a line by itself
C: From: Krogull@…. <crlf>.<crlf>            /* Transfer of the whole e-mail, including all headers */
S: 250 OK
C: QUIT                                      /* Terminating the connection */
S: 221 <beta.edu> Server Closing
S = server, receiving MTA / C = client, sending MTA
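The client side of this dialogue can be sketched in Python. `exchange` is a hypothetical callable that sends one line (appending CRLF) and returns the server's reply line; the sketch assumes the 220 greeting has already been read. It also shows dot-stuffing from the SMTP specification: a text line starting with "." gets a second ".", so it cannot be mistaken for the terminating "<crlf>.<crlf>". In practice, Python's standard `smtplib.SMTP` class implements the protocol.

```python
from typing import Callable, List


def send_mail(exchange: Callable[[str], str], client_domain: str,
              sender: str, recipients: List[str], message: str) -> None:
    """Client side of one SMTP transaction, after the 220 greeting was read."""
    def step(command: str, expected: str) -> None:
        reply = exchange(command)
        if not reply.startswith(expected):          # e.g. "550 ..." for an unknown receiver
            raise RuntimeError(f"{command.splitlines()[0]!r} rejected: {reply!r}")

    step(f"HELO {client_domain}", "250")
    step(f"MAIL FROM:<{sender}>", "250")
    for rcpt in recipients:                         # one RCPT TO per receiver
        step(f"RCPT TO:<{rcpt}>", "250")
    step("DATA", "354")
    # Dot-stuffing: double a leading "." so that no text line can be mistaken
    # for the terminating line consisting of a single ".".
    lines = ["." + line if line.startswith(".") else line
             for line in message.split("\n")]
    step("\r\n".join(lines) + "\r\n.", "250")       # exchange adds the final CRLF
    step("QUIT", "221")
```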
[Figure: PC client (UA) and POP3 server (MTA) on a TCP/IP network, connected by a TCP connection to port 110; the server sends a greeting, the client sends commands and the server sends replies.]
POP3
Get e-mails from the server by means of POP3:
• Authorization phase: USER name, PASS string
• Transaction phase: STAT, LIST [msg], RETR msg, DELE msg, NOOP, RSET, QUIT
Minimal protocol with only two kinds of operations:
• Copy e-mails to the local computer
• Delete e-mails from the server
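The two phases can be sketched as a small server-side state machine in Python. Following RFC 1939, DELE only marks a message; marked messages are removed when the client sends QUIT (the "update" state), and RSET removes all marks. The class name, the in-memory mailbox and the reply texts after +OK are hypothetical; LIST and byte-stuffing of RETR output are omitted.

```python
from typing import Dict, List


class Pop3Session:
    """Server side of the POP3 phases for one in-memory mailbox (a sketch)."""

    def __init__(self, users: Dict[str, str], mailbox: List[str]) -> None:
        self.users = users                  # user name -> password
        self.mailbox = list(mailbox)        # message texts, numbered from 1
        self.deleted = set()                # numbers marked by DELE
        self.state = "AUTHORIZATION"
        self.user = None

    def _live(self) -> List[int]:
        return [n for n in range(1, len(self.mailbox) + 1) if n not in self.deleted]

    def handle(self, line: str) -> str:
        cmd, _, arg = line.strip().partition(" ")
        cmd = cmd.upper()                   # POP3 commands are case-insensitive
        if self.state == "AUTHORIZATION":
            if cmd == "USER":
                self.user = arg
                return "+OK"
            if cmd == "PASS" and self.users.get(self.user) == arg:
                self.state = "TRANSACTION"
                return "+OK user successfully logged in"
            return "+OK" if cmd == "QUIT" else "-ERR"
        if self.state != "TRANSACTION":
            return "-ERR session closed"
        if cmd == "STAT":                   # number of messages and total size
            live = self._live()
            return f"+OK {len(live)} {sum(len(self.mailbox[n - 1]) for n in live)}"
        if cmd == "RETR" and arg.isdigit() and int(arg) in self._live():
            return "+OK\r\n" + self.mailbox[int(arg) - 1] + "\r\n."
        if cmd == "DELE" and arg.isdigit() and int(arg) in self._live():
            self.deleted.add(int(arg))      # only marked, not yet removed
            return "+OK"
        if cmd == "RSET":
            self.deleted.clear()            # undo all deletion marks
            return "+OK"
        if cmd == "NOOP":
            return "+OK"
        if cmd == "QUIT":                   # update phase: now really delete
            self.mailbox = [m for n, m in enumerate(self.mailbox, 1)
                            if n not in self.deleted]
            self.deleted.clear()
            self.state = "UPDATE"
            return "+OK"
        return "-ERR"
```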
POP3 Protocol
Authorization phase
• user identifies the user
• pass is its password
• +OK or -ERR are possible server answers
Transaction phase
• list lists the message numbers and the message sizes
• retr requests a message by its number
• dele deletes the corresponding message
S: +OK POP3 server ready
C: user alice
S: +OK
C: pass hungry
S: +OK user successfully logged in
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: <message 1 contents>
S: .
C: dele 1
C: retr 2
S: <message 2 contents>
S: .
C: dele 2
C: quit
S: +OK
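Answers to list and retr consist of several lines terminated by a line containing only "."; following RFC 1939, the server puts an extra "." in front of any other line that starts with "." (byte-stuffing). A minimal client-side sketch for reading such an answer, with hypothetical function names; in practice, Python's standard `poplib.POP3` class provides `user`, `pass_`, `list`, `retr`, `dele` and `quit` methods.

```python
from typing import Dict, Iterable, List


def read_multiline(lines: Iterable[str]) -> List[str]:
    """Collect the lines of a multi-line answer up to the lone '.'."""
    body = []
    for line in lines:
        if line == ".":                    # termination line
            return body
        # Undo byte-stuffing: the server put an extra '.' before a leading '.'.
        body.append(line[1:] if line.startswith(".") else line)
    raise ValueError("connection ended before the terminating '.'")


def parse_list(lines: Iterable[str]) -> Dict[int, int]:
    """Turn the lines of a list answer into {message number: size in octets}."""
    result = {}
    for entry in read_multiline(lines):
        number, size = entry.split()
        result[int(number)] = int(size)
    return result


print(parse_list(["1 498", "2 912", "."]))   # {1: 498, 2: 912}
```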
IMAP as a POP3 “Variant”
Meanwhile, many operators of web pages also offer e-mail services: gmx, web.de, yahoo, …
Here, HTTP once again serves as the protocol for accessing the e-mails. The management is similar to IMAP, except that the client is integrated into the web server.
Enhancement of POP3: IMAP (Interactive Mail Access Protocol, today called Internet Message Access Protocol)
• TCP connection over port 143
• E-Mails are not downloaded and stored locally, but remain on the server
• The client performs all actions remotely. This is suitable for users who need access to their e-mails from different hosts