World Wide Web
Joao.Neves@fe.up.pt
Before WWW
Major search tools: Gopher and Archie
Archie
• Searches indexes of FTP archives
• Filename-based queries
Gopher
• Friendly interface
The Web Is Born
Tim Berners-Lee et al. at CERN in 1991
HyperText Transfer Protocol (HTTP)
Hypertext: links embedded in text that point to another text document
[Figure: web server survey, August 1995 - March 2008]
Source: http://news.netcraft.com/archives/web_server_survey.html
João Neves 5
Layering
HyperText Transfer Protocol | Telnet | Simple Network Management | Dynamic Host Configuration
Transmission Control Protocol (TCP)
Internet Protocol (IP)
Ethernet | Wi-Fi | SONET
HTTP
Standard protocol for web transfer
Request-response interaction between client and server
The server holds resources such as HTML files and images
Request methods: GET, HEAD, PUT, POST, DELETE, …
Response: Status line + additional info (e.g., a web page)
Introduction to HTTP
HTTP has been in use by the World-Wide Web global information initiative since 1990
Its first version (referred to as HTTP/0.9) was a simple protocol for raw data transfer across the Internet
HTTP/1.0 improved the protocol by allowing messages to be in the format of MIME-like messages, containing:
• metainformation about the data transferred
• modifiers on the request/response semantics
HTTP Transaction
Client <-- HTTP --> Server
HTTP client: web browser
HTTP server: web server
Standard port: 80
Suggested alternate ports: 81, 8080, 8081
HTTP is used to transmit resources
• Files/documents
• Image files
• Query results
• Outputs from CGI scripts
• Anything that can be identified by a URL
[Figure: server file tree WebRoot / dir / file.html]
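As a rough illustration of how a URL path can identify a file below the server's WebRoot, here is a minimal sketch. The document root /var/www/WebRoot and the function name resolve are assumptions made for this example; real servers add many more checks (index files, aliases, access control).

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Hypothetical document root, chosen only for this example.
WEB_ROOT = PurePosixPath("/var/www/WebRoot")


def resolve(url_path: str, web_root: PurePosixPath = WEB_ROOT) -> PurePosixPath:
    """Map the path part of a request URL to a file below the web root."""
    # Drop any query string and decode %xx escapes.
    path = unquote(urlsplit(url_path).path)
    parts = []
    for part in path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            # Refuse paths that would climb above the web root.
            if not parts:
                raise ValueError("path escapes the web root")
            parts.pop()
        else:
            parts.append(part)
    return web_root.joinpath(*parts)


print(resolve("/dir/file.html"))
```

With these assumptions, the request path /dir/file.html resolves to the file.html shown in the WebRoot tree.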
Web Clients
Lynx 2.0 (1993, character based interface)
NCSA Mosaic (1993, first widely used browser with a graphical interface)
Marc Andreessen (co-author of Mosaic) moved to Netscape
Microsoft Internet Explorer (“new name for Mosaic…”)
Mozilla Firefox
Opera
Safari
The Browser
The browser
1. fetches the page requested
2. interprets the text and formatting commands that it contains
3. displays the page properly formatted on the screen
The page may contain strings of text that are links to other pages, called hyperlinks
• On the screen the hyperlinks are highlighted, either by underlining, displaying them in a special color, or both
Web Servers
NCSA HTTPd: non-commercial, free
Apache HTTP Server: freeware
Apache Tomcat: freeware
lighttpd: freeware
Microsoft Internet Information Services (IIS): payware
Zeus Web Server: payware
Zope: freeware
...
Server Share
Server Share amongst the Million Busiest Sites, March 2009
Markup
“Markup” refers to codes inserted into text documents to control formatting, printing, or other processing.
Descriptive markup indicates the nature, function, or content of the data in a file.
Procedural markup defines what processing is to be carried out at particular points in the document.
HyperText Markup Language
Language in which web pages are written
Contains formatting commands
Tells the browser what to display and how to display it
Examples:
<TITLE> Welcome to My Great Site </TITLE>
• The title of this page is “Welcome to My Great Site”
<B>Great News!</B>
• Set “Great News!” in boldface
<A HREF="http://www.xptoo.org/">I’m the One</A>
• A link pointing to the web page http://www.xptoo.org/ with the text “I’m the One” displayed
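To show how a program can read such tags, the sketch below feeds the three example elements to Python's standard html.parser module and collects the title and the link. The class name LinkExtractor is an assumption made for this example; a real browser does far more than this.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the document title and every (href, link text) pair."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


page = (
    "<TITLE> Welcome to My Great Site </TITLE>"
    "<B>Great News!</B>"
    '<A HREF="http://www.xptoo.org/">I’m the One</A>'
)
parser = LinkExtractor()
parser.feed(page)
parser.close()
print(parser.title.strip(), parser.links)
```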
Sample HTML Tags
<A> </A>          Anchor: link or name
<BODY> </BODY>    Document contents
<BR>              Line break
<FORM> </FORM>    Input form
<H1> </H1>        Heading level 1
<HEAD> </HEAD>    Header of a document
<HR>              Horizontal rule
<HTML> </HTML>    The document type is HTML
<LI>              List item
<OL> </OL>        Ordered list
<P> </P>          Paragraph
<PRE> </PRE>      Preformatted text
<TITLE> </TITLE>  Document title
<UL> </UL>        Unnumbered list
Uniform Resource Identifiers
A URI is an identifier for some resource; a Uniform Resource Locator (URL) is a URI that also gives specific information on how to obtain that resource
HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet systems, including those supported by the following protocols:
• SMTP, NNTP, FTP
In this way, HTTP allows basic hypermedia access to resources available from diverse applications
Uniform Resource Identifiers
The following examples illustrate URLs that are in common use:
Name    Description                                                      Example
ftp     ftp scheme for File Transfer Protocol services                   ftp://ftp.is.co.za/rfc/rfc1808.txt
http    http scheme for Hypertext Transfer Protocol services             http://www.math.uio.no/faq/compression-faq/part1.html
file    Local file                                                       file:/usr/local/etc/ntp.conf
news    news scheme for USENET news groups and articles                  news:comp.infosystems.www.servers.unix
telnet  telnet scheme for interactive services via the TELNET Protocol   telnet://melvyl.ucop.edu/
mailto  mailto scheme for electronic mail addresses                      mailto:mduerst@ifi.unizh.ch
gopher  gopher scheme for Gopher and Gopher+ Protocol services           gopher://stap.umn.edu/00/Weather/Ca/Los%20Angeles
Uniform Resource Locator
Some URL schemes use the format "user:password" in the userinfo field.
This practice is NOT RECOMMENDED, because the passing of authentication information in clear text (such as URI) has proven to be a security risk in almost every case where it has been used. [RFC2396]
<scheme>://[userinfo@]hostname[:port]/path[;parameters][?query]
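The generic URL format can be explored with Python's standard urllib.parse module. This is only a sketch: the URL below is a made-up example (host, port, parameters and query are assumptions), and the helper name describe is hypothetical.

```python
from urllib.parse import urlparse


def describe(url: str) -> dict:
    """Split a URL into the fields of the generic format."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "userinfo": parts.username,
        "hostname": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "parameters": parts.params,
        "query": parts.query,
    }


# Hypothetical URL that exercises every optional field.
example = "http://user@www.xptoo.org:8080/path/to/file.html;type=a?lang=en"
for field, value in describe(example).items():
    print(f"{field:10} {value}")
```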
HyperText Transfer Protocol
A very simple, stateless protocol for sessionless
exchanges
• Browser creates a new connection each time it wants to make a new request (for a page, image, etc.)
Exceptions:
• HTTP 1.1 added support for persistent connections and
pipelining
• Clients + servers might keep state information
• Cookies provide a way of recording state
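As a small sketch of how cookies carry state over a stateless protocol, the server can hand out an identifier in a Set-Cookie header and read it back from the Cookie header of later requests. The cookie name "session" and the helper names are assumptions made for this example; it uses Python's standard http.cookies module.

```python
from http.cookies import SimpleCookie


def set_cookie_header(session_id: str) -> str:
    """Build the response header that asks the client to remember an id."""
    cookie = SimpleCookie()
    cookie["session"] = session_id  # "session" is a hypothetical cookie name
    cookie["session"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def read_cookie_header(header_value: str) -> dict:
    """Parse the value of a request's Cookie header into a dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


print(set_cookie_header("abc123"))
print(read_cookie_header("session=abc123; theme=dark"))
```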
The http protocol: more
http: TCP transport service
• client initiates TCP connection (creates socket) to server, port 80
• server accepts TCP connection from client
• http messages (application-layer protocol messages) exchanged between browser (http client) and Web server (http server)
• TCP connection closed
http is “stateless”
• server maintains no information about past client requests
HTTP
GET /path/to/file/index.html HTTP/1.0
• GET: the HTTP method
• /path/to/file/index.html: the path, i.e. the part of the URL after the hostname (the request URI)
• HTTP/1.0: the HTTP version
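The three parts of the request line can be separated with a few lines of code. This is a minimal sketch, not a full HTTP parser; the names RequestLine and parse_request_line are assumptions made for this example.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    request_uri: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split 'METHOD request-URI HTTP-version' into its three fields."""
    parts = line.strip().split()
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError(f"malformed request line: {line!r}")
    return RequestLine(*parts)


print(parse_request_line("GET /path/to/file/index.html HTTP/1.0"))
```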
HTTP Session
jneves@bart(1)$ telnet www.inescporto.pt 80
[...]
GET /~jneves/index.html HTTP/1.0
From: Joao.Neves@xptoo.org
User-Agent: Camachina/5.0

HTTP/1.1 200 OK
Date: Tue, 26 May 2009 18:06:13 GMT
Server: Apache/2.30 (Unix) PHP/5.5 DAV/2 mod_perl/2.9 Perl/v5.20
Last-Modified: Fri, 04 May 2007 18:41:20 GMT
Accept-Ranges: bytes
Content-Length: 91
Connection: close
Content-Type: text/html

<html>
<head>
<meta HTTP-EQUIV="REFRESH" content="0; url=./index.shtml">
</head>
</html>
Connection closed by foreign host.
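The response in a session like this has three parts: a status line, header lines, and a body after the first empty line. The sketch below splits a raw response accordingly. The function name parse_response and the shortened sample response are assumptions made for this example; it ignores details such as repeated or folded headers.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status line fields, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


# Shortened, hypothetical response modelled on the session above.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html></html>"
)
print(parse_response(sample))
```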
HTTP Request Headers
Header              Description
From                RFC822 e-mail address of the user
User-Agent          Client software
Accept              File types that the client will accept, e.g., text/plain, text/html
Accept-Encoding     Compression methods, e.g., x-compress; x-zip
Accept-Language     Language(s) the client prefers
Referer             (optional) URL of the document (or element within the document) from which the URL in the request was obtained; the name is spelled this way in the HTTP specification
If-Modified-Since   Return the document only if modified since the specified date
Content-Length      Length in octets of the data to follow
Content-Type        Type of the item
Pragma: no-cache    Directive understood by a proxy server; when present, the proxy should not return a document from the cache
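As one example of a request header changing the server's answer, the sketch below decides between 200 (send the document) and 304 (Not Modified) from an If-Modified-Since value. The function name conditional_status is an assumption made for this example, and a real server checks more conditions than this.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the document is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An unparsable date is ignored and the document is sent normally.
        return 200
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


modified = datetime(2007, 5, 4, 18, 41, 20, tzinfo=timezone.utc)
print(conditional_status(modified, "Fri, 04 May 2007 18:41:20 GMT"))
```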
HTTP Response Headers
Header          Description
Server          Server software
Date            Current date
Last-Modified   Modification date of the document
Expires         Document expiration date
Location        The location of the document in redirection responses
Pragma          A hint, e.g. no cache
MIME-version    Version of MIME used in the message
Link            URL of document’s parent
HTTP Status Codes
Code  Text
2xx   Success
3xx   Redirection
301   Moved Permanently
302   Found
4xx   Client Errors
400   Bad Request
401   Unauthorized
404   Not Found
5xx   Server Errors
500   Internal Server Error
502   Bad Gateway
503   Service Unavailable (e.g., server overloaded)
HTTP over TLS
HTTP 1.1 Features
Persistent TCP Connections: remain open for multiple requests
Partial Document Transfers: clients can specify start and stop positions
Conditional Fetch: several additional conditions (e.g., If-Match, If-None-Match, If-Unmodified-Since)
Better content negotiation
More flexible authentication
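Partial document transfers rely on the Range request header, where the client gives start and stop byte positions. The sketch below applies a single byte range to a document held in memory. The function name apply_range is an assumption made for this example; multiple ranges and the 206/416 status handling are left out.

```python
def apply_range(body: bytes, range_header: str) -> bytes:
    """Return the bytes selected by a single 'bytes=first-last' range."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single byte range is supported in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form 'bytes=-N': the last N bytes of the document.
        n = int(last)
        return body[-n:] if n else b""
    start = int(first)
    # Open-ended form 'bytes=N-': from N to the end of the document.
    end = int(last) if last else len(body) - 1
    if start > end or start >= len(body):
        raise ValueError("range not satisfiable")
    # Both positions are inclusive in the header.
    return body[start:end + 1]


print(apply_range(b"0123456789", "bytes=0-3"))
```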
Static vs. Dynamic Pages
HTML pages vs. database
Personalized
Context-aware services
HTTP Proxy
An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients;
Requests are serviced internally or by passing them on, with possible translation, to other servers;
A proxy must implement both the client and server requirements of this specification;
The client makes a request to the proxy server using the complete URL;
The proxy server connects to the remote server and requests the resource relative to that server (no protocol and hostname in the URL).
HTTP Proxy
Client -> Proxy Server: GET http://hostname/path/to/file.html HTTP/1.0
Proxy Server -> Server: GET /path/to/file.html HTTP/1.0
Server -> Proxy Server: HTTP/1.0 200 Document ....
Proxy Server -> Client: HTTP/1.0 200 Document ....
Server file tree: WebRoot / dir / file.html
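The rewriting step a proxy performs on the request line can be sketched as follows: it takes the complete URL sent by the client and produces the server to contact plus a request line relative to that server. The function name to_origin_request is an assumption made for this example, and port 80 is assumed when the URL gives none.

```python
from urllib.parse import urlsplit


def to_origin_request(proxy_request_line: str):
    """Turn 'GET http://host/path HTTP/1.0' into (host, port, 'GET /path HTTP/1.0')."""
    method, url, version = proxy_request_line.split()
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError("a proxy expects the complete URL in the request line")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Default to the standard HTTP port when the URL does not name one.
    return parts.hostname, parts.port or 80, f"{method} {path} {version}"


print(to_origin_request("GET http://hostname/path/to/file.html HTTP/1.0"))
```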
HTTP Proxy + Cache
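Client -> Proxy Server: GET http://hostname/path/to/file.html HTTP/1.0
Proxy Server -> Server: GET /path/to/file.html HTTP/1.0
Server -> Proxy Server: HTTP/1.0 200 Document ....
Proxy Server -> Client: HTTP/1.0 200 Document ....
Server file tree: WebRoot / dir / file.html
[Cache attached to the Proxy Server]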
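A caching proxy answers repeated requests from its own copy instead of contacting the server again. The sketch below shows only that idea: the class name CachingProxy and the fetch callable (which stands in for the request to the origin server) are assumptions made for this example, and expiry and revalidation (Expires, If-Modified-Since, Pragma: no-cache) are deliberately left out.

```python
class CachingProxy:
    """Keep a copy of each fetched document, keyed by its complete URL."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable url -> bytes, stands in for the origin request
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._cache:
            self.hits += 1
            return self._cache[url]
        # Cache miss: ask the origin server and remember the answer.
        self.misses += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


origin_calls = []


def fake_origin(url: str) -> bytes:
    # Hypothetical origin server used only for this demonstration.
    origin_calls.append(url)
    return b"<html>Document</html>"


proxy = CachingProxy(fake_origin)
proxy.get("http://hostname/path/to/file.html")
proxy.get("http://hostname/path/to/file.html")
print(proxy.hits, proxy.misses, len(origin_calls))
```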
HTTP Proxy
Transparent
Configured (http://proxy.xptoo.org:3128/)
Automatic (Web Proxy AutoDiscovery)
Why Web Caching (Proxies)?
Assume: cache is “close” to client (e.g., in same network)
• smaller response time: cache “closer” to client
• decrease traffic to distant servers
  • link out of institutional/local ISP network often bottleneck
[Figure: origin servers on the Internet, reached from an institutional network (10 Mb/s LAN, institutional cache) over a 1.5 Mb/s access link (bottleneck…)]
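A rough calculation shows why the cache helps on the bottleneck link. Only the 1.5 Mb/s access link rate comes from the figure; the request rate (15 requests per second), the object size (100,000 bits) and the 40% hit rate are assumptions chosen for illustration, as is the function name.

```python
def access_link_utilization(requests_per_s: float, object_bits: float,
                            link_bps: float, hit_rate: float = 0.0) -> float:
    """Fraction of the access link used by requests the cache cannot answer."""
    return (1.0 - hit_rate) * requests_per_s * object_bits / link_bps


LINK_BPS = 1_500_000      # 1.5 Mb/s access link, from the figure
REQUESTS_PER_S = 15       # assumed request rate
OBJECT_BITS = 100_000     # assumed average object size

without_cache = access_link_utilization(REQUESTS_PER_S, OBJECT_BITS, LINK_BPS)
with_cache = access_link_utilization(REQUESTS_PER_S, OBJECT_BITS, LINK_BPS, hit_rate=0.4)
print(f"without cache: {without_cache:.0%}, with cache: {with_cache:.0%}")
```

Under these assumed numbers the link is fully loaded without a cache and at roughly 60% with one, since only the misses cross the access link.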
Web Load Handling
Thousands of clients
Load sharing
DNS Round Robin
Web Switching (L4-L7): Load Balancing Devices
• Nortel Alteon
• A10 Networks
• Cisco Content Switching
• ...
Akamai
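DNS round robin, the simplest of these load sharing techniques, answers successive lookups for one name with successive server addresses. The sketch below imitates that rotation in memory. The class name RoundRobin is an assumption made for this example, and the addresses come from the 192.0.2.0/24 documentation range rather than any real site.

```python
from itertools import cycle


class RoundRobin:
    """Hand out server addresses in rotation, one per lookup."""

    def __init__(self, addresses):
        addresses = list(addresses)
        if not addresses:
            raise ValueError("at least one address is required")
        self._next = cycle(addresses)

    def resolve(self) -> str:
        return next(self._next)


# Hypothetical pool of three web servers behind one name.
pool = RoundRobin(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
answers = [pool.resolve() for _ in range(4)]
print(answers)
```

The rotation ignores server load and failures, which is one reason dedicated load balancing devices exist.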
Bibliography
Comer, Douglas E. Internetworking with TCP/IP (Vol. I). Prentice Hall, 5th Ed. (2006). ISBN 0-13-187671-6
Tanenbaum, Andrew S. Computer Networks. Prentice Hall International Editions, 4th Ed. (2003). ISBN 0-13-038488-7