Questions
1. When will an IP process drop a datagram?
2. When will an IP process fragment a datagram?
3. When will a TCP process drop a segment?
Lecture 4. WWW and HTTP
• World Wide Web (WWW) is an application of Internet computing
– It is used to navigate a collection of documents of information, distributed over the Internet and connected by hyperlinks
– It uses a browser to navigate and download HTML documents stored on web servers
– It has grown to involve many existing applications, such as email and file transfer
• HyperText Transfer Protocol (HTTP) is the core protocol of the WWW, used by the browser (client) and the web server
• Tim Berners-Lee at CERN invented the World Wide Web application. He created HTTP, HTML, the first browser, and the first web server. He released all of these as open source for free usage.
• Watch the talk by TBL (Tim Berners-Lee) at TED
• HTTP is based on TCP/IP
– HTTP is implemented by two programs
– Client program, a Web browser, executing on a client machine
– Server program, a Web server, running on a server machine
– They talk to each other by exchanging HTTP messages
• HTML (Hypertext Markup Language) is a coding language used to create hypertext.
• A Web document (or Web page) is an object
– HTML file, a JPEG image, GIF image, Java applet, ..
– A Web page is stored on a Web server machine
• Uniform Resource Locator (URL) is the global address of documents and other resources on the World Wide Web
Example: http://bohr.wlu.ca/hfan/cp476/index.html
• URL contains three parts:
– The first part indicates what protocol to use.
– The second part specifies the IP address or the domain name of the server host.
– The third part specifies the path of the document on that host.
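To make the parts of a URL concrete, here is a minimal sketch that splits the example URL above with Python's standard `urllib.parse` module. The helper name `url_parts` and the second URL with port 8080 are illustrative assumptions, not part of the lecture.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into protocol, host, optional port, and document path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # first part: which protocol to use
        "host": parts.hostname,     # second part: domain name or IP address
        "port": parts.port,         # None when the URL gives no explicit port
        "path": parts.path,         # path of the document on the host
    }

example = url_parts("http://bohr.wlu.ca/hfan/cp476/index.html")
with_port = url_parts("http://example.com:8080/a.html")  # hypothetical URL
print(example)
print(with_port)
```

When no port is written in the URL, the client falls back to the default for the protocol (80 for HTTP).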
• HTTP functions like a combination of FTP and SMTP
– Like FTP, it transfers files from server to client. But it uses only one connection, i.e., it uses the services of TCP on port 80.
– Like SMTP, a client sends a request that looks like a mail message. The server responds by sending a mail-like message back to the client
• The server is stateless, i.e., it does not keep records of previous requests.
– HTTP/1.0 is nonpersistent, i.e., the connection is closed after a request is done
– HTTP/1.1 is persistent, i.e., the connection is kept alive for a short period of time, and another request arriving within that time can use the same connection
Connection establishment
(Figure: a PC running a browser and a server)
A request message
– Request line
– General header
– Request header
– Entity header
– Blank line
– Body
• Request line: request type + URL + HTTP version
• Request types
HTTP/1.0: GET, POST, HEAD
HTTP/1.1: GET, POST, HEAD, PUT, DELETE
• URL: scheme://host:port/path
• General header: cache-control, connection, date, MIME-version, upgrade
• Request header: accept, accept-charset, accept-encoding, accept-language, authorization, from, host, referer, user-agent
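The layout of a request message (request line, header lines, blank line, body) can be sketched as a small function that assembles the text a client would send. This is only an illustration: `build_request` is a hypothetical helper, the header values are made up, and a real browser sends many more headers.

```python
def build_request(method, path, version="HTTP/1.1", headers=None, body=b""):
    """Assemble a request message: request line, headers, blank line, body."""
    lines = [f"{method} {path} {version}"]           # request line
    for name, value in (headers or {}).items():      # header lines
        lines.append(f"{name}: {value}")
    head = "\r\n".join(lines) + "\r\n\r\n"           # blank line ends the headers
    return head.encode("ascii") + body

message = build_request(
    "GET",
    "/hfan/cp476/index.html",
    headers={
        "Host": "bohr.wlu.ca",
        "Accept": "text/html",
        "User-Agent": "sketch/0.1",   # made-up client name
        "Connection": "close",
    },
)
print(message.decode("ascii"))
```

When talking directly to the origin server, the request line usually carries only the path of the URL, and the host name travels in the Host header.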
• Status line:
HTTP version + status code + status phrase
– 3-digit status code
100=continue, 101=switching protocols,
200=OK, 201=created, 202=accepted, 204=no content,
300=multiple choices, 301=moved permanently, 302=moved temporarily, 304=not modified,
400=bad request, 401=unauthorized, 403=forbidden, 404=not found, 405=method not allowed, 406=not acceptable,
500=internal server error, 501=not implemented, 503=service unavailable
Example
HTTP/1.1 200 OK
Date: Mon, 27 Jun 2002 17:22:47 GMT
Server: Apache/1.3.22 (Unix) (Red-Hat/Linux)
Last-modified: Wed, 26 Jun 2002 18:12:29 GMT
Etag: "841fb-4b-3d1a0179"
Accept-ranges: bytes
Content-length: 75
Connection: close
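As a rough illustration of how a client might read the status line of a response such as the example above, the sketch below splits it into HTTP version, status code, and status phrase. The function name `parse_status_line` is an assumption. The grouping of codes by their first digit is standard HTTP convention rather than something stated in these slides.

```python
from http import HTTPStatus

# Standard meaning of the first digit of a status code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_status_line(line):
    """Split 'HTTP-version code phrase' and classify the 3-digit code."""
    fields = line.split(" ", 2)
    version, code = fields[0], int(fields[1])
    phrase = fields[2] if len(fields) > 2 else ""
    return version, code, phrase, CLASSES.get(code // 100, "unknown")

ok = parse_status_line("HTTP/1.1 200 OK")
missing = parse_status_line("HTTP/1.1 404 Not Found")
print(ok)
print(missing)
print(HTTPStatus(503).phrase)  # the standard library knows the usual phrases
```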
Proxy server
• A proxy server is one that receives requests intended for another server and that acts on behalf of the client (as the client proxy) to obtain the requested service. The HTTP client sends a request to the proxy server. The proxy server sends the response back
– A proxy server can keep copies of responses to recent requests to serve future requests from other clients.
– A proxy server is often used when the client and the server are incompatible for direct connection.
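The caching behaviour of a proxy can be sketched with a dictionary keyed by URL. This is a deliberately simplified model: the class `CachingProxy` and the stand-in function `fake_origin` are hypothetical, and a real proxy would also honour cache-control headers and expiry times.

```python
class CachingProxy:
    """Forward requests to an origin server and keep copies of the responses."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: url -> response
        self._cache = {}                  # url -> stored response
        self.origin_requests = 0          # how often the origin was contacted

    def get(self, url):
        if url in self._cache:            # recent request: serve the copy
            return self._cache[url]
        self.origin_requests += 1         # otherwise act on the client's behalf
        response = self._fetch(url)
        self._cache[url] = response
        return response

def fake_origin(url):
    """Stand-in for the real server, so the sketch needs no network."""
    return f"<html>content of {url}</html>"

proxy = CachingProxy(fake_origin)
first = proxy.get("http://example.com/a")    # goes to the origin
second = proxy.get("http://example.com/a")   # answered from the cache
print(proxy.origin_requests)
```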
Cookies
• The server side is stateless. Cookies are stored on the client side to keep access information for a page
• Three components
1. cookie header line (Set-Cookie) in the HTTP response message
2. cookie header line (Cookie) in the HTTP request message
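The two cookie header lines can be illustrated with Python's standard `http.cookies` module. This is a sketch under assumptions: the cookie name `session_id`, its value, and the dictionary used as the browser's cookie store are all made up for the example.

```python
from http.cookies import SimpleCookie

# 1. Server side: the response carries a Set-Cookie header line.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"        # hypothetical name and value
server_cookie["session_id"]["path"] = "/"
set_cookie_line = server_cookie.output()      # a "Set-Cookie: ..." header line

# Client side: the browser stores the name/value pair it received.
received = SimpleCookie()
received.load(set_cookie_line.split(":", 1)[1].strip())
cookie_jar = {name: morsel.value for name, morsel in received.items()}

# 2. Client side: later requests carry a Cookie header line.
cookie_line = "Cookie: " + "; ".join(f"{n}={v}" for n, v in cookie_jar.items())

# Server side again: the cookie identifies the returning client.
returned = SimpleCookie()
returned.load(cookie_line.split(":", 1)[1].strip())
session = returned["session_id"].value

print(set_cookie_line)
print(cookie_line)
```

This is how a stateless server can still recognise a returning client: the state lives in the cookie that the client sends back.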
Web Browsers
• A client software program that is used to connect to a web server to get web documents on the Internet
• Well-known browsers
– TBL’s text browser, Mosaic, Netscape
– Microsoft Internet Explorer
– Mozilla, Firefox
– Apple Safari
– Opera
How does a browser work?
1. Given a URL, the browser initiates a request to a DNS server to resolve the IP address of the host specified in the URL.
2. Combining the IP address and the document path of the URL, the browser initiates an HTTP request for the web document to the server, and then waits for the server's response to the request
3. It further processes the received web document.
– If it is an HTML web document, the browser parses it and then renders the web page in the content display window
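The three steps above can be sketched with plain sockets. This is a minimal illustration, not how a production browser is built: the function names are assumptions, HTTP/1.0 is used so that the server closes the connection after the response, and parsing and rendering of HTML are left out.

```python
import socket
from urllib.parse import urlsplit

def resolve_host(host, port):
    """Step 1: ask the system resolver (DNS) for an address of the host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    family, _type, _proto, _name, sockaddr = infos[0]
    return family, sockaddr

def fetch(url):
    """Step 2: send a GET request for the document path and read the reply."""
    parts = urlsplit(url)
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    family, sockaddr = resolve_host(parts.hostname, port)
    request = f"GET {path} HTTP/1.0\r\nHost: {parts.hostname}\r\n\r\n"
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.settimeout(10)
        sock.connect(sockaddr)
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:                      # read until the server closes
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def split_response(raw):
    """Step 3 (first half): separate status line, headers, and body."""
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body
```

A call such as `split_response(fetch("http://bohr.wlu.ca/hfan/cp476/index.html"))` would perform all steps, provided the host is reachable. The body would then go to the document parser.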
Basic components of a browser
1. GUI
2. Domain name resolution module
3. Requesting module
4. Response processing module
5. A Web document parser module
Case study: Mosaic
• Mosaic is the first graphical browser for the WWW that supports hypermedia, developed by NCSA, which quickly gained popularity and became the industry standard.
• HTTP is the primary protocol used by Mosaic to distribute documents from the HTTPD information management services to Mosaic clients.
Case study: Chrome
• Chrome is the most recent browser, developed by Google, 2008.
• The most popular browser, about 20%.
• Very active development, Chromium project
• Open source, see more: http://www.chromium.org/
Web Servers
• A computer that provides World Wide Web services on the Internet. It includes hardware, operating system, Web server software, TCP/IP software, Web pages, and many programs.
• Web server software is an application program that serves Web pages to Web browsers using the HTTP protocol.
• Provides responses to browsers’ requests for either existing documents or dynamically built documents
• Examples
How does a Web server work?
• The primary task of a Web server is to monitor a communication port on its host machine, accept HTTP commands through that port, and perform the operations specified by the commands.
• Examples of HTTP commands: GET, PUT, POST, HEAD, and DELETE. All commands include a URL
• When a Web server starts, it tells its OS that it is ready to accept communications through a specific port, usually 80
• When the URL is received, it is translated into either a filename (in which case the file is returned to the requesting client), or a program name (in which case the program is run and its output is sent to the requesting client)
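The behaviour described above (listen on a port, accept a command, translate the URL into a filename, return the file) can be sketched as follows. This is a teaching sketch under assumptions: the function names and port 8080 are made up, only GET and HEAD are handled, the program-name case is omitted, and a single `recv` call stands in for proper request reading.

```python
import os
import socket
import tempfile

def translate(url_path, document_root):
    """Map a URL path to a filename under document_root (None if outside it)."""
    root = os.path.realpath(document_root)
    relative = url_path.lstrip("/") or "index.html"
    candidate = os.path.realpath(os.path.join(root, relative))
    if os.path.commonpath([root, candidate]) != root:
        return None                       # e.g. "/../secret" escapes the root
    return candidate

def handle_request(request, document_root):
    """Build the response bytes for one request message."""
    request_line = request.split(b"\r\n", 1)[0].decode("ascii", "replace")
    fields = request_line.split()
    if len(fields) != 3:
        return b"HTTP/1.0 400 Bad Request\r\n\r\n"
    method, url_path, _version = fields
    if method not in ("GET", "HEAD"):
        return b"HTTP/1.0 501 Not Implemented\r\n\r\n"
    filename = translate(url_path, document_root)
    if filename is None or not os.path.isfile(filename):
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    with open(filename, "rb") as f:
        body = f.read()
    head = f"HTTP/1.0 200 OK\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("ascii") + (body if method == "GET" else b"")

def serve(document_root, port=8080):
    """Tell the OS we accept connections on the port, then answer forever."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("", port))
        listener.listen()
        while True:
            conn, _addr = listener.accept()
            with conn:
                conn.sendall(handle_request(conn.recv(65536), document_root))

# Demo without any network: a temporary document root with one page.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "index.html"), "w") as f:
        f.write("<h1>hello</h1>")
    found = handle_request(b"GET /index.html HTTP/1.0\r\n\r\n", root)
    missing = handle_request(b"GET /nope.html HTTP/1.0\r\n\r\n", root)
    escaped = handle_request(b"GET /../index.html HTTP/1.0\r\n\r\n", root)
```

Calling `serve(".")` would start the loop on port 8080. Port 80 usually requires administrator privileges, which is why the sketch defaults to a higher port.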
• Contemporary web servers provide many services
– Virtual hosts - multiple sites on the same system
– Proxy servers - to serve documents from the document roots of other sites
– Besides HTTP, support for FTP, News, email
Case study: Apache
• Apache began with the NCSA server httpd, which worked together with Mosaic.
• Apache is now an open source Web server, originally formed by taking all the "patches" (fixes) to the NCSA Web server and making a new server out of them, as a project organized by the Apache Software Foundation
• Apache has a long list of services beyond the basic process of serving documents to clients.
• Services can be configured
– Three configuration files: httpd.conf, srm.conf, and access.conf
– More than 150 different directives can be specified
• The file structure of Apache
• Apache has two separate directories: document root, server root
• The document root is the root directory of all servable documents
• Example: An Apache server runs on hopper
Server Name:
Domain name:
IP address:
Document root:
Server root:
• The server root is the root directory for all of the code that implements the server
• The server root usually has four files: One is the code for the server itself. The other three are subdirectories
– conf - for configuration information
– logs - to store what has happened
– cgi-bin - for executable scripts
• The configuration file is named httpd.conf
• The directives in the configuration file control the operation of the server
• Configuration file format:
– Non-blank lines that do not begin with # must begin with a directive name, which may take parameters, separated by white space
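The line format just described can be illustrated with a small parser. This is a sketch of the stated rule only, not Apache's own parser: `parse_config` is a hypothetical helper, the sample values are invented, leading white space is stripped before checking for `#`, and quoted parameters and section blocks are not handled.

```python
def parse_config(text):
    """Return (directive, parameters) pairs for each non-blank, non-# line."""
    directives = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                        # blank line or comment
        name, *params = stripped.split()    # parameters separated by white space
        directives.append((name, params))
    return directives

# Invented excerpt in the style of httpd.conf.
sample = """
# basic settings
ServerName www.example.com
DocumentRoot /var/www/html

Listen 80
"""

directives = parse_config(sample)
for name, params in directives:
    print(name, params)
```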