• No results found

Introduction to HTTP

N/A
N/A
Protected

Academic year: 2022

Share "Introduction to HTTP"

Copied!
33
0
0

Loading.... (view fulltext now)

Full text

(1)

Introduction to HTTP

r

HTTP: HyperText Transfer Protocol

m Communication protocol between clients and servers

m Application layer protocol for WWW r

Client/Server model:

m Client: browser that requests, receives, displays object

m Server: receives requests and responds to them r

Protocol consists of various operations

m Few for HTTP 1.0 (RFC 1945, 1996)

m Many more in HTTP 1.1 (RFC 2616, 1999)

Laptop w/

Netscape

Server w/ Apache Desktop w/

Explorer

http request http request

http response http response

(2)

App-layer protocol defines:

r Types of messages

exchanged (e.g., reqs

& response messages

r Syntax of message types: what fields in messages & how fields are delineated

r Semantics of the fields (i.e., the meaning of

information in fields)

r Rules for when and how processes send &

Public-domain protocols:

r defined in RFCs (Requests for Comments)

r allows for

interoperability

r eg, HTTP, SMTP

Proprietary protocols:

r eg, KaZaA

(3)

Client-server paradigm

Typical network app has two pieces: client and server

application transport

network data link physical

application transport

network data link physical

Client:

r initiates contact with server (“speaks first”)

r typically requests service from server

r Web: client implemented in browser;

e-mail: in mail reader

request

reply

Server:

r provides requested service to client

r e.g., Web server sends requested Web page;

mail server delivers e-mail

(4)

Processes communicating across network

r process sends/receives messages to/from its socket

r socket analogous to door

m

sending process shoves message out door

m

sending process assumes transport infrastructure on other side of door which brings message to socket at receiving process

process

TCP with buffers, variables

socke t host or server

process

TCP with buffers, variables

socke t host or server

Internet

controlled by OS

controlled by app developer

r API allows: (1) choice of transport protocol (TCP/UDP);

(2) ability to set several parameters

(5)

HTTP overview

HTTP: hypertext transfer protocol

r

Web’s application layer protocol

r

client/server model

m

client: browser that requests, receives,

“displays” Web objects

m

server: Web server sends objects in

response to requests

r

HTTP 1.0: RFC 1945

r

HTTP 1.1: RFC 2616

PC running Internet Explorer or Firefox

Server running Apache Web

server Mac running

Safari

HTTP request

HTTP request HTTP response

HTTP response

(6)

HTTP overview (continued)

Uses TCP:

r

client initiates TCP

connection (creates socket) to server, port 80

r

server accepts TCP connection from client

r

HTTP messages

(application-layer protocol messages) exchanged

between browser (HTTP client) and Web server (HTTP server)

r

TCP connection closed

HTTP is “stateless”

r

server maintains no information about past client requests

Protocols that maintain “state” are complex!

r past history (state) must be maintained

r if server/client crashes, their views of “state” may be

inconsistent, must be reconciled

aside

(7)

HTTP connections

Non-persistent HTTP

r At most one object is sent over a TCP

connection.

r HTTP/1.0 uses

non-persistent HTTP

Persistent HTTP

r Multiple objects can be sent (one at a time) over a single TCP

connection between client and server.

r HTTP/1.1 uses

persistent connections in default mode

m

Pipelined

m

Non-pipelined

(8)

Response time modeling

Definition of RTT: time to send a small packet to travel from client to server and back.

Response time:

r one RTT to initiate TCP connection

r one RTT for HTTP

request and first few bytes of HTTP response to return

r file transmission time

total = 2*RTT+transmit time

time to transmit file

initiate TCP connection

RTT request file

RTT file received

time time

(9)

Classical HTTP/1.0

http://www.somewhere.com/index.html index.html references: page1.jpg, page2.jpg, page3.jpg.

time to

transmit index.hml initiate TCP

connection RTT

GET index.html RTT

file received

GET page1.jpg

time to

transmit page1.jpg

(10)

Persistent HTTP

Nonpersistent HTTP issues:

r

requires 2 RTTs per object

r

OS must work and allocate host resources for each TCP connection

r

but browsers often open parallel TCP connections to fetch referenced objects Persistent HTTP

r

server leaves connection open after sending response

r

subsequent HTTP messages between same client/server are sent over connection

Persistent without pipelining:

r

client issues new request only when previous

response has been received

r

one RTT for each referenced object

Persistent with pipelining:

r

default in HTTP/1.1

r

client sends requests as soon as it encounters a referenced object

r

as little as one RTT for all

the referenced objects

(11)

HTTP request message

r HTTP request message:

m

ASCII (human-readable format)

GET /somedir/page.html HTTP/1.1 Host: www.someschool.edu

User-agent: Mozilla/4.0 Connection: close

Accept-language:fr

(extra carriage return, line feed) request line

(GET, POST, HEAD commands)

header lines Carriage return,

line feed indicates end

of message

(12)

HTTP request message: general format

(13)

HTTP Methods

r GET: retrieve a file (95% of requests)

r HEAD: just get meta-data (e.g., mod time)

r POST: submitting a form to a server

r PUT: store enclosed document as URI

r DELETE: removed named resource

r LINK/UNLINK: in 1.0, gone in 1.1

r TRACE: http “echo” for debugging (added in 1.1)

r CONNECT: used by proxies for tunneling (1.1)

r OPTIONS: request for server/proxy options (1.1)

(14)

HTTP response message

HTTP/1.1 200 OK Connection: close

Date: Thu, 06 Aug 1998 12:00:15 GMT Server: Apache/1.3.0 (Unix)

Last-Modified: Mon, 22 Jun 1998 …...

Content-Length: 6821 Content-Type: text/html

data data data data data ...

status line (protocol status code status phrase)

header lines

data, e.g.,

requested

HTML file

(15)

HTTP Response Status Codes

r 1XX: Informational (def’d in 1.0, used in 1.1)

100 Continue , 101 Switching Protocols

r 2XX: Success

200 OK, 206 Partial Content

r 3XX: Redirection

301 Moved Permanently, 304 Not Modified

r 4XX: Client error

400 Bad Request, 403 Forbidden, 404 Not Found

r 5XX: Server error

500 Internal Server Error, 503 Service

Unavailable, 505 HTTP Version Not Supported

(16)

Trying out HTTP (client side) for yourself

1. Telnet to your favorite Web server:

Opens TCP connection to port 80

(default HTTP server port) at www.eurecom.fr.

Anything typed in sent

to port 80 at www.eurecom.fr telnet www.eurecom.fr

80

2. Type in a GET HTTP request:

GET /~ross/index.html HTTP/1.0 By typing this in (hit carriage return twice), you send

this minimal (but complete) GET request to HTTP server

3. Look at response message sent by HTTP server!

(17)

Request Generation

r User clicks on something

r Uniform Resource Locator (URL):

m http://www.cnn.com

m http://www.cpsc.ucalgary.ca

m https://www.paymybills.com

m ftp://ftp.kernel.org

r Different URL schemes map to different services

r Hostname is converted from a name to a 32-bit IP address (DNS lookup, if needed)

r Connection is established to server (TCP)

(18)

What Happens Next?

r Client downloads HTML document

m

Sometimes called “container page”

m

Typically in text format (ASCII)

m

Contains instructions for rendering

(e.g., background color, frames)

m

Links to other pages

r Many have embedded objects:

m

Images: GIF, JPG (logos, banner ads)

m

Usually automatically retrieved

• I.e., without user involvement

• can control sometimes

(e.g. browser options, junkbusters)

<html>

<head>

<meta

name=“Author”

content=“Erich Nahum”>

<title> Linux Web Server Performance

</title>

</head>

<body text=“#00000”>

<img width=31 height=11

src=“ibmlogo.gif”>

<img

src=“images/new.gif>

<h1>Hi There!</h1>

Here’s lots of cool linux stuff!

<a href=“more.html”>

Click here</a>

for more!

</body>

</html>

sample html file

(19)

Web Server Role

r Respond to client requests, typically a browser

m

Can be a proxy, which aggregates client requests (e.g., AOL)

m

Could be search engine spider or robot (e.g., Keynote) r May have work to do on client’s behalf:

m

Is the client’s cached copy still good?

m

Is client authorized to get this document?

r Hundreds or thousands of simultaneous clients

r Hard to predict how many will show up on some day (e.g., “flash crowds”, diurnal cycle, global presence)

r Many requests are in progress concurrently

(20)

HTTP Request Format

GET /images/penguin.gif HTTP/1.0

User-Agent: Mozilla/0.9.4 (Linux 2.2.19) Host: www.kernel.org

Accept: text/html, image/gif, image/jpeg Accept-Encoding: gzip

Accept-Language: en

Accept-Charset: iso-8859-1,*,utf-8 Cookie: B=xh203jfsf; Y=3sdkfjej

<cr><lf>

• Messages are in ASCII (human-readable)

• Carriage-return and line-feed indicate end of headers

• Headers may communicate private information

(browser, OS, cookie information, etc.)

(21)

Response Format

HTTP/1.0 200 OK Server: Tux 2.0

Content-Type: image/gif Content-Length: 43

Last-Modified: Fri, 15 Apr 1994 02:36:21 GMT Expires: Wed, 20 Feb 2002 18:54:46 GMT

Date: Mon, 12 Nov 2001 14:29:48 GMT Cache-Control: no-cache

Pragma: no-cache Connection: close

Set-Cookie: PA=wefj2we0-jfjf

<cr><lf>

<data follows…>

• Similar format to requests (i.e., ASCII)

(22)

Outline of an HTTP Transaction

r This section describes the basics of servicing an HTTP GET request from user space

r Assume a single process

running in user space, similar to Apache 1.3

r We’ll mention relevant socket operations along the way

initialize;

forever do { get request;

process;

send response;

log request;

}

server in a nutshell

(23)

Readying a Server

r

First thing a server does is notify the OS it is interested in WWW server requests; these are typically on TCP port 80.

Other services use different ports (e.g., SSL is on 443)

r

Allocate a socket and bind()'s it to the address (port 80)

r

Server calls listen() on the socket to indicate willingness to receive requests

r

Calls accept() to wait for a request to come in (and blocks)

r

When the accept() returns, we have a new socket which represents a new connection to a client

s = socket(); /* allocate listen socket */

bind(s, 80); /* bind to TCP port 80 */

listen(s); /* indicate willingness to accept */

while (1) {

newconn = accept(s); /* accept new connection

*/b

(24)

Processing a Request

r getsockname() called to get the remote host name

m

for logging purposes (optional, but done by most)

r gethostbyname() called to get name of other end

m

again for logging purposes

r gettimeofday() is called to get time of request

m

both for Date header and for logging

r read() is called on new socket to retrieve request

r request is determined by parsing the data

m

“GET /images/jul4/flag.gif”

remoteIP = getsockname(newconn);

remoteHost = gethostbyname(remoteIP);

gettimeofday(currentTime);

read(newconn, reqBuffer, sizeof(reqBuffer));

reqInfo = serverParse(reqBuffer);

(25)

Processing a Request (cont)

r stat() called to test file path

m

to see if file exists/is accessible

m

may not be there, may only be available to certain people

m

"/microsoft/top-secret/plans-for-world-domination.html"

r stat() also used for file meta-data

m

e.g., size of file, last modified time

m

"Has file changed since last time I checked?“

r might have to stat() multiple files and directories

r assuming all is OK, open() called to open the file

fileName = parseOutFileName(requestBuffer);

fileAttr = stat(fileName);

serverCheckFileStuff(fileName, fileAttr);

open(fileName);

(26)

Responding to a Request

r

read() called to read the file into user space

r

write() is called to send HTTP headers on socket

(early servers called write() for each header!)

r

write() is called to write the file on the socket

r

close() is called to close the socket

r

close() is called to close the open file descriptor

r

write() is called on the log file

read(fileName, fileBuffer);

headerBuffer = serverFigureHeaders(fileName, reqInfo);

write(newSock, headerBuffer);

write(newSock, fileBuffer);

close(newSock);

close(fileName);

write(logFile, requestInfo);

(27)

Network View: HTTP and TCP

r TCP is a connection-oriented protocol

SYN SYN/ACK GET URL ACK

YOUR DATA HERE

FIN FIN/ACK

ACK

Web Client Web Server

(28)

Example Web Page

My simple page.

Hello World!

page.html

face.png

castle.gif

(29)

Client Server

The “classic” approach in HTTP/1.0 is to use one HTTP request per TCP connection, serially.

TCP SYN

TCP FIN

page.html G

TCP SYN

TCP FIN

face.png G

TCP SYN

TCP FIN

castle.gif

G

(30)

Client Server Concurrent (parallel) TCP connections can be used to make things faster.

TCP SYN

TCP FIN

page.html G

castle.gif G

F S G

face.png S

F

C S C S

(31)

Client Server

The “persistent HTTP”

approach can re-use the same TCP connection for Multiple HTTP transfers, one after another, serially.

Amortizes TCP overhead, but maintains TCP state longer at server.

TCP FIN

Timeout TCP SYN

page.html G

face.png G

castle.gif

G

(32)

Client Server

The “pipelining” feature in HTTP/1.1 allows

requests to be issued asynchronously on a persistent connection.

Requests must be

processed in proper order.

Can do clever packaging.

Timeout TCP SYN

page.html G

castle.gif

face.png

GG

(33)

Summary of Web and HTTP

r The major application on the Internet

m

Majority of traffic is HTTP (or HTTP-related) r Client/server model:

m

Clients make requests, servers respond to them

m

Done mostly in ASCII text (helps debugging!) r Various headers and commands

m

Too many to go into detail here

m

Many web books/tutorials exist

(e.g., Krishnamurthy & Rexford 2001)

References

Related documents

write Can be used to write data from stream sockets (and datagram sockets if you declare a default remote socket address) just like send.

In the introduction to HTTP that appeared earlier in this chapter, a few HTTP response headers were seen, and in the HelloWorld Servlet the Content- Type response header was used..

For the browser details, you should display the client IP address, client hostname, and client user agent.. Client user agent note should only list the user agent – no cookies

After you have configured the server, you can configure options (path, access list to apply, maximum number of connections, or timeout policy) that apply to both standard and

Our last sentence was the thesis statement: this is where we answered the essay question and gave our opinion.... Writing the

The contract authorized an assignment, and because of the incompatibility of the assignor’s (U.S. Cellu- lar’s) cellphones and the assignee’s (Sprint’s) mobile phone network,

The HTTP client initiates a socket connection to the HTTP server which responds by creating a socket for the two sides - client and server - to talk to each other.. The client

The variation of skin friction coefficient with respect to volume fraction is shown in Fig. The skin friction coefficient is observed to decrease with increase in volume fraction