World Wide Web. Before WWW


Academic year: 2021


World Wide Web

Joao.Neves@fe.up.pt

Before WWW

Major search tools: Gopher and Archie

Archie

Search FTP archives indexes

Filename based queries

Gopher

Friendly interface


Web Born

Tim Berners-Lee et al. at CERN in 1991

HyperText Transfer Protocol (HTTP)

Hypertext: links embedded in text that point to another text document


[Figure: Netcraft web server survey, August 1995 - March 2008]

Source: http://news.netcraft.com/archives/web_server_survey.html

Layering

Application: HyperText Transfer Protocol, Telnet, Simple Network Management, Dynamic Host Configuration

Transport: Transmission Control Protocol (TCP)

Network: Internet Protocol (IP)

Link: Ethernet, Wi-Fi, SONET


HTTP

Standard protocol for web transfer

Request-response interaction between client and server

The server holds resources such as HTML files and images

Request methods: GET, HEAD, PUT, POST, DELETE, …

Response: Status line + additional info (e.g., a web page)


Introduction to HTTP

It has been in use by the World-Wide Web global information initiative since 1990

Its first version (referred to as HTTP/0.9) was a simple protocol for raw data transfer across the Internet

HTTP/1.0 improved the protocol by allowing messages to be in the format of MIME-like messages, containing:

• metainformation about the data transferred

• modifiers on the request/response semantics


HTTP Transaction

[Diagram: Client <-> HTTP <-> Server]

HTTP client: web browser

HTTP server: web server

Standard port: 80

Suggested alternate ports: 81, 8080, 8081

HTTP is used to transmit resources

Files/documents

Image files

Query results

Outputs from CGI scripts

Anything that can be identified by a URL

[Diagram: server document tree: WebRoot / dir / file.html]


Web Clients

Lynx 2.0 (1993, character based interface)

NCSA Mosaic (1993, first with graphical interface)

Marc Andreessen (author of Mosaic) moved to Netscape

Microsoft Internet Explorer (“new name for Mosaic…”)

Mozilla Firefox

Opera

Safari


The Browser

The browser

1. fetches the page requested

2. interprets the text and formatting commands that it contains

3. displays the page properly formatted on the screen

The page contains strings of text that are links to other pages, called hyperlinks

On the screen the hyperlinks are highlighted, either by underlining, displaying them in a special color, or both
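Step 2 above (interpreting the page) includes locating the hyperlinks. The following is a minimal sketch using Python's standard `html.parser` module; the class name `LinkCollector` and the sample page are illustrative assumptions, not part of the original slides, and a real browser does far more than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Assumed sample page, for illustration only.
page = '<P>Read <A HREF="http://www.xptoo.org/">I\'m the One</A> today.</P>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

A browser would then render each collected link text as highlighted and fetch the `href` target when the user selects it.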


Web Servers

NCSA HTTPd: non-commercial, free

Apache HTTP Server: freeware

Apache Tomcat: freeware

lighttpd: freeware

Microsoft Internet Information Services (IIS): payware

Zeus Web Server: payware

Zope: freeware

...


Server Share

[Figure: server share amongst the million busiest sites, March 2009]


Markup

“Markup” consists of codes inserted into text documents to control formatting, printing or other processing.

Descriptive markup indicates the nature, function, or content of the data in a file.

Procedural markup defines what processing is to be carried out at particular points in the document.


HyperText Markup Language

Language in which web pages are written

Contains formatting commands

Tells the browser what to display and how to display it

Examples:

<TITLE> Welcome to My Great Site </TITLE>

• The title of this page is “Welcome to My Great Site”

<B>Great News!</B>

• Set “Great News!” in boldface

<A HREF="http://www.xptoo.org/">I’m the One</A>

• A link pointing to the web page http://www.xptoo.org/, with the text “I’m the One” displayed


Sample HTML Tags

<A> </A>            Anchor (link or name)
<BODY> </BODY>      Document contents
<BR>                Line break
<FORM> </FORM>      Input form
<H1> </H1>          Heading level 1
<HEAD> </HEAD>      Header of a document
<HR>                Horizontal rule
<HTML> </HTML>      The document type is HTML
<LI>                List item
<OL> </OL>          Ordered list
<P> </P>            Paragraph
<PRE> </PRE>        Preformatted text
<TITLE> </TITLE>    Document title
<UL> </UL>          Unordered (bulleted) list


Uniform Resource Identifiers

A URI is an identifier for some resource, and a Uniform Resource Locator (URL) gives specific information on how to obtain that resource

HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet systems, including those supported by the following protocols:

• SMTP, NNTP, FTP

In this way, HTTP allows basic hypermedia access to resources available from diverse applications


Uniform Resource Identifiers

The following examples illustrate URLs that are in common use:

Name: Utility. Example

ftp: ftp scheme for File Transfer Protocol services. ftp://ftp.is.co.za/rfc/rfc1808.txt

http: http scheme for Hypertext Transfer Protocol services. http://www.math.uio.no/faq/compression-faq/part1.html

file: Local file. file:/usr/local/etc/ntp.conf

news: news scheme for USENET news groups and articles. news:comp.infosystems.www.servers.unix

telnet: telnet scheme for interactive services via the TELNET Protocol. telnet://melvyl.ucop.edu/

mailto: mailto scheme for electronic mail addresses. mailto:mduerst@ifi.unizh.ch

gopher: gopher scheme for Gopher and Gopher+ Protocol services. gopher://stap.umn.edu/00/Weather/Ca/Los%20Angeles


Uniform Resource Locator

Some URL schemes use the format "user:password" in the userinfo field.

This practice is NOT RECOMMENDED, because the passing of authentication information in clear text (such as in a URI) has proven to be a security risk in almost every case where it has been used. [RFC2396]

<scheme>://[userinfo@]hostname[:port]/path[;parameters][?query]

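The generic URL syntax given for the Uniform Resource Locator can be explored with Python's standard `urllib.parse.urlparse`, as a small sketch. The URL below is a made-up example (host, port, parameters and query are assumptions chosen to exercise every field).

```python
from urllib.parse import urlparse

# Hypothetical URL that uses every optional component of the generic syntax.
url = "http://user@www.example.org:8080/dir/file.html;type=a?lang=en"
parts = urlparse(url)

print("scheme:    ", parts.scheme)
print("userinfo:  ", parts.username)
print("hostname:  ", parts.hostname)
print("port:      ", parts.port)
print("path:      ", parts.path)
print("parameters:", parts.params)
print("query:     ", parts.query)
```

Note that `urlparse` only splits `;parameters` off the last path segment, which is enough for this illustration.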

HyperText Transfer Protocol

A very simple, stateless protocol for sessionless exchanges

• Browser creates a new connection each time it wants to make a new request (for a page, image, etc.)

Exceptions:

• HTTP 1.1 added support for persistent connections and pipelining

• Clients + servers might keep state information

• Cookies provide a way of recording state
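As a rough sketch of how cookies record state on top of a stateless protocol, the server can hand out an identifier once and recognise it on later requests. The function `handle_request`, the cookie name `session` and the identifier `abc123` are illustrative assumptions; only the standard `http.cookies` module is used.

```python
from http.cookies import SimpleCookie

visits = {}  # server-side state, keyed by session identifier


def handle_request(cookie_header, new_session_id):
    """Return (visit count, Set-Cookie line or None) for one request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if "session" in cookie:
        # Returning client: the cookie tells us who it is.
        session_id, set_cookie = cookie["session"].value, None
    else:
        # First visit: issue an identifier for the client to send back.
        session_id = new_session_id
        reply = SimpleCookie()
        reply["session"] = session_id
        set_cookie = reply.output()
    visits[session_id] = visits.get(session_id, 0) + 1
    return visits[session_id], set_cookie


count, header = handle_request(None, "abc123")               # no Cookie header yet
count_again, _ = handle_request("session=abc123", "unused")  # client echoes the cookie
print(count, header, count_again)
```

Each request is still independent at the protocol level; the continuity comes only from the client echoing the cookie.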


The http protocol: more

HTTP: TCP transport service

client initiates TCP connection (creates socket) to server, port 80

server accepts TCP connection from client

HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)

TCP connection closed

HTTP is “stateless”

server maintains no information about past client requests


HTTP

GET /path/to/file/index.html HTTP/1.0

GET: the HTTP method

/path/to/file/index.html: the path, i.e. the part of the URL after the hostname (the request URI)

HTTP/1.0: the HTTP version
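The three parts of the request line can be separated with a few lines of Python. This is only a sketch: the helper name `parse_request_line` is an assumption, and it performs no validation beyond counting the fields.

```python
def parse_request_line(line):
    """Split an HTTP request line into (method, request URI, version)."""
    fields = line.strip().split(" ")
    if len(fields) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, request_uri, version = fields
    return method, request_uri, version


method, request_uri, version = parse_request_line(
    "GET /path/to/file/index.html HTTP/1.0"
)
print(method, request_uri, version)
```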


HTTP Session

jneves@bart(1)$ telnet www.inescporto.pt 80
[...]
GET /~jneves/index.html HTTP/1.0
From: Joao.Neves@xptoo.org
User-Agent: Camachina/5.0

HTTP/1.1 200 OK
Date: Tue, 26 May 2009 18:06:13 GMT
Server: Apache/2.30 (Unix) PHP/5.5 DAV/2 mod_perl/2.9 Perl/v5.20
Last-Modified: Fri, 04 May 2007 18:41:20 GMT
Accept-Ranges: bytes
Content-Length: 91
Connection: close
Content-Type: text/html

<html>
<head>
<meta HTTP-EQUIV="REFRESH" content="0; url=./index.shtml">
</head>
</html>
Connection closed by foreign host.
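The response in a session like this one has a fixed shape: a status line, header lines, an empty line, then the body. A minimal parsing sketch follows; the function name `parse_response` and the shortened sample message are assumptions for illustration, and real clients must also handle details such as folded or repeated headers.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")    # empty line ends the headers
    lines = head.split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")     # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# Shortened sample response, assumed for illustration.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 26 May 2009 18:06:13 GMT\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html></html>"
)
version, code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"])
```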


HTTP Request Headers

Header Description

From                 E-mail address of the user (RFC 822 format)
User-Agent           Client software
Accept               File types that the client will accept, e.g., text/plain, text/html
Accept-Encoding      Compression methods, e.g., x-compress; x-zip
Accept-Language      Language(s) the client prefers
Referer (optional)   URL of the document (or element within the document) from which the URL in the request was obtained
If-Modified-Since    Return the document only if modified since the specified date
Content-Length       Length in octets of the data to follow
Content-Type         Type of the item
Pragma: no-cache     Directive understood by a proxy server; when present, the proxy should not return a document from the cache
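The If-Modified-Since header lets a server skip resending an unchanged document. A sketch of the server-side decision, assuming the helper name `status_for_conditional_get` and using the standard `email.utils` date parser; status 304 (Not Modified) is the standard reply when the document has not changed.

```python
from email.utils import parsedate_to_datetime


def status_for_conditional_get(if_modified_since, last_modified):
    """Return 200 if the document changed after the client's date, else 304."""
    since = parsedate_to_datetime(if_modified_since)
    modified = parsedate_to_datetime(last_modified)
    return 200 if modified > since else 304


# The client's copy is from 2009 and the document was last changed in 2007.
unchanged = status_for_conditional_get(
    "Tue, 26 May 2009 18:06:13 GMT", "Fri, 04 May 2007 18:41:20 GMT"
)
# The client's copy is from 2007 and the document changed in 2009.
changed = status_for_conditional_get(
    "Fri, 04 May 2007 18:41:20 GMT", "Tue, 26 May 2009 18:06:13 GMT"
)
print(unchanged, changed)
```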


HTTP Response Headers

Header Description

Server           Server software
Date             Current date
Last-Modified    Modification date of the document
Expires          Document expiration date
Location         The location of the document in redirection responses
Pragma           A hint, e.g. no-cache
MIME-Version
Link             URL of the document’s parent


HTTP Status Codes

Code   Text
2xx    Success
3xx    Redirection
301    Moved
302    Found
4xx    Client Errors
400    Bad Request
401    Unauthorized
404    Not Found
5xx    Server Errors
500    Internal Error
502    Bad Gateway ("Service Overload" in early HTTP drafts; overload is 503 Service Unavailable today)
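The first digit of a status code determines its class, which is how a client can react sensibly even to a code it does not recognise. A small sketch using Python's standard `http.HTTPStatus`; the helper `status_class` and its labels are assumptions that mirror the table above.

```python
from http import HTTPStatus


def status_class(code):
    """Map a status code to its class using the first digit."""
    classes = {2: "Success", 3: "Redirection", 4: "Client Error", 5: "Server Error"}
    return classes.get(code // 100, "Other")


for code in (200, 301, 404, 500):
    # HTTPStatus supplies the standard reason phrase for each code.
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```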

HTTP over TLS


HTTP 1.1 Features

Persistent TCP Connections: remain open for multiple requests

Partial Document Transfers: clients can specify start and stop positions

Conditional Fetch: several additional conditions

Better content negotiation

More flexible authentication
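Partial document transfers rely on the client naming a byte range, for example with a header such as `Range: bytes=2-5`. The sketch below shows how a server might select those bytes; the function name `slice_for_range` is an assumption, and it deliberately handles only the simple `bytes=START-END` and `bytes=START-` forms.

```python
def slice_for_range(body, range_value):
    """Return the bytes selected by a simple 'bytes=START-END' range (END inclusive)."""
    unit, _, spec = range_value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only byte ranges are handled in this sketch")
    start_text, _, end_text = spec.partition("-")
    start = int(start_text)
    # An open-ended range such as 'bytes=7-' runs to the end of the document.
    end = int(end_text) if end_text else len(body) - 1
    return body[start:end + 1]


document = b"0123456789"
middle = slice_for_range(document, "bytes=2-5")
tail = slice_for_range(document, "bytes=7-")
print(middle, tail)
```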


Static vs. Dynamic Pages

HTML pages vs. database

Personalized

Context-aware services


HTTP Proxy

An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients

Requests are serviced internally or by passing them on, with possible translation, to other servers

A proxy must implement both the client and server requirements of the HTTP specification

The client makes a request to the proxy server using the complete URL

The proxy server connects to the remote server and requests the resource relative to that server (no protocol or hostname in the URL)

HTTP Proxy

Client -> Proxy Server: GET http://hostname/path/to/file.html HTTP/1.0

Proxy Server -> Server: GET /path/to/file.html HTTP/1.0

Server -> Proxy Server: HTTP/1.0 200 Document ....

Proxy Server -> Client: HTTP/1.0 200 Document ....

Server document tree: WebRoot / dir / file.html

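The rewriting step a proxy performs, from the complete URL it receives to the server-relative request it forwards, can be sketched as follows. The function name `rewrite_for_origin` and the default port of 80 for URLs without an explicit port are assumptions of this illustration.

```python
from urllib.parse import urlsplit


def rewrite_for_origin(request_line):
    """Turn a proxy-style request line into (host, port, origin-style request line)."""
    method, url, version = request_line.split(" ")
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the server root
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, parts.port or 80, f"{method} {path} {version}"


host, port, forwarded = rewrite_for_origin(
    "GET http://hostname/path/to/file.html HTTP/1.0"
)
print(host, port, forwarded)
```

The proxy would then open a TCP connection to `host` on `port` and send `forwarded` as the first line of its own request.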


HTTP Proxy + Cache

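Client -> Proxy Server: GET http://hostname/path/to/file.html HTTP/1.0

Proxy Server -> Server: GET /path/to/file.html HTTP/1.0

Server -> Proxy Server: HTTP/1.0 200 Document ....

Proxy Server -> Client: HTTP/1.0 200 Document ....

Server document tree: WebRoot / dir / file.html

Cache (attached to the proxy server)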
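The effect of adding a cache to the proxy can be sketched with a dictionary keyed by URL. The class `CachingProxy` and the `fetch` callable (which stands in for contacting the origin server) are assumptions made for illustration; this sketch ignores expiry dates and directives such as `Pragma: no-cache`, which a real proxy must honour.

```python
class CachingProxy:
    """Toy proxy that remembers every document it has fetched."""

    def __init__(self, fetch):
        self._fetch = fetch          # callable: URL -> document (origin server stand-in)
        self._cache = {}
        self.origin_requests = 0     # how often the origin server was contacted

    def get(self, url):
        if url not in self._cache:
            # Cache miss: forward the request and keep a copy of the reply.
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


proxy = CachingProxy(lambda url: "document for " + url)
first = proxy.get("http://hostname/path/to/file.html")
second = proxy.get("http://hostname/path/to/file.html")   # served from the cache
print(first == second, proxy.origin_requests)
```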


HTTP Proxy

Transparent

Configured (http://proxy.xptoo.org:3128/)

Automatic (Web Proxy AutoDiscovery)


Why Web Caching (Proxies)?

Assume: cache is “close” to client (e.g., in same network)

smaller response time: cache “closer” to client

decrease traffic to distant servers

• link out of institutional/local ISP network often bottleneck

[Diagram: origin servers, Internet, 1.5 Mb/s access link (bottleneck), institutional network with 10 Mb/s LAN and institutional cache]

Web Load Handling

Thousands of clients

Load sharing

DNS Round Robin

Web Switching (L4-L7): Load Balancing Devices

Nortel Alteon

A10 Networks

Cisco Content Switching

...

Akamai
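DNS round robin, the simplest of the load-sharing options listed above, hands out the server addresses in rotation. A minimal sketch: the class name `RoundRobinResolver` and the addresses (taken from the 192.0.2.0/24 documentation range) are assumptions, and a real DNS server rotates the order of the records it returns rather than answering with a single address.

```python
from itertools import cycle


class RoundRobinResolver:
    """Hands out the configured server addresses in rotation."""

    def __init__(self, addresses):
        self._next_address = cycle(addresses)

    def resolve(self):
        return next(self._next_address)


resolver = RoundRobinResolver(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
answers = [resolver.resolve() for _ in range(4)]
print(answers)
```

Round robin spreads requests evenly but knows nothing about server load or failures, which is one reason for the dedicated L4-L7 devices in the list.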


Bibliography

Comer, Douglas E. Internetworking with TCP/IP (Vol. I). Prentice Hall, 5th Ed. (2006). ISBN 0-13-187671-6

Tanenbaum, Andrew S. Computer Networks. Prentice Hall International Editions, 4th Ed. (2003). ISBN 0-13-038488-7
