Distributed Systems
Principles and Paradigms
Chapter 12
(version October 15, 2007)Maarten van Steen
Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science
Room R4.20. Tel: (020) 598 7784
E-mail:[email protected], URL: www.cs.vu.nl/∼steen/
01 Introduction 02 Architectures 03 Processes 04 Communication 05 Naming 06 Synchronization
07 Consistency and Replication 08 Fault Tolerance
09 Security
10 Distributed Object-Based Systems 11 Distributed File Systems
12 Distributed Web-Based Systems 13 Distributed Coordination-Based Systems
00 – 1 /
Distributed Web-Based Systems
Essence: The WWW is a huge client-server system with millions of servers; each server hosting thousands ofhyperlinkeddocuments: Client machine Browser OS Server machine Web server1. Get document request (HTTP) 3. Response
2. Server fetches document from local file
• Documents are generally represented in text (plain
text, HTML, XML)
• Alternative types: images, audio, video, but also
applications (PDF, PS)
• Documents may contain scripts that are executed
by the client-side software
Multi-tiered Architectures
Observation:Already very soon, Web sites were or-ganized into three tiers:Web server CGI process Database server CGI
program 1. Get request
3. Start process to fetch document
5. HTML document created HTTP request handler 6. Return result 4. Database interaction
12 – 2 Distributed Web-Based Systems/12.1 Architecture
Web Services
Observation: At a certain point, people started
rec-ognizing that it is was more than justuser ↔site
in-teraction: sites could offerservicesto other sites⇒
standardizationis then badly needed.
Service description (WSDL) Client machine Client application Stub Server application Stub Communication subsystem Communication subsystem SOAP
Service description (WSDL)Service description (WSDL)
Directory service (UDDI)
Publish service Look up a service Generate stub from WSDL description Server machine Generate stub from WSDL description
Clients: Web browsers
Observation: browsers form the Web’s most
impor-tant client-side sofware. Theyusedto be simple, but
that is long ago.
User interface Browser engine Rendering engine Network comm. HTML/XML parser Displa y bac k end Client-side script interpreter
12 – 4 Distributed Web-Based Systems/12.2 Processes
Apache Web Server
Observation: More than 70% of all Web sites are based on Apache. The server is internally organized more or less according to the steps needed to process an HTTP request:
Hook Hook Hook Hook
Function
... ... ...
Module Module Module
Apache core Functions called per hook
Link between function and hook
Request Response
Server Clusters (1/2)
Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients: Front end Web server Web server Web server Web server Request Response Front end handles all incoming requests and outgoing responses
LAN
Problem: The front end may easily get overloaded, so that special measures need to be taken.
Transport-layer switching: Front end simply passes the TCP request to one of the servers, taking some performance metric into account.
Content-aware distribution: Front end reads the con-tent of the HTTP request and then selects the best server.
12 – 6 Distributed Web-Based Systems/12.2 Processes
Server Clusters (2/2)
Question: Why can content-aware distribution be so much better? Switch Client Web server Web server Distributor Distributor Dis-patcher
1. Pass setup request
to a distributor 2. Dispatcher selectsserver 3. Hand of f TCP connection 4. Inform switch Setup request Other messages 5. Forward other messages 6. Server responses
Communication (1/2)
Essence:Communication in the Web is generally based on HTTP; a relatively simple client-server transfer pro-tocol having the following request messages:
Operation
Description
Head Request to return the header of a document Get Request to return a document to the client Put Request to store a document
Post Provide data that are to be added to a docu-ment (collection)
Delete Request to delete a document
12 – 8 Distributed Web-Based Systems/12.3 Communication
Communication (2/2)
Header C/S Contents
Accept C The type of documents the client can handle Accept-Charset C The character sets are acceptable for the client
Accept-Encoding
C The document encodings the client can handle
Accept-Language
C The natural language the client can handle Authorization C A list of the client’s credentials
WWW-Authenticate
S Security challenge the client should respond to Date C+S Date and time the message was sent
ETag S The tags associated with the returned document Expires S The time for how long the response remains valid From C The client’s e-mail address
Host C The TCP address of the document’s server If-Match C The tags the document should have If-None-Match C The tags the document should not have
If-Modified-Since
C Tells the server to return a document only if it has been modified since the specified time
If-Unmodified-Since
C Tells the server to return a document only if it has not been modified since the specified time Last-Modified S The time the returned document was last modified Location S A document reference to which the client should
redirect its request
Referer C Refers to client’s most recently requested document Upgrade C+S The application protocol sender wants to switch to Warning C+S Information about status of the data in the message
SOAP
Simple Object Access Protocol: Based on XML, this is the standard protocol for communication be-tween Web services.
• SOAP isboundto an underlying protocol (i.e., it
is not independent from itscarrier)
• Conversational exchange style: Send a
docu-mentone way, get a filled-in response back.
• RPC-style exchange:Used toinvoke a Web ser-vice.
12 – 10 Distributed Web-Based Systems/12.3 Communication
A Note on XML
Observation:XML has the advantage of allowing self-describing documents. Full stop (i.e., it introduces performance problems and is not meant to be read by human beings) env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"> <env:Header> <n:alertcontrol xmlns:n="http://example.org/alertcontrol"> <n:priority>1</n:priority> <n:expires>2001-06-22T14:00:00-05:00</n:expires> </n:alertcontrol> </env:Header> <env:Body> <m:alert xmlns:m="http://example.org/alert"> <m:msg>Pick up Mary at school at 2pm</m:msg> </m:alert>
</env:Body> </env:Envelope>
Naming: URL
URL:Uniform Resource Locator tells how and where
to access a resource.
Scheme Host name Pathname
Scheme Host name Port Pathname
Scheme Host name Port Pathname http http http :// :// :// www.cs.vu.nl www.cs.vu.nl 130.37.24.11 : : 80 80 /home/steen/mbox /home/steen/mbox /home/steen/mbox (a) (b) (c) Examples: http HTTP http://www.cs.vu.nl:80/globe mailto Mail mailto:[email protected]
ftp FTP ftp://ftp.cs.vu.nl/pub/minix/README file Local file file:/edu/book/work/chp/11/11 data Inline data data:text/plain;charset=iso-8859-7,
%e1%e2%e3 telnet Remote login telnet://flits.cs.vu.nl tel Telephone tel:+31201234567
modem Modem modem:+31201234567;type=v32
12 – 12 Distributed Web-Based Systems/12.4
Synchronization: WebDAV
Problem: There is a growing need for collaborative auditing of Web documents, but bare-bones HTTP can’t help here. Solution: Web Distributed Authoring and Versioning.• Supports exclusive and shared write locks, which
operate on entire documents
• A lock is passed by means of a lock token; the
server registers the client(s) holding the lock
• Clients modify the document locally and post it
back to the server along with the lock token
Note:There is no specific support for crashed clients holding a lock.
Web Proxy Caching
Basic idea:Sites install a separateproxy serverthat handles all outgoing requests. Proxies subsequently cache incoming documents. Cache-consistency pro-tocols:
• Always verify validity by contacting server
• Age-based consistency:
Texpire=α·(Tcached−Tlast modi f ied) +Tcached
• Cooperative caching, by which you first check your
neighbors on a cache miss:
Web proxy Web server Web proxy Web proxy Cache Cache Cache Client Client Client Client Client Client Client Client Client 2. Ask neighboring proxy caches
1. Look in local cache
HTTP Get request
3. Forward request to Web server
12 – 14 Distributed Web-Based Systems/12.6 Consistency and Replication
Replication in Web Hosting
Systems
Observation:By-and-large, Web hosting systems are adopting replication to increase performance. Much research is done to improve their organization.
Fol-lows the lines ofself-managingsystems:
Web hosting system
Metric estimation Analysis +/-Reference input Initial configuration
Uncontrollable parameters (disturbance / noise)
Observed output Measured output Adjustment triggers Corrections Replica placement Consistency enforcement Request routing
Handling Flash Crowds
Observation:We needdynamic adjustmentto
bal-ance resource usage. Flash crowdsintroduce a
se-rious problem:
(a) (b)
(c) (d)
2 days 2 days
6 days 2.5 days
12 – 16 Distributed Web-Based Systems/12.6 Consistency and Replication
Server Replication
Content Delivery Network: CDNs act as Web host-ing services to replicate documents across the Inter-net providing their customers guarantees on high avail-ability and performance (example: Akamai).
Origin server Client CDN server CDN DNS server Regular DNS system Cache
1. Get base document 2. Document with refs to embedded documents
6. Get embedded documents (if not already cached)
5. Get embedded documents 7. Embedded documents Return IP address client-best server DNS lookups 3 4
Question: How would consistency be maintained in this system?
Replication of Web Apps. (1/3)
Observation:Replication becomes more difficult when dealing with databses and such. No single best solu-tion.Authoritative database
Schema Schema
Server query Server
response
full/partial data replication
full schema replication/ query templates Content-blind cache Content-aware cache Database copy Client
Edge-server side Origin-server side
Assumption:Updates are carried out atorigin server, and propagated to edge servers.
12 – 18 Distributed Web-Based Systems/12.6 Consistency and Replication
Replication of Web Apps. (2/3)
Authoritative database
Schema Schema
Server query Server
response
full/partial data replication
full schema replication/ query templates Content-blind cache Content-aware cache Database copy Client
Edge-server side Origin-server side
• Full replication: high read/write ratio, often in
combination withcomplex queries.Note:
replica-tion may possibly speed-down performance when R/W ratio goes down.
• Partial replication: high read/write ratio, but in
combination withsimple queries
Replication of Web Apps. (3/3)
Authoritative database
Schema Schema
Server query Server
response
full/partial data replication
full schema replication/ query templates Content-blind cache Content-aware cache Database copy Client
Edge-server side Origin-server side
• Content-aware caching:Check for queries at
lo-cal database, and subscribe for invalidations at
the server. Works good with range queries and
complex queries.
• Content-blind caching:Simply cache the result
of previous queries. Works great withsimple queries
that address unique results (e.g., no range queries). 12 – 20 Distributed Web-Based Systems/12.6 Consistency and Replication
Security: TLS (SSL)
Transport Layer Security: Modern version of the the Secure Socket Layer (SSL), which “sits” between transport layer and application protocols. Relatively simple protocol that can support mutual authentica-tion using certificates:
Client Server [ K [ K + + S C CA CA ] ] ([ R ]C KS+ ) Possibilities Choices 1 2 3 4 5