Distributed Systems
Principles and Paradigms
Maarten van Steen
VU Amsterdam, Dept. Computer Science
Chapter 12: Distributed Web-Based Systems
Version: December 10, 2012
Distributed Web-Based Systems 12.1 Architecture
Distributed Web-based systems
Essence
The WWW is a huge client-server system with millions of servers; each
server hosting thousands of hyperlinked documents.
Documents are often represented in text (plain text, HTML, XML)
Alternative types: images, audio, video, applications (PDF, PS)
Documents may contain scripts, executed by client-side software
[Figure: a browser (on the client machine's OS) and a Web server (on the server machine). 1. Get document request (HTTP); 2. Server fetches document from local file; 3. Response.]
Multi-tiered architectures
Observation
Very early on, Web sites were organized into three tiers.
[Figure: Web server (HTTP request handler), CGI process running a CGI program, and database server. Labeled steps: 1. Get request; 3. Start process to fetch document; 4. Database interaction; 5. HTML document created; 6. Return result.]
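The three tiers can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: an in-memory SQLite database stands in for the database server, `cgi_program` for the CGI program, and `handle_request` for the HTTP request handler; no real CGI process is started.

```python
import sqlite3

def make_database():
    """Third tier: an in-memory database holding document bodies (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (path TEXT PRIMARY KEY, body TEXT)")
    db.execute("INSERT INTO docs VALUES (?, ?)", ("/hello", "Hello, world"))
    db.commit()
    return db

def cgi_program(db, path):
    """Second tier: interact with the database and create an HTML document."""
    row = db.execute("SELECT body FROM docs WHERE path = ?", (path,)).fetchone()
    if row is None:
        return 404, "<html><body>Not found</body></html>"
    return 200, "<html><body>{}</body></html>".format(row[0])

def handle_request(db, request_line):
    """First tier: parse the request line and hand the path to the CGI program."""
    method, path, _version = request_line.split()
    if method != "GET":
        return 405, "<html><body>Method not allowed</body></html>"
    return cgi_program(db, path)

db = make_database()
status, html = handle_request(db, "GET /hello HTTP/1.1")
print(status, html)
```

In a real deployment each tier would typically run as its own process, possibly on its own machine; the function boundaries above only mark where those boundaries would be.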
Web services
Observation
At a certain point, people started recognizing that there was more to the Web
than just user ↔ site interaction: sites could offer services to other sites ⇒
standardization is then badly needed.
[Figure: client machine (client application, stub, communication subsystem) and server machine (server application, stub, communication subsystem) communicating through SOAP. The server publishes its service description (WSDL) at a directory service (UDDI); the client looks up a service there. Both sides generate their stub from the WSDL description.]
Distributed Web-Based Systems 12.2 Processes
Apache Web server
Observation: More than 52% of all 185 million Web sites run Apache.
The server is internally organized more or less according to the steps needed
to process an HTTP request.
[Figure: the Apache core with a sequence of hooks between request and response. Modules provide functions; each function is linked to a hook, and the core calls the functions registered per hook.]
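The hook organization can be sketched as follows. This is not Apache's actual API: the class `Core`, the hook names `translate`, `content`, and `log`, and the three module functions are hypothetical stand-ins that only show how functions are linked to hooks and called in hook order.

```python
from collections import defaultdict

class Core:
    """Hypothetical core: walks the hooks in order and calls the linked functions."""

    def __init__(self, hook_order):
        self.hook_order = list(hook_order)
        self.functions = defaultdict(list)

    def register(self, hook, function):
        """Link a module's function to a hook."""
        if hook not in self.hook_order:
            raise ValueError("unknown hook: " + hook)
        self.functions[hook].append(function)

    def process(self, request):
        """Process a request by running every function of every hook, in order."""
        for hook in self.hook_order:
            for function in self.functions[hook]:
                function(request)
        return request

# Functions that modules might contribute (illustrative only).
def translate_uri(request):
    request["filename"] = "/var/www" + request["uri"]

def generate_content(request):
    request["response"] = "contents of " + request["filename"]

def log_request(request):
    request["logged"] = True

core = Core(["translate", "content", "log"])
core.register("translate", translate_uri)
core.register("content", generate_content)
core.register("log", log_request)
result = core.process({"uri": "/index.html"})
print(result["response"])
```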
Server clusters
Essence
To improve performance and availability, WWW servers are often clustered in
a way that is transparent to clients.
[Figure: a front end connected through a LAN to several Web servers. The front end handles all incoming requests and outgoing responses.]
Server clusters
Problem
The front end may easily get overloaded, so that special measures
need to be taken.
Transport-layer switching: Front end simply passes the TCP request to one of
the servers, taking some performance metric into account.
Content-aware distribution: Front end reads the content of the HTTP request
and then selects the best server.
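The difference between the two policies can be sketched as two selection functions. The load table, the `cached_on` map, and the rule "prefer a server that already caches the document" are assumptions for illustration; a real front end may use other metrics.

```python
def transport_layer_select(servers, load):
    """Pick the least-loaded server without looking at the request content."""
    return min(servers, key=lambda s: load[s])

def content_aware_select(servers, load, request_path, cached_on):
    """Prefer servers that already cache the requested document; break ties by load.

    Falls back to plain least-loaded selection if no server caches the document.
    """
    candidates = [s for s in servers if request_path in cached_on.get(s, set())]
    return min(candidates or servers, key=lambda s: load[s])

servers = ["s1", "s2", "s3"]
load = {"s1": 5, "s2": 1, "s3": 3}       # hypothetical number of open connections
cached_on = {"s3": {"/a.html"}}          # hypothetical per-server cache contents

print(transport_layer_select(servers, load))
print(content_aware_select(servers, load, "/a.html", cached_on))
```

With these inputs the transport-layer policy picks the least-loaded server, while the content-aware policy sends the request for `/a.html` to the server that already holds it.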
Server Clusters
Question
Why can content-aware distribution be so much better?
[Figure: client, switch, two distributors, a dispatcher, and two Web servers. 1. Pass setup request to a distributor; 2. Dispatcher selects server; 3. Hand off TCP connection; 4. Inform switch; 5. Forward other messages; 6. Server responses.]
Distributed Web-Based Systems 12.6 Consistency and Replication
Web proxy caching
Basic idea
Sites install a separate proxy server that handles all outgoing requests.
Proxies subsequently cache incoming documents. Cache-consistency protocols:
Always verify validity by contacting server
Age-based consistency:
$T_{\text{expire}} = \alpha \cdot (T_{\text{cached}} - T_{\text{last modified}}) + T_{\text{cached}}$
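A small worked example of the age-based rule, as a sketch. The function names and the sample timestamps (seconds) are assumptions; only the formula itself comes from the slide.

```python
def expiration_time(t_cached, t_last_modified, alpha):
    """T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    return alpha * (t_cached - t_last_modified) + t_cached

def is_fresh(now, t_cached, t_last_modified, alpha):
    """A cached copy is served without contacting the server until it expires."""
    return now < expiration_time(t_cached, t_last_modified, alpha)

# Document last modified at t=1000, cached at t=2000, alpha=0.5:
# it was 1000 s old when cached, so it is trusted for another 500 s.
t_expire = expiration_time(t_cached=2000, t_last_modified=1000, alpha=0.5)
print(t_expire)                          # 2500.0
print(is_fresh(2400, 2000, 1000, 0.5))   # True
print(is_fresh(2600, 2000, 1000, 0.5))   # False
```

The older a document was when it was cached, the longer the copy is considered valid; alpha controls how aggressive that assumption is.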
Web proxy caching
Basic idea (cont'd)
Cooperative caching, by which you first check your neighbors on a cache miss
[Figure: three Web proxies, each with a cache and its own clients, and a Web server. On an HTTP Get request: 1. Look in local cache; 2. Ask neighboring proxy caches; 3. Forward request to Web server.]
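The three lookup steps can be sketched as follows. The `Proxy` class and the dictionary standing in for the Web server are assumptions for illustration; consistency checks are left out.

```python
class Proxy:
    """Hypothetical cooperative proxy cache."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin      # dict url -> document, stands in for the Web server
        self.cache = {}
        self.neighbors = []

    def get(self, url):
        """Return (document, where it was found)."""
        # 1. Look in local cache.
        if url in self.cache:
            return self.cache[url], "local"
        # 2. Ask neighboring proxy caches.
        for neighbor in self.neighbors:
            if url in neighbor.cache:
                self.cache[url] = neighbor.cache[url]
                return self.cache[url], "neighbor"
        # 3. Forward request to Web server.
        self.cache[url] = self.origin[url]
        return self.cache[url], "origin"

origin = {"/index.html": "<html>home</html>"}
a, b = Proxy("a", origin), Proxy("b", origin)
a.neighbors, b.neighbors = [b], [a]

print(a.get("/index.html")[1])
print(b.get("/index.html")[1])
print(b.get("/index.html")[1])
```

The first request goes to the origin, the second is served by the neighboring proxy, and the third by the local cache.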
Replication in Web hosting systems
Observation
By and large, Web hosting systems are adopting replication to increase
performance. Much research is being done to improve their organization, which
follows the lines of self-managing systems.
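[Figure: feedback control loop around a Web hosting system. Inputs: initial configuration, reference input, and uncontrollable parameters (disturbance / noise). The observed output goes through metric estimation to give the measured output, which is compared (+/-) with the reference input; analysis produces adjustment triggers, leading to corrections in replica placement, consistency enforcement, and request routing.]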
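One iteration of such a feedback loop might look like the sketch below. The metric (latency), the tolerance, and the correction (adding or removing one replica) are all assumptions chosen for illustration; the slide does not prescribe a particular control law.

```python
def adjust_replicas(current, measured_ms, reference_ms,
                    tolerance_ms=10.0, lowest=1, highest=8):
    """Compare measured output with the reference input and correct the replica count."""
    error = measured_ms - reference_ms
    if error > tolerance_ms:          # too slow: trigger an extra replica
        return min(current + 1, highest)
    if error < -tolerance_ms:         # comfortably fast: release a replica
        return max(current - 1, lowest)
    return current                    # within tolerance: no adjustment

replicas = 2
history = []
for measured in [150.0, 140.0, 105.0, 60.0]:   # hypothetical latency measurements
    replicas = adjust_replicas(replicas, measured, reference_ms=100.0)
    history.append(replicas)
print(history)   # [3, 4, 4, 3]
```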
Handling flash crowds
Observation
We need dynamic adjustment to balance resource usage. Flash crowds
introduce a serious problem.
[Figure: four plots (a)-(d), covering time spans of 2 days, 2 days, 6 days, and 2.5 days, respectively.]
Server replication
Content Delivery Network
CDNs act as Web hosting services that replicate documents across the
Internet, providing their customers with guarantees on high availability and
performance (example: Akamai).
[Figure: client, origin server, CDN server with cache, CDN DNS server, and the regular DNS system. 1. Get base document; 2. Document with refs to embedded documents; 3, 4. DNS lookups, returning the IP address of the client-best server; 5. Get embedded documents; 6. Get embedded documents (if not already cached); 7. Embedded documents.]
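The DNS-based redirection step can be sketched as a lookup that returns the replica estimated to be best for the requesting client. The latency table, the client labels, and the use of latency as the selection metric are assumptions; the IP addresses come from the ranges reserved for documentation.

```python
# Hypothetical latency estimates (ms) from client networks to CDN servers.
LATENCY_MS = {
    "client-eu": {"192.0.2.10": 15, "198.51.100.20": 95},
    "client-us": {"192.0.2.10": 110, "198.51.100.20": 20},
}

def cdn_dns_lookup(client, latency_ms=LATENCY_MS):
    """Return the IP address of the client-best CDN server (lowest estimated latency)."""
    estimates = latency_ms[client]
    return min(estimates, key=estimates.get)

print(cdn_dns_lookup("client-eu"))
print(cdn_dns_lookup("client-us"))
```

The client then fetches the embedded documents from the returned address, and the CDN server contacts the origin server only if it has not cached them yet.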
Replication of Web applications
Observation
Replication becomes more difficult when dealing with databases and
such. No single best solution.
Assumption
Updates are carried out at the origin server, and propagated to edge servers.
Replication of Web applications: normal
[Figure: edge-server side (Web server, application logic, content-blind cache, content-aware cache, database copy, schema) and origin-server side (Web server, application logic, authoritative database, schema). The client sends a query and receives a response; the two sides are linked by full/partial data replication and full schema replication/query templates.]
Replication of Web applications
Alternative solutions
Full replication: high read/write ratio, often in combination with complex
queries.
Partial replication: high read/write ratio, but in combination with simple
queries.
Content-aware caching: Check for queries at local database, and subscribe for
invalidations at the server. Works well with range queries and complex queries.
Content-blind caching: Simply cache the result of previous queries. Works
great with simple queries that address unique results (e.g., no range queries).
Question
What can be said about replication vs. performance?
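The contrast between content-blind and content-aware caching on range queries can be sketched as follows. The table `ORIGIN`, both cache classes, and the "covered range" bookkeeping are assumptions for illustration; invalidations from the origin server are left out.

```python
ORIGIN = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}   # hypothetical table: id -> value

def origin_range_query(lo, hi):
    """Stands in for running the query at the authoritative database."""
    return {k: v for k, v in ORIGIN.items() if lo <= k <= hi}

class ContentBlindCache:
    """Caches query results keyed by the query itself; knows nothing about the data."""

    def __init__(self):
        self.results = {}
        self.misses = 0

    def range_query(self, lo, hi):
        key = (lo, hi)
        if key not in self.results:
            self.misses += 1
            self.results[key] = origin_range_query(lo, hi)
        return self.results[key]

class ContentAwareCache:
    """Keeps rows in a local store and answers queries covered by earlier ones."""

    def __init__(self):
        self.rows = {}
        self.covered = []     # list of (lo, hi) ranges already fetched
        self.misses = 0

    def range_query(self, lo, hi):
        if not any(c_lo <= lo and hi <= c_hi for c_lo, c_hi in self.covered):
            self.misses += 1
            self.rows.update(origin_range_query(lo, hi))
            self.covered.append((lo, hi))
        return {k: v for k, v in self.rows.items() if lo <= k <= hi}

blind, aware = ContentBlindCache(), ContentAwareCache()
for cache in (blind, aware):
    cache.range_query(1, 5)
    cache.range_query(2, 3)   # a subrange of the first query
print(blind.misses, aware.misses)   # 2 1
```

The content-blind cache treats the second query as new because its key differs, whereas the content-aware cache recognizes that the rows are already available locally.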
Replication Web apps.: full/partial replication
[Figure: the edge-server side / origin-server side organization for full/partial replication. Edge-server side: Web server, application logic, content-blind cache, content-aware cache, database copy, schema. Origin-server side: Web server, application logic, authoritative database, schema. Links: full/partial data replication; full schema replication/query templates.]
Replication Web apps.: content-aware caching
[Figure: the edge-server side / origin-server side organization for content-aware caching. Edge-server side: Web server, application logic, content-blind cache, content-aware cache, database copy, schema. Origin-server side: Web server, application logic, authoritative database, schema. Links: full/partial data replication; full schema replication/query templates.]
Replication Web apps.: content-blind caching
[Figure: the edge-server side / origin-server side organization for content-blind caching. Edge-server side: Web server, application logic, content-blind cache, content-aware cache, database copy, schema. Origin-server side: Web server, application logic, authoritative database, schema. Links: full/partial data replication; full schema replication/query templates.]