Distributed Systems [Fall 2012]

Academic year: 2021

(1)

[W4995-2]

Lec 2: Example use cases: The Web and cloud computing

(2)

Lab 0: C/C++ Warm-Up

(Due Sept 14)

• There will be an initial C/C++ warm-up lab (Lab 0)

– Use lab to figure out whether to take this class

• You will build a simple client-server application

– Client connects to server using TCP and sends it the name of a file

– The server looks the file name up in a designated directory and, if it exists, it streams its contents to the client

– The client saves the file into its own designated directory

– You will also add in-memory caching on the server side, which should be implemented using C++ and STL

– There will be no code skeletons: you’ll need to provide all code, including the Makefile!
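The request flow described above can be sketched as follows. This is only an illustration of the protocol shape, written in Python rather than the C/C++ the lab requires. The wire format (a newline-terminated file name, with the server closing the connection to mark the end of the contents), the function names, and the use of `socket.socketpair()` in place of a listening TCP socket are all assumptions made for the sketch, not part of the assignment.

```python
import os
import socket
import tempfile
import threading


def read_line(conn):
    """Read bytes up to and including a newline; return the stripped text."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = conn.recv(1)
        if not chunk:
            break
        buf += chunk
    return buf.decode().strip()


def handle_request(conn, directory, cache):
    """Server side: look the file up (cache first) and stream it back."""
    with conn:
        name = read_line(conn)
        # Assumption: refuse names with path components so requests
        # cannot escape the designated directory.
        if not name or os.path.basename(name) != name:
            return
        data = cache.get(name)
        if data is None:
            path = os.path.join(directory, name)
            if not os.path.isfile(path):
                return  # missing file: close without sending anything
            with open(path, "rb") as f:
                data = f.read()
            cache[name] = data  # in-memory cache keyed by file name
        conn.sendall(data)


def fetch(conn, name, directory):
    """Client side: send the name, save whatever the server streams back."""
    with conn:
        conn.sendall(name.encode() + b"\n")
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # server closed the connection: end of file
                break
            chunks.append(chunk)
    data = b"".join(chunks)
    # Simplification: a missing file and an empty file look the same here.
    with open(os.path.join(directory, name), "wb") as f:
        f.write(data)
    return data


def demo():
    """Run one request over a connected socket pair instead of real TCP."""
    server_dir, client_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
    with open(os.path.join(server_dir, "notes.txt"), "wb") as f:
        f.write(b"hello, lab 0\n")
    cache = {}
    server_end, client_end = socket.socketpair()
    worker = threading.Thread(
        target=handle_request, args=(server_end, server_dir, cache))
    worker.start()
    data = fetch(client_end, "notes.txt", client_dir)
    worker.join()
    return data, cache
```

A real solution would replace the socket pair with `listen`/`accept` on the server and `connect` on the client, and would need some way to report a missing file explicitly.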

(3)

Last Time (Reminder/Quiz)

• Define distributed systems

• Distributed systems goals

(4)

Last Time (Reminder/Quiz)

• Define distributed systems

• Distributed systems goals

– Raise the level of abstraction, provide location transparency, scalable capacity, availability, modularity

• Distributed systems challenges

– Interfaces, scalability, consistency, fault-tolerance, security, implementation

[Figure: layered view of a distributed system. Distributed applications (Gmail, search, Facebook, …) run on middleware services (distributed FSes, distributed computation systems, locking services, CDNs, …), which run on the local OS of each machine (Linux, OSX, Windows), all connected by the network.]

(5)

Today

• Web architectures

– Simple architectures and real-world architectures

– Many slides were developed by Aaron Bannert: http://www.codemass.com/presentations/apachecon2005/ac2005scalablewebarch.ppt

• Cloud computing

(6)

What Are Some Simple Architectures?

• Tanenbaum reading for today

(7)

1. The Client-Server Model

• Popular protocols between clients/servers:

– HTTP, HTTPS

– AJAX: asynchronous requests

[Figure: a client sends requests to a server and receives responses. On the timeline, the client issues a request, waits for the result, and receives the response.]

(8)

Server-Side Processing

• Initially, Web servers returned static HTML pages

– No processing on server, no state, no user-provided data

• 1994: CGI (Common Gateway Interface)

– Server invokes a program upon each request

– Program gets client data from environment variables and stdin, outputs HTML to stdout

– Example: Listing 1

• Then came a lot of server-side frameworks:

– Django, ASP, JSP, Ruby-on-Rails, …

– Much more flexible and extensible than CGI

– Separate presentation, logic, and DB

– Example: Listing 2
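The CGI contract above (input on stdin, HTML on stdout) can be sketched as a single function. This is a minimal illustration, not the listing the slides refer to: the function name `run_cgi`, the form field `name`, and the `demo` helper are hypothetical. `CONTENT_LENGTH` is the standard CGI variable that tells the program how much request body to read.

```python
import html
import io
from urllib.parse import parse_qs


def run_cgi(stdin, stdout, environ):
    """Handle one request the CGI way: read the form, print an HTML page."""
    # Assumption: the body is ASCII form data, so characters equal bytes.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(stdin.read(length))
    name = form.get("name", ["world"])[0]
    # A CGI response is headers, a blank line, then the document body.
    stdout.write("Content-Type: text/html\r\n\r\n")
    stdout.write("<html><body><h1>Hello, %s!</h1></body></html>\n"
                 % html.escape(name))


def demo(body="name=Ada"):
    """Simulate the web server: feed a request body in, capture the output."""
    out = io.StringIO()
    run_cgi(io.StringIO(body), out, {"CONTENT_LENGTH": str(len(body))})
    return out.getvalue()
```

A web server would start this program once per request, passing `sys.stdin`, `sys.stdout`, and `os.environ`. That per-request process launch is one reason the frameworks listed above tend to be preferred.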

(9)

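2. The Three-Tiered Architecture

[Figure: the client (browser) provides the user interface and reaches the service over the WAN; inside the service, the application server and the database talk over the LAN. Timeline: the UI sends a request and waits for the result; the app server requests data from the DB and waits for it; the DB returns the data; the app server returns the result.]

• What are the benefits/problems with this architecture?

+ Modularity, better reliability/scalability opportunities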
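To make the separation of tiers concrete, here is a small sketch in which each tier is one function. All names, the `items` table, and the prices are hypothetical, and an in-memory SQLite database stands in for a separate database server.

```python
import sqlite3


def make_db():
    """Data tier: an in-memory SQLite database standing in for the DB server."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (name TEXT, price_cents INTEGER)")
    db.executemany("INSERT INTO items VALUES (?, ?)",
                   [("pen", 150), ("notebook", 400)])
    return db


def cart_total(db, names):
    """Application tier: business logic; asks the data tier and waits."""
    total = 0
    for name in names:
        row = db.execute(
            "SELECT price_cents FROM items WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        total += row[0]
    return total


def render_cart(db, names):
    """Presentation tier: formats the result; contains no pricing logic."""
    cents = cart_total(db, names)
    return "Total: $%d.%02d" % divmod(cents, 100)
```

Because each tier only calls the one below it, any of them could be moved to its own machine or replaced without touching the others, which is the modularity benefit noted above.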

(10)

Client-side Computation

• In reality, the line is much fuzzier and the architecture is not as clean on the service side…

[Figure: alternative ways of splitting the user interface, application, and database layers between the client (browser) and the service.]

(11)

3. Real Architectures

Clients (browsers)

• Discuss each layer:

– What constitutes it?

– What does it do?

– Hardware requirements

– Deployment choices

(12)

External Caching Tier

• What is this?

– Squid, Apache mod_proxy

– Content-delivery networks

(13)

External Caching Tier

• What does it do?

– Caches outbound data

• Images, CSS, XML, HTML, pictures, videos, …

– Denial of Service defense

– Cache may be close to user

(14)

External Caching Tier

• Hardware requirements

– Lots of memory

– Moderate to little CPU

– Fast network

– Potentially distributed across the world

(15)

External Caching Tier

• Choices:

– What to cache?

– How much to cache?

– Where to cache (internal vs. external)?
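One way to see how the "what" and "how much" choices interact is a cache with a byte budget. The sketch below is an illustration only: the class name, the size-based admission rule, and least-recently-used eviction are assumptions chosen for the example, not how Squid or any CDN is configured.

```python
from collections import OrderedDict


class ByteBudgetCache:
    """LRU cache bounded by total bytes, with a per-object size limit."""

    def __init__(self, capacity_bytes, max_object_bytes):
        self.capacity_bytes = capacity_bytes
        # An object larger than the whole cache could never be kept.
        self.max_object_bytes = min(max_object_bytes, capacity_bytes)
        self.used_bytes = 0
        self._items = OrderedDict()  # url -> body, least recently used first

    def get(self, url):
        body = self._items.get(url)
        if body is not None:
            self._items.move_to_end(url)  # mark as most recently used
        return body

    def put(self, url, body):
        # "What to cache?": skip objects above the size limit.
        if len(body) > self.max_object_bytes:
            return False
        if url in self._items:
            self.used_bytes -= len(self._items.pop(url))
        self._items[url] = body
        self.used_bytes += len(body)
        # "How much to cache?": evict the least recently used entries
        # until the byte budget is respected again.
        while self.used_bytes > self.capacity_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)
        return True
```

Raising `max_object_bytes` admits large videos at the cost of evicting many small images, which is the kind of workload-dependent tradeoff these choices involve.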

(16)

Front-end Tier

• What is this?

– Apache

– thttpd

– Tux Web Server

– IIS

(17)

Front-end Tier

• What does it do?

– HTTP, HTTPS

– Serves static content from disk

– Generates dynamic content

• CGI/PHP/Python/Django/…

– Dispatches requests to the App Server Tier

• Tomcat, WebLogic, WebSphere, JRun, …

(18)

Front-end Tier

• Hardware requirements

– Lots and lots of memory

• Memory is the main bottleneck in web serving

– CPU depends on usage

• Dynamic pages need CPU

• Static pages need little CPU

– Cheap slow disk is enough

(19)

Front-end Tier

• Choices:

– How much dynamic content?

– When to offload dynamic processing to app servers?

– When to offload database operations?

(20)

Application Server Tier

• What does it do?

– Dynamic page processing

• ASP, JSP

• Servlets

– Internal services

• E.g.: search, shopping cart, credit card processing

• There can be tens of these services!

(21)

Application Server Tier

• How does it work?

1. Web Tier generates the request using

• Home-brewed RPC

• REST

• CORBA

• Java RMI

• SOAP

• XMLRPC

2. App Server processes request and responds

(22)

Application Server Tier

• Decoupling of services is GOOD

– Manage complexity using well-defined APIs

• BUT: remote calling overhead can be expensive!

– Marshaling of data, sockets, net latency, …

– SOAP, XMLRPC, … don’t scale that well…

– We’ll talk about some efficient RPC systems next week
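The marshaling cost mentioned above can be made visible by encoding the same call two ways and comparing payload sizes. This sketch uses Python's standard `xmlrpc.client.dumps` for the XML-RPC encoding. The JSON envelope, the method name `cart.add`, and its arguments are hypothetical and serve only as a compact point of comparison. Payload size is only one part of the overhead; sockets and network latency are not measured here.

```python
import json
import xmlrpc.client


def marshal_sizes(method, params):
    """Return (xml_rpc_bytes, json_bytes) for one call with the same arguments."""
    # Standard XML-RPC request encoding from the Python library.
    xml_payload = xmlrpc.client.dumps(tuple(params), methodname=method)
    # Hypothetical compact envelope, used only for comparison.
    json_payload = json.dumps({"method": method, "params": list(params)})
    return len(xml_payload.encode("utf-8")), len(json_payload.encode("utf-8"))


def roundtrip(method, params):
    """Marshal and unmarshal a call, as the two ends of an RPC would."""
    payload = xmlrpc.client.dumps(tuple(params), methodname=method)
    decoded_params, decoded_method = xmlrpc.client.loads(payload)
    return decoded_method, decoded_params
```

For example, `marshal_sizes("cart.add", (42, "pen", 3))` returns a noticeably larger first number, since every XML-RPC value is wrapped in several nested tags.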

(23)

Application Server Tier

• Hardware requirements

– Lots and lots and lots of memory

• App Servers are very memory hungry

– Fast CPU required, and lots of them

– Disk typically isn’t needed

– (This will be an expensive machine.)

(24)

• What is this?

– Relational databases (distributed or not)

• PostgreSQL, SQLite, Oracle, MySQL, Berkeley DB

– Non-relational databases or distributed file systems

• Bigtable, Megastore, MongoDB, Hadoop HBase, HDFS, …

• Tradeoffs:

– Relational databases don’t scale that well, but provide a convenient interface and sound properties (e.g., strong consistency)

– Non-relational DBs scale better

(25)

• Hardware requirements

– Entirely dependent upon application

– Likely to be your most expensive machine(s)

– Tons of memory

– Large disks

– Spindles galore

– RAID is useful for redundancy

(26)

• Choices:

– How much logic to place inside the DB?

– How much consistency do I need?

– How to partition data across machines?

• Spreading a dataset across multiple logical database “slices” in order to achieve better performance
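The partitioning idea above can be sketched with hash-based sharding, which is one common approach among several (range partitioning is another). The names `shard_for` and `ShardedStore` are hypothetical, and plain dictionaries stand in for the separate database slices.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Key-value store whose data is spread across several slices."""

    def __init__(self, num_shards):
        # Each dict stands in for one database slice on its own machine.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

With this scheme each lookup touches exactly one slice. The drawback is that changing the number of shards remaps most keys, which is one reason real systems often use more elaborate placement schemes.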

(27)

• What is this?

– Object cache (e.g., intermediary app-level results)

• What applications?

– Memcached

– Application-level caching inside the application servers

(28)

• What does it do?

– Caches objects closer to the Application or Web Tiers

– Tuned for the application

• The external cache is generic

– Very fast access (<1ms)
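A typical way an application uses such an object cache is to check it first and fall back to the database on a miss. The sketch below assumes that pattern. `ObjectCache` is a hypothetical in-process stand-in with a memcached-like get/set interface; it does not use the real memcached client API.

```python
import time


class ObjectCache:
    """Stand-in for a memcached-like cache: get/set with a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def cached_lookup(cache, key, load_from_db, ttl_seconds=60):
    """Return the cached object, or load it from the DB and cache it."""
    value = cache.get(key)
    if value is None:  # simplification: a stored None is never a hit
        value = load_from_db(key)
        cache.set(key, value, ttl_seconds)
    return value
```

The time-to-live bounds how stale a cached object can get. Choosing it is an application-specific decision, which is part of what "tuned for the application" means here.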

(29)

• Hardware requirements

– Lots of memory

– Little or no disk

– Moderate to low CPU

– Fast network

(30)

• Lots of extra services commonly used in Web services

– DNS

– Time synchronization (we’ll see why this is very important)

– System health monitoring

– Intrusion detection systems

– …

(31)

The Glue

• Load balancers

• Routers

• Switches

• Firewalls
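As an illustration of what the first piece of glue does, here is a minimal round-robin load balancer that skips backends marked as down. Round-robin is only one of several possible policies, and the class and backend names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in turn, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```

A production load balancer would also run health checks itself and might weigh backends by load, but the core job is the same: spread requests across interchangeable machines in a tier.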

(32)

Whew! What Did We Learn?

• Web architectures are complex

• But there are well-known solutions

• There are lots of tradeoffs and understanding the workload is key in choosing the right product to use at each layer

• Each layer has distinct hardware requirements and likely distinct bottlenecks

– Except for RAM, which is very popular

• What does the last observation tell us?

(33)

Today

• Web architectures

– Simple architectures and real-world architectures

• Cloud computing

– What it means and how it began

– Slides inspired by Armando Fox’s presentation on cloud computing history for UC Berkeley’s CS-10 course

(34)

What is cloud computing?

• Computing technology in which data and/or computation are outsourced to a massive-scale, multi-user infrastructure that is managed by a third party.

• Appeared gradually due to two important challenges facing the Web:

– Scaling

– Management

• I’ll tell you the story of how it appeared, so as to help you understand what it is

(35)

• The Web and e-commerce were gaining traction

• Their challenge: how to scale?

– 1996 to 1997: eBay grew from 41,000 to 341,000 users!

(36)

Pre-1995 Answer: Big, Expensive Computers

• Example: eBay used Sun E-10000 “supermini”

– 64 processors @250MHz, 64GB RAM, 20TB disk, ~$1M

• The good:

– Easy to manage

– Easy to program

– Simple failure mode

• The bad:

– Q: Any ideas?

(38)

1995: Berkeley Network of Workstations (NOW)

• Idea: Leverage many interconnected small, cheap, general-purpose machines for incremental scalability and reliability

– Typical PC: 200 MHz CPU, 32MB RAM, 4GB disk

(39)

NOW-0

(40)

NOW-1

• 1995: NOW had 32 Sun SPARCstations

(41)

NOW-2

• 1997: 60 Sun SPARC-2’s

• Build Inktomi app

(42)

Companies Adopt NOW

• Everybody builds their own clusters and grows them to handle more and more load

– Examples: eBay, Amazon, Google, all .com bubble companies

• Similar to early days of electricity when everyone built their own generator

Q: What do you think happened next?

(43)

Late 1990s: The Manageability Challenge

• Hard to manage and program large clusters

– How to write scalable distributed programs?

– How to debug large-scale programs?

– How to make services reliable?

– How to architect the network infrastructure?

– How to provision a cluster to handle peak load?

– How to administer a huge number of computers?

– …

(44)

Early 2000s: Scalable Cluster Primitives

• A few technically strong companies create powerful, scalable, and reliable primitives for cluster management and programming

• Examples:

– Google’s Map/Reduce

– The Google File System (GFS)

– Google’s Bigtable

– Amazon’s Dynamo

– Distributed debugging and tracing tools

– Datacenter temperature regulators

– Scalable distributed communication mechanisms

(45)

Mid 2000s: Three Valuable Commodities

• Giant-scale clusters with enormous excess capacity

– Everybody provisioned for peak

Q: How big was a typical Google datacenter around 2005?

a. 1,000 machines

b. 5,000 machines

c. 10,000 machines

d. 50,000 machines

e. 100,000 machines

(46)

Mid 2000s: Three Valuable Commodities

• Giant-scale clusters with enormous excess capacity

– Everybody provisioned for peak

• Expertise for managing and operating clusters at low cost

– “Economies of scale”

• Complex software to help program/manage clusters

– Even full applications (e.g., Gmail, Google Calendar, etc.)

Q: What do you think happened next?

(47)

2006: Cloud computing

• AWS sells resources, expertise, and access to cloud primitives in a pay-for-what-you-use model

– Resources: CPU, network bandwidth, persistent storage

– Cloud primitives: Amazon S3, EC2, SQS, Map/Reduce, ...

• Google launches Google Apps for Your Domain

– Customizable Gmail, Google Docs, Google Calendar under a custom domain (e.g., gmail.cs.columbia.edu)

• Google then launches the App Engine

– Web hosting infrastructure (such infrastructures existed before, but never enjoyed popularity)
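The appeal of the pay-for-what-you-use model over provisioning for peak can be shown with a little arithmetic. The sketch below compares owning enough servers for the busiest hour against renting only what each hour needs. The load profile, per-server capacity, and prices are all made-up numbers, and the function names are hypothetical.

```python
import math


def owned_cost(hourly_load, capacity_per_server, cost_per_server_hour):
    """Cost of running enough servers for the peak hour, around the clock."""
    peak_servers = math.ceil(max(hourly_load) / capacity_per_server)
    return peak_servers * cost_per_server_hour * len(hourly_load)


def pay_per_use_cost(hourly_load, capacity_per_server, cost_per_server_hour):
    """Cost of renting only the servers each individual hour needs."""
    return sum(
        math.ceil(load / capacity_per_server) * cost_per_server_hour
        for load in hourly_load
    )


# Hypothetical day: 20 quiet hours at 100 requests/s, 4 peak hours at 1000.
EXAMPLE_LOAD = [100] * 20 + [1000] * 4
```

With `EXAMPLE_LOAD`, 100 requests/s per server, and a price of 1 per server-hour, provisioning for peak costs 240 while paying per use costs 60. In practice the rented hourly price is usually higher than the owned one, so the gap depends on how spiky the load is.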

(48)

Advantages of cloud computing

• Low barrier of market entry for startups

• Cheaper, low-management email, calendars, CRM solutions

• New mobile applications

• Faster batch processing via parallelization

(49)

Next time

• Communication: remote procedure calls

• Remember to look on website for Lab 0 tomorrow!

– Start coding soon

– Decide whether to take the class based on how easy you find this assignment
