• No results found

Sherpa: Cloud Computing of the Third Kind

N/A
N/A
Protected

Academic year: 2021

Share "Sherpa: Cloud Computing of the Third Kind"

Copied!
33
0
0

Loading.... (view fulltext now)

Full text

(1)

Sherpa:

Cloud Computing of the Third Kind

Raghu Ramakrishnan

(2)

What’s in a Name?

• Data Intensive Super Scalable Computing

• Grid Computing

• Super Computing

• Cloud Computing

• Parallel Database Management Systems

• Distributed Database Management Systems

Workload,

Programming model,

Ownership model,

Architectural trade-offs

Vary across:

(3)

Cloud Computing: Computing as a Service

Cloud Computing

CPU Intensive

Data Intensive

Analytic

E.g., SSDS,

Hadoop

Packaged

Software

High-throughput

E.g., Condor

“Transactional”

Storage & Serving

(4)

Trivia Question

• What’s the world’s most widely used parallel

programming language?

(5)

Why Not Use an RDBMS for “Analytics”?

• RDBMS provides too much

– ACID transactions

– Complex query language

– Lots and lots of knobs to turn

• RDBMS provides too little

– Lots of optimization and tuning possible for analytics

• E.g., Column stores, bit-map indexes

– Flexible programming model

• E.g., Group By vs. Map-Reduce; multi-dimensional OLAP

• But many good ideas to borrow!

– Declarative language; parallelization and optimization

techniques; value of data consistency …

(6)

Why Not Use an RDBMS for “OLTP”?

• RDBMS provides too much

– ACID transactions

– Complex query language

– Lots and lots of knobs to turn

• RDBMS provides too little

– Lack of (cost-effective) scalability, availability

– Not enough schema/data type flexibility

• RDBMS and Sherpa aim for different parts of the space

– RDBMS: Heavyweight, strongly consistent OLTP

– Sherpa: Lightweight but massive scale, relaxed

(7)

“I want a big, virtual database”

“What I want is a robust, high performance virtual

relational database that runs transparently over a

cluster, nodes dropping in and out of service at

will, read-write replication and data migration all

done automatically.

I want to be able to install a database on a server

cloud and use it like it was all running on one

machine.”

-- Greg Linden’s blog, 2006

(8)

An Example Web App

uploads

tags

“flower”

as

»

Friend activity

»

Your Photos

Sonja uploaded

Brandon tagged a photo

»

Photos tagged as “flower”

Updates

Queries

(9)

Why Hosted?

simple

API

ƒ

Rapid application development

ƒ

On-demand scaling

ƒ

DBA functions amortized across applications

ƒ

Rapid application development

ƒ

On-demand scaling

(10)

Rapid Application Development

• What does it take to get the Next Great Thing off the

ground?

Now:

– Set up multiple replicas of a clustered data store

– Set up a system for indexing

– Set up a system for caching

– Set up auxiliary DBMS instances for reporting, etc.

– Set up the feeds and messaging between them

– Write the application logic

– Fairly complex system at first line of new code

Our vision:

– Write the application logic

– Use a hosted infrastructure to store and query your data

– Or, as Joshua Shachter puts it: “The next cool thing shouldn’t take a team

of 30, it should be three guys, PHP and a long weekend”

(11)

Implications

Data management as a service

– Scientists and others who’ve resisted (installing, maintaining, and) using

DBMSs will find it much easier to reap the benefits

– “Data centers” and “Computing Centers” will come into vogue again

The Web is becoming open

– E.g., OpenSocial, OpenID

Hosted back-ends and RAD tools will make Web application

development accessible to all

– Ideas will be the most valuable currency, not the wherewithal to build

complex systems

Paradigm shifts possible for how we do research in many fields:

– Build applications that embed your algorithms and test them directly in

the field—Computer Scientists can interact directly with users (ironically,

this would still be a breakthrough of sorts after four decades!)

– Many other disciplines (e.g., Sociology, microeconomics) can design and

conduct online experiments involving unprecedented numbers of

(12)

PNUTS: DB in the Cloud

E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E CREATE TABLE Parts (

ID VARCHAR, StockNumber INT, Status VARCHAR …

)

CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) Parallel database

Parallel database Geographic replicationGeographic replication Indexes and views

Indexes and views

Structured, flexible schema

Structured, flexible schema

Hosted, managed infrastructure

(13)

Sherpa Data Services

PNUTS Services

• Query planning and execution • Index maintenance

Distributed infrastructure for tabular data

• Data partitioning • Update consistency • Replication YDOT FS • Ordered tables Applications YMB • Pub/sub messaging YCA: Authorization YDHT FS • Hash tables Zookeeper • Consistency service

(14)

Guiding Principles for PNUTS

Reliable and robust storage

Replication for fault tolerance

Predictable consistency guarantees

Simple to use

Simple operations set

Minimal client configuration

Service-level authentication

Flexible schemas

Highly Scalable / Performant

Partitioning data over many

machines

Horizontal scaling at every level

Data is local to its usage

Predictable performance via quality

of service levels

Predicates evaluated on back end

Cheaper consistency guarantees

than full ACID

Multiple rich access methods

Hash and ordered table types

System-maintained secondary

indexes

Optimization for complex access

patterns

Rapid provisioning of new storage

Simple, automated cluster growth

Cheap table creation

Pay as you grow, grow big as you

need

Operationally cheap

Automated failover

Automated load balancing

No single points of failure

(15)

Data Model and Retrieval

YDOT/YDHT

Data model:

Key

value

dictionary

Value can be packed with multiple

attributes

YDHT operations:

Hash table calls

Get

Set (insert and update)

Remove

Scan

YDOT: YDHT + ordered ranges

PNUTS

Data model:

Relational tables with

flexible schema

Typed, declared attributes

Fast addition of new attributes

Operations:

PNUTS query

language

Point lookupRange queriesInsert/Update/RemoveComplex predicatesOrderingTop-K

Primary API is web services (JSON over HTTP)

(16)

Storage servers

Routers

Clients

Tablet

Controller

YDHT

• Scalable distributed record store

– Optimized for small reads and writes

– Focus on ease of operations, multi-region redundancy, organic

scalability

(17)

Ways to use YDHT

• As a primary store

• As a materialized view/cache

• As part of PNUTS!

YDHT YDHT APP Primary store APP YDHT APP PNUTS

(18)

Data Concepts—YDHT

Table

Apple Lemon Grape Orange Lime Strawberry Kiwi Avocado Tomato Banana

Primary key

Record

Grapes are good to eat Limes are green

Apple is wisdom Strawberry shortcake Arrgh! Don’t get scurvy!

But at what price?

How much did you pay for this lemon? Is this a vegetable?

New Zealand

Tablet

(19)

Data Concepts—YDOT

Apple Banana Grape Orange Lime Strawberry Kiwi Avocado Tomato Lemon

Ordered by primary key

Grapes are good to eat

Limes are green Apple is wisdom

Strawberry shortcake Arrgh! Don’t get scurvy! But at what price?

The perfect fruit

Is this a vegetable?

How much did you pay for this lemon? New Zealand

Tablets contain

clustered

(20)

Storage unit 1 Storage unit 2 Storage unit 3

YDOT—Ordered Table Store

• YDOT provides clustered, ordered retrieval of records

Storage unit 1 Canteloupe Storage unit 3 Lime Storage unit 2 Strawberry Storage unit 1 Router Apple Avocado Banana Blueberry Canteloupe Grape Kiwi Lemon Lime Mango Orange Strawberry Tomato Watermelon Apple Avocado Banana Blueberry Canteloupe Grape Kiwi Lemon Lime Mango Orange Strawberry Tomato Watermelon

Grapefruit…Pear?

Grapefruit…Lime? Lime…Pear? Storage unit 1 Canteloupe Storage unit 3 Lime Storage unit 2 Strawberry Storage unit 1

(21)

Data Concepts—PNUTS

Apple Banana Grape Orange Lime Strawberry Kiwi Avocado Tomato Lemon

Grapes are good to eat

Limes are green Apple is wisdom

Strawberry shortcake Arrgh! Don’t get scurvy! But at what price?

The perfect fruit

Is this a vegetable?

How much did you pay for this lemon? New Zealand $1 $3 $2 $12 $8 $1 $9 $2 $900 $14

Schema:

declared, typed

fields

Name Description Price

Retains tablet

structure of

YDHT/YDOT

(22)

Flexible Schema

Posted date

Listing id

Item

Price

6/1/07

424252

Couch

$570

6/1/07

763245

Bike

$86

6/3/07

211242

Car

$1123

6/5/07

421133

Lamp

$15

Primary table

Color

Red

Condition

Good

Fair

(23)
(24)

Mastering

A 42342 E B 42521 W C 66354 W D 12352 E E 75656 C F 15677 E A 42342 E B 42521 W C 66354 W D 12352 E E 75656 C F 15677 E A 42342 E B 42521 W C 66354 W D 12352 E E 75656 C F 15677 E Tablet master

(25)

Basic Consistency Model

Goal:

Make it easier for applications to reason about updates and cope with

asynchrony—alternative to “transactions” in an asynchronous world

What happens to a record with primary key “Brian”?

Guarantees:

Every reader will always see some consistent, but possibly stale version

Readers can request a more up-to-date version, but may pay extra latency

– Special case: Critical read (writer/readers see their own writes)

Writers can verify that the record is still at the version they expect

Time

Record

inserted Update Update Delete

v. 1 v. 2 v. 3 Generation 1 Record inserted Update Update Delete v. 1 v. 2 v. 4 Generation 2 Update v. 3 Record inserted Delete v. 1 Generation 3

(26)

Server 1 Server 2 Server 3 Server 4 Bike $86 6/2/07 636353 Chair $10 6/5/07 662113

Distribution

Couch $570 6/1/07 424252 Car $1123 6/1/07 256623 Lamp $19 6/7/07 121113 Bike $56 6/9/07 887734 Scooter $18 6/11/07 252111 Hammer $8000 6/11/07 116458

Distribution for parallelism

Distribution for parallelism

Data shuffling for load balancing

Data shuffling for load balancing

(27)

Tablet Splitting and Balancing

Each storage unit has many tablets

Each storage unit has many tablets

Tablets may grow over time

Tablets may grow over time

Overfull tablets split

Overfull tablets split

Storage unit may become a hotspot

Storage unit may become a hotspot

Shed load by moving tablets to other servers

(28)

Data-path components

Each can be scaled horizontally

Data-path components Each can be scaled horizontally

Cluster 1 Cluster 2 Storage units Routers Tablet controller Tablet map Load balancer Server monitor WS API SU API Clients YMB

Architecture

Query processing

(29)

Yahoo! Message Broker (YMB)

Pub/sub based on reliable logging

– Topic-based

– Persistent subscriptions

– Multi-region presence

Guarantees

– In the presence of at most one YMB machine failure:

• Published messages will be delivered on live subscriptions system-wide

• Messages published in one region will be delivered to all subscribers in the order they were published (partial order)

• Published messages available for re-delivery until subscriber calls consume()

– If there are two machine failures:

• Subscribers will be notified of “broken subscription”

– Since messages may have been lost

Uses in YDHT/PNUTS

– Reliably replicate data and updates between regions

– Reliably communicate coordination/synchronization message between

distributed actors

(30)

Quality of Service

Hosted platform supporting multiple applications

– And eventually,

multi-tenancy

!

Inter-application isolation

– Applications run on “leased” servers

• Performance is as good as those servers give you

• Unaffected by other applications

– Some shared infrastructure

• Overprovisioned to ensure performance agreements

Intra-application isolation

– How to share my data without hurting my app’s

performance?

(31)

BigTable

• BigTable overview

– Rows and columns abstraction with flexible schemas

and data versioning, range scans

– Built on top of GFS

• Things BigTable emphasizes that we don’t (for now,

anyway)

– Keeping multiple versions

– Tight integration with MapReduce

• Things we emphasize that BigTable doesn’t

– Asynchrony

– Geographic replication

– Indexing

(32)

Dynamo

• Dynamo overview

– Highly write available data store

– Uses gossip and eventual consistency: can write

anywhere, eventually update will propagate to all replicas

• PNUTS versus Dynamo

– Dynamo is a hash table; PNUTS is both hashed and

ordered

– Eventual consistency model exposes “dirty” data

– PNUTS can operate in high availability

or

high

consistency mode

– Gossip is not tuned for geographic replication

– No record structure or indexes in Dynamo

(33)

Summary

• Hosted data management is a new frontier

– Beyond the issues we discussed, many novel aspects

that arise because of hosting (e.g., multi-tenancy)

– Paradigm shift that goes beyond the technology (e.g.,

new kinds of usage, new business models)

• Formulas for new research problem:

– Old research problem + fine-grained asynchrony

– Old research problem + hosted service model

• Formulas for solutions?

References

Related documents

Can you right of floyd talbert letter was a humbling to be missed mr winters death, shape the whole band of it was red and lived and the magazine.. Pushing and hanks, talbert letter

The punch and die seems to be the heart of the tool, but the die set consisting of top plate, bottom plate, guide pillar and bush stood more critical during machining and

T – On standing up, gravity causes venous pooling in the legs, which reduces the venous filling pressure at heart level (Figure 6.9) .This reduces diastolic ventricular distension

Term 1 Apple Apricot Banana Berries (strawberry, raspberry, blueberry) Melon Peaches Watermelon Term 2 Apple Banana Feijoa Kiwifruit Mandarin Orange Pear Frozen berries Term 3

Creating a Storage Unit Record by Transfer Order .... Displaying a Storage

Configuring a storage unit for cloud storage Create one or more storage units that reference the disk pool. The Disk Pool Configuration Wizard lets you create a storage unit;

Water Water N Components Components Amount of Insurance Construction Age Deductible Weather / Elevation Proximity Features Weather Non- weather Environmental Module Rating

If the Active unit loses connection to a storage device and the Standby unit continues to be connected, then the Standby unit takes control of all the storage devices and becomes