Sherpa:
Cloud Computing of the Third Kind
Raghu Ramakrishnan
What’s in a Name?
• Data Intensive Super Scalable Computing
• Grid Computing
• Super Computing
• Cloud Computing
• Parallel Database Management Systems
• Distributed Database Management Systems
Vary across:
• Workload
• Programming model
• Ownership model
• Architectural trade-offs
Cloud Computing: Computing as a Service
[Quadrant diagram: workloads split along CPU-intensive vs. data-intensive and analytic vs. "transactional" axes]
• Data-intensive, analytic: e.g., SSDS, Hadoop
• CPU-intensive, high-throughput (packaged software): e.g., Condor
• Data-intensive, "transactional": storage & serving
Trivia Question
• What’s the world’s most widely used parallel programming language?
Why Not Use an RDBMS for “Analytics”?
• RDBMS provides too much
– ACID transactions
– Complex query language
– Lots and lots of knobs to turn
• RDBMS provides too little
– Lots of optimization and tuning possible for analytics
• E.g., Column stores, bit-map indexes
– Flexible programming model
• E.g., Group By vs. Map-Reduce; multi-dimensional OLAP
• But many good ideas to borrow!
– Declarative language; parallelization and optimization techniques; value of data consistency …
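The Group By vs. Map-Reduce comparison above can be made concrete: a SQL `GROUP BY` aggregate and a MapReduce job express the same computation. Below is a minimal, in-memory sketch of the MapReduce programming model (not Hadoop's actual API) applied to a word count, the MapReduce analogue of `SELECT word, COUNT(*) FROM docs GROUP BY word`.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy in-memory MapReduce: group mapper output by key (the
    "shuffle"), then apply the reducer to each key's value list."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Word count: mapper emits (word, 1); reducer sums the 1s per word.
docs = ["cloud data", "cloud computing", "data data"]
counts = map_reduce(
    docs,
    mapper=lambda doc: [(word, 1) for word in doc.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)  # {'cloud': 2, 'data': 3, 'computing': 1}
```

The mapper plays the role of the grouping expression, the reducer the role of the aggregate function; MapReduce simply exposes both as arbitrary user code rather than a fixed set of SQL aggregates.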
Why Not Use an RDBMS for “OLTP”?
• RDBMS provides too much
– ACID transactions
– Complex query language
– Lots and lots of knobs to turn
• RDBMS provides too little
– Lack of (cost-effective) scalability, availability
– Not enough schema/data type flexibility
• RDBMS and Sherpa aim for different parts of the space
– RDBMS: Heavyweight, strongly consistent OLTP
– Sherpa: Lightweight but massive scale, relaxed consistency
“I want a big, virtual database”
“What I want is a robust, high performance virtual
relational database that runs transparently over a
cluster, nodes dropping in and out of service at
will, read-write replication and data migration all
done automatically.
I want to be able to install a database on a server
cloud and use it like it was all running on one
machine.”
-- Greg Linden’s blog, 2006
An Example Web App
[Diagram: a photo-sharing app — Sonja uploads photos; Brandon tags a photo as "flower"; pages show friend activity, your photos, and photos tagged "flower"; updates flow into the store and queries flow out]
Why Hosted?
[Diagram: applications share one hosted store through a simple API]
• Rapid application development
• On-demand scaling
• DBA functions amortized across applications
Rapid Application Development
• What does it take to get the Next Great Thing off the ground?
• Now:
– Set up multiple replicas of a clustered data store
– Set up a system for indexing
– Set up a system for caching
– Set up auxiliary DBMS instances for reporting, etc.
– Set up the feeds and messaging between them
– Write the application logic
– The result: a fairly complex system before the first line of new code is written
• Our vision:
– Write the application logic
– Use a hosted infrastructure to store and query your data
– Or, as Joshua Shachter puts it: “The next cool thing shouldn’t take a team
of 30, it should be three guys, PHP and a long weekend”
Implications
• Data management as a service
– Scientists and others who’ve resisted (installing, maintaining, and) using DBMSs will find it much easier to reap the benefits
– “Data centers” and “computing centers” will come into vogue again
• The Web is becoming open
– E.g., OpenSocial, OpenID
• Hosted back-ends and RAD tools will make Web application development accessible to all
– Ideas will be the most valuable currency, not the wherewithal to build complex systems
• Paradigm shifts possible for how we do research in many fields:
– Build applications that embed your algorithms and test them directly in the field—Computer Scientists can interact directly with users (ironically, this would still be a breakthrough of sorts after four decades!)
– Many other disciplines (e.g., sociology, microeconomics) can design and conduct online experiments involving unprecedented numbers of participants
PNUTS: DB in the Cloud
[Diagram: one logical table replicated and partitioned across regions]
CREATE TABLE Parts (
  ID VARCHAR,
  StockNumber INT,
  Status VARCHAR
  …
)
• Parallel database
• Geographic replication
• Indexes and views
• Structured, flexible schema
• Hosted, managed infrastructure
Sherpa Data Services
PNUTS Services
• Query planning and execution
• Index maintenance
Distributed infrastructure for tabular data:
• Data partitioning
• Update consistency
• Replication
[Stack diagram: Applications on top; PNUTS over YDOT FS (ordered tables) and YDHT FS (hash tables); YMB (pub/sub messaging); YCA (authorization); Zookeeper (consistency service)]
Guiding Principles for PNUTS
Reliable and robust storage
– Replication for fault tolerance
– Predictable consistency guarantees
Simple to use
– Simple operations set
– Minimal client configuration
– Service-level authentication
– Flexible schemas
Highly scalable / performant
– Partitioning data over many machines
– Horizontal scaling at every level
– Data is local to its usage
– Predictable performance via quality-of-service levels
– Predicates evaluated on back end
– Cheaper consistency guarantees than full ACID
Multiple rich access methods
– Hash and ordered table types
– System-maintained secondary indexes
– Optimization for complex access patterns
Rapid provisioning of new storage
– Simple, automated cluster growth
– Cheap table creation
– Pay as you grow, grow big as you need
Operationally cheap
– Automated failover
– Automated load balancing
– No single points of failure
Data Model and Retrieval
YDOT/YDHT
• Data model: key → value dictionary
– Value can be packed with multiple attributes
• YDHT operations: hash table calls
– Get
– Set (insert and update)
– Remove
– Scan
• YDOT: YDHT + ordered ranges
PNUTS
• Data model: relational tables with flexible schema
– Typed, declared attributes
– Fast addition of new attributes
• Operations: PNUTS query language
– Point lookup
– Range queries
– Insert/Update/Remove
– Complex predicates
– Ordering
– Top-K
• Primary API is web services (JSON over HTTP)
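The YDHT call set above is small enough to sketch end to end. The class below is a toy in-memory stand-in (a hypothetical interface, not the real YDHT client) showing the Get / Set / Remove / Scan surface, with values packed as multi-attribute dictionaries as the slide describes.

```python
class HashTableClient:
    """Toy in-memory sketch of the YDHT operation set (hypothetical API)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        # Set covers both insert and update.
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

    def remove(self, key):
        self._store.pop(key, None)

    def scan(self):
        # A hash table guarantees no key order on scan.
        return iter(self._store.items())

t = HashTableClient()
t.set("Apple", {"Description": "Apple is wisdom", "Price": 12})
t.set("Lime", {"Description": "Limes are green", "Price": 1})
print(t.get("Apple")["Price"])  # 12
```

A YDOT table would expose the same calls but keep keys ordered, so `scan` could return a contiguous key range.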
[Diagram: clients talk to routers, which direct requests to storage servers; a tablet controller manages tablet placement]
YDHT
• Scalable distributed record store
– Optimized for small reads and writes
– Focus on ease of operations, multi-region redundancy, organic scalability
Ways to use YDHT
• As a primary store
• As a materialized view/cache
• As part of PNUTS!
[Diagram: YDHT used directly as a primary store by an app, as a materialized view/cache, and as a layer inside PNUTS]
Data Concepts—YDHT
[Diagram: a YDHT table of records, each with a primary key (Apple, Lemon, Grape, Orange, Lime, Strawberry, Kiwi, Avocado, Tomato, Banana) and a value ("Apple is wisdom", "Limes are green", "Grapes are good to eat", …); records are grouped into tablets, with no ordering on the key]
Data Concepts—YDOT
[Diagram: the same records ordered by primary key (Apple, Avocado, Banana, …); tablets contain clustered key ranges and are assigned to storage units 1–3]
YDOT—Ordered Table Store
• YDOT provides clustered, ordered retrieval of records
[Diagram: a router maps key intervals to storage units; a range query such as "Grapefruit…Pear" is split at tablet boundaries into "Grapefruit…Lime" and "Lime…Pear" and sent to the units holding those tablets]
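The router's job in the diagram above is an interval lookup: given a key, find which tablet's range covers it and hence which storage unit to contact. A minimal sketch follows; the split keys and unit assignments are hypothetical, since the slide's exact mapping is not fully recoverable.

```python
import bisect

# Hypothetical router table: tablet split keys and the storage unit that
# owns each resulting interval (intervals are closed on the left).
split_keys = ["Grapefruit", "Lime", "Pear"]
units = [1, 3, 2, 1]  # (-inf, Grapefruit), [Grapefruit, Lime), [Lime, Pear), [Pear, +inf)

def route(key):
    """Return the storage unit whose tablet's key range covers `key`."""
    return units[bisect.bisect_right(split_keys, key)]

print(route("Apple"), route("Kiwi"), route("Mango"), route("Strawberry"))  # 1 3 2 1
```

A range query is served the same way: route its two endpoints, then fan out to every unit whose interval overlaps the range.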
Data Concepts—PNUTS
[Diagram: the same ordered table of records, now with a per-record Price value ($1, $3, $2, $12, $8, …)]
Schema: declared, typed fields (Name, Description, Price)
Retains tablet structure of YDHT/YDOT
Flexible Schema
Primary table:

Posted date   Listing id   Item    Price
6/1/07        424252       Couch   $570
6/1/07        763245       Bike    $86
6/3/07        211242       Car     $1123
6/5/07        421133       Lamp    $15

Individual records can also carry extra attributes, e.g., Color (Red) and Condition (Good, Fair), without all records having them.
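One way to picture the flexible-schema model above is records as dictionaries: every record has the declared fields, and any record may carry extra attributes the others lack. The sketch below is illustrative; which listings carried which extra attributes is not recoverable from the slide, so the pairings are invented.

```python
# Flexible-schema records as dicts: declared fields on every record,
# optional per-record extras (Color, Condition) on some.
listings = [
    {"Posted": "6/1/07", "ListingId": 424252, "Item": "Couch", "Price": 570,
     "Condition": "Good"},                      # extra attribute (illustrative)
    {"Posted": "6/1/07", "ListingId": 763245, "Item": "Bike", "Price": 86,
     "Color": "Red"},                           # different extra attribute
    {"Posted": "6/5/07", "ListingId": 421133, "Item": "Lamp", "Price": 15},
]

# Queries tolerate missing attributes rather than requiring ALTER TABLE:
red_items = [r["Item"] for r in listings if r.get("Color") == "Red"]
print(red_items)  # ['Bike']
```

This is what "fast addition of new attributes" buys: a new attribute appears on the records that set it, with no schema migration touching the rest of the table.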
Mastering
[Diagram: each record (A–F) is assigned a master region (E, W, or C); the tablet master is shown alongside the per-record masters]
Basic Consistency Model
Goal:
• Make it easier for applications to reason about updates and cope with asynchrony—an alternative to “transactions” in an asynchronous world
• What happens to a record with primary key “Brian”?
Guarantees:
• Every reader will always see some consistent, but possibly stale, version
• Readers can request a more up-to-date version, but may pay extra latency
– Special case: critical read (writers/readers see their own writes)
• Writers can verify that the record is still at the version they expect
[Timeline: a record is inserted (v. 1), updated (v. 2, v. 3), then deleted, forming Generation 1; re-inserting after a delete starts a new generation (Generation 2, Generation 3), with versions increasing within each generation]
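The last guarantee above, a writer verifying the record is still at the expected version, amounts to a per-record test-and-set. The sketch below is a single-record toy (a hypothetical interface, not the real PNUTS client API) showing how version numbers let writers detect intervening updates without full ACID transactions.

```python
class Record:
    """Toy sketch of per-record versioning for test-and-set writes."""

    def __init__(self, value):
        self.value, self.version = value, 1

    def read_any(self):
        # In the real system this may come from a stale replica, but it
        # is always some consistent version of the record.
        return self.value, self.version

    def test_and_set_write(self, new_value, expected_version):
        # The write applies only if no one else wrote in between.
        if self.version != expected_version:
            return False  # conflict: re-read and retry
        self.value, self.version = new_value, self.version + 1
        return True

rec = Record("v1 data")
_, ver = rec.read_any()
assert rec.test_and_set_write("v2 data", ver)      # succeeds, version -> 2
assert not rec.test_and_set_write("stale", ver)    # rejected: version moved on
```

This gives writers lost-update protection on a single record, which is the "cheaper consistency guarantee than full ACID" the design aims for.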
Distribution
[Diagram: listing records (Couch, Car, Bike, Lamp, Scooter, Hammer, Chair, …) partitioned across Servers 1–4]
• Distribution for parallelism
• Data shuffling for load balancing
Tablet Splitting and Balancing
• Each storage unit has many tablets
• Tablets may grow over time
• Overfull tablets split
• A storage unit may become a hotspot
• Shed load by moving tablets to other servers
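The split step on this slide can be sketched in a few lines: when a sorted tablet exceeds a size threshold, cut it at the median key so each half stays a contiguous key range. The threshold and in-memory representation below are illustrative assumptions, not the system's actual mechanism.

```python
# Sketch: splitting an overfull tablet (a sorted list of keys) in two.
def split_tablet(tablet, max_records=4):
    """If the tablet exceeds max_records, split it at the median key so
    both halves remain contiguous key ranges; otherwise leave it alone."""
    if len(tablet) <= max_records:
        return [tablet]
    mid = len(tablet) // 2
    return [tablet[:mid], tablet[mid:]]

tablet = ["Apple", "Banana", "Grape", "Kiwi", "Lime", "Mango"]
halves = split_tablet(tablet)
print(halves)  # [['Apple', 'Banana', 'Grape'], ['Kiwi', 'Lime', 'Mango']]
```

After a split, either half can be migrated to a lightly loaded storage unit, and the router's interval map is updated with the new boundary key.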
Architecture
• Data-path components; each can be scaled horizontally
[Diagram, per cluster: clients, WS API, routers, query processing, tablet controller with its tablet map, storage units (SU API), load balancer, server monitor, YMB; two such clusters shown]
Yahoo! Message Broker (YMB)
• Pub/sub based on reliable logging
– Topic-based
– Persistent subscriptions
– Multi-region presence
• Guarantees
– In the presence of at most one YMB machine failure:
• Published messages will be delivered on live subscriptions system-wide
• Messages published in one region will be delivered to all subscribers in the order they were published (partial order)
• Published messages remain available for re-delivery until the subscriber calls consume()
– If there are two machine failures:
• Subscribers will be notified of a “broken subscription”, since messages may have been lost
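The topic-based delivery and consume() semantics above can be illustrated with a toy in-process broker (a sketch, not YMB's real implementation or API): each subscription keeps its own queue, messages are delivered in publish order, and a message stays available for re-delivery until the subscriber explicitly consumes it.

```python
from collections import defaultdict, deque

class Broker:
    """Toy topic-based pub/sub: per-subscriber queues, publish-order
    delivery, messages held until consume() is called."""

    def __init__(self):
        self.subs = defaultdict(dict)  # topic -> {subscriber: queue}

    def subscribe(self, topic, subscriber):
        self.subs[topic][subscriber] = deque()

    def publish(self, topic, message):
        # Fan out to every subscription, preserving publish order.
        for queue in self.subs[topic].values():
            queue.append(message)

    def peek(self, topic, subscriber):
        queue = self.subs[topic][subscriber]
        return queue[0] if queue else None

    def consume(self, topic, subscriber):
        # Only now does the message stop being re-deliverable.
        self.subs[topic][subscriber].popleft()

b = Broker()
b.subscribe("updates", "replica-west")
b.publish("updates", "set Apple v2")
print(b.peek("updates", "replica-west"))  # set Apple v2
b.consume("updates", "replica-west")
print(b.peek("updates", "replica-west"))  # None
```

In PNUTS this reliable, ordered log is what carries updates between regions: a write is committed by publishing it to YMB, and replicas apply messages from their subscriptions.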