Big Data Technology CS , Technion, Spring 2014




(1)

Big Data Technology

CS 236620, Technion, Spring 2014

Edward Bortnikov & Ronny Lempel

Yahoo Labs, Haifa

(2)

Data = Systems

(3)
(4)

How to Get the Big Systems Right?

• A multidisciplinary science in its own right
    • Distributed Computing, Networking
    • Hardware and Software Architecture
    • Operations Research, Measurement, Performance Evaluation
    • Power Management
    • … and even Civil Engineering
• In this course: aspects related to Computer Science
• We’ll start with some principles …

(5)

An Ideal System Should …

(6)
(7)
(8)

Architect’s Dream - Throughput

How many requests can be served in a unit of time?

(9)

Architect’s Dream - Latency

How long does a single request take?
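
The two questions are linked. For a system in steady state, Little's Law says that the average number of requests in flight equals throughput times average latency. The sketch below only illustrates that relation; the function names and the numbers are assumptions and do not come from the slides.

```python
def requests_in_flight(throughput_rps: float, latency_s: float) -> float:
    """Little's Law: average concurrency = throughput * average latency."""
    return throughput_rps * latency_s


def max_throughput(concurrency_limit: float, latency_s: float) -> float:
    """Upper bound on throughput when at most `concurrency_limit`
    requests can be in flight at the same time."""
    return concurrency_limit / latency_s


if __name__ == "__main__":
    # Illustrative numbers only: 2,000 requests/s at 50 ms average latency
    # means about 100 requests are in flight on average.
    print(requests_in_flight(2000, 0.05))
    # With 100 worker slots and 50 ms latency, throughput is capped
    # at about 2,000 requests/s.
    print(max_throughput(100, 0.05))
```

Under these assumptions, lowering latency or raising the concurrency limit are the only two ways to raise the throughput ceiling.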

(10)

Scaling Up? Scaling Out?

Scale up

(11)

Example: Network Filesystems

[Figure: two file-system architectures]

Monolithic (e.g., historical NFS): R/W requests for “server:/a/b/z.txt” go to a single NFS server.

Distributed: a metadata service (namenode) maps “/users/bob/courses/CS101.txt” to <server_123, block 20>; R/W requests go to the data service (datanode).

(12)

Scale-Out Philosophy

• Scalability through Decoupling
    • Whatever is split can be scaled independently
    • HDFS: Metadata and Data accesses decoupled
• Minimize centralized processing
    • Metadata accesses coordinated but lean
• Maximize I/O parallelism
    • Clients access the data nodes concurrently
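
The decoupling described above can be sketched in a few lines. This is a toy model under stated assumptions, not HDFS code: `MetadataService`, `DataService`, `read_file`, and `build_demo` are hypothetical names, and the server and block identifiers are made up. It shows one lean metadata lookup followed by concurrent block reads from the data nodes.

```python
from concurrent.futures import ThreadPoolExecutor


class MetadataService:
    """Toy stand-in for a namenode: maps a file path to block locations."""

    def __init__(self):
        self._index = {}  # path -> list of (datanode_id, block_id)

    def add_file(self, path, locations):
        self._index[path] = list(locations)

    def locate(self, path):
        return self._index[path]


class DataService:
    """Toy stand-in for a datanode: stores block contents by block id."""

    def __init__(self):
        self._blocks = {}

    def write_block(self, block_id, data):
        self._blocks[block_id] = data

    def read_block(self, block_id):
        return self._blocks[block_id]


def read_file(path, metadata, datanodes):
    # One small metadata lookup, then all block reads run in parallel.
    locations = metadata.locate(path)
    with ThreadPoolExecutor(max_workers=max(1, len(locations))) as pool:
        chunks = pool.map(
            lambda loc: datanodes[loc[0]].read_block(loc[1]), locations
        )
        return b"".join(chunks)  # map() preserves the block order


def build_demo():
    # Hypothetical two-node cluster holding one two-block file.
    datanodes = {"server_1": DataService(), "server_2": DataService()}
    datanodes["server_1"].write_block(20, b"hello, ")
    datanodes["server_2"].write_block(21, b"world")
    metadata = MetadataService()
    metadata.add_file(
        "/users/bob/courses/CS101.txt", [("server_1", 20), ("server_2", 21)]
    )
    return metadata, datanodes


if __name__ == "__main__":
    metadata, datanodes = build_demo()
    print(read_file("/users/bob/courses/CS101.txt", metadata, datanodes))
```

Because the metadata service only returns locations and never touches file contents, data nodes can be added to increase I/O bandwidth without putting more load on the central lookup path.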

(13)

The Peer-to-Peer Approach

• Completely server-less
• All nodes and functions are fully symmetric
    • E.g., in a distributed data store every node has a serving function and a management function
• Less favored in managed DC environments
    • Very hard to maintain consistency guarantees
    • Very hard to optimize globally

(14)

An Ideal System Should …

(15)
(16)
(17)

The Tail at Scale

• Problems are aggravated in large systems
    • Component-level variability amplified by scale
    • Failures and slow components are part of normal life, not an exception
• Two ways of addressing service variability
    • Prevent bad things from happening by detecting and isolating the slow/flawed components
    • Contain bad things through redundancy
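
A short calculation shows how scale amplifies component-level variability and how redundancy contains it. The sketch assumes that components are slow independently of each other, which real systems only approximate; the function names and the 1% figure are illustrative assumptions, not values from the slides.

```python
def prob_request_slow(p_slow: float, fanout: int) -> float:
    """Probability that at least one of `fanout` independent components
    is slow, so that the whole request is slow."""
    return 1.0 - (1.0 - p_slow) ** fanout


def prob_slow_with_replicas(p_slow: float, replicas: int) -> float:
    """Probability that a sub-request is slow when it is sent to
    `replicas` independent copies and the first answer wins."""
    return p_slow ** replicas


if __name__ == "__main__":
    p = 0.01  # assumed: each component is slow 1% of the time
    for n in (1, 10, 100, 1000):
        print(n, round(prob_request_slow(p, n), 3))
    # Redundancy: every sub-request goes to 2 replicas.
    p2 = prob_slow_with_replicas(p, 2)
    print(round(prob_request_slow(p2, 100), 4))
```

With these assumed numbers, a request that touches 100 components is slow about 63% of the time even though each component is slow only 1% of the time. Sending each sub-request to two replicas brings that back down to roughly 1%.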

(18)
(19)

An Ideal System Should …

(20)

Expected Workload Matters

• Latency-oriented
    • Interactive, user-facing systems
    • Example: Web search serving
• Throughput-oriented
    • Back-end heavyweights

(21)

Data Accessibility Matters

Stream

Warehouse

(22)

Access Patterns Matter

• Data Analytics
    • Throughput-oriented applications
    • Write-once (typically, append)
    • Read-many (typically, large sequential reads)
• Online Transaction Processing (OLTP)
    • Latency-oriented applications
    • Write-intensive
    • Typically, many small direct accesses

(23)

Hardware Constraints Matter

(24)

Compute- or Data-Intensive?

Storage

Compute

(25)

Locality Matters

• Can computation and storage be aligned?
    • Optimization?
• How repetitive is the workload?
    • Optimization?

[Figure: power-law distribution, with a few dominant items followed by a long tail]

$\Pr(X > x) \sim x^{-\alpha}$
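
One way to quantify how repetitive a workload is: measure what share of requests the most popular items attract. The sketch below assumes a Zipf-like popularity law, in which the item of rank r is requested with probability proportional to r^(-s). The function name, the exponent, and the catalogue size are assumptions chosen for illustration, and reading "Optimization?" as caching the dominant items is an interpretation rather than something the slide states.

```python
def zipf_coverage(num_items: int, top_k: int, exponent: float = 1.0) -> float:
    """Fraction of requests that hit the `top_k` most popular items when
    the item of rank r is requested with probability proportional to
    r ** -exponent."""
    weights = [rank ** -exponent for rank in range(1, num_items + 1)]
    return sum(weights[:top_k]) / sum(weights)


if __name__ == "__main__":
    # Assumed catalogue of 10,000 items with exponent 1.0.
    for k in (10, 100, 1000):
        print(k, round(zipf_coverage(10_000, k), 2))
```

Under these assumptions, the top 1% of items receives roughly half of all requests, which is why a small cache in front of a power-law workload can absorb a large share of the traffic.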

(26)

Consistency Matters

• Stricter properties = stronger consistency
• Are you prepared to handle weird stuff?
• Fancy stock alerts
    • Is it okay to lose an event once in a while?
• Fancy a social network
    • Bob deletes photos with his ex-date Alice
    • Bob befriends Carol
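
The social-network example can be played out in code. The slide does not spell out the outcome, so this is one possible reading: if a replica applies Bob's two operations in a different order than he issued them, Carol may briefly see the photos he deleted. All names below are hypothetical.

```python
def photos_seen_by_carol(operations):
    """Apply the operations in the given order on one replica and record
    which photos Carol can see at the moment she becomes Bob's friend."""
    photos = {"bob_and_alice.jpg"}
    seen_by_carol = set()
    for op in operations:
        if op == "delete_photos":
            photos.clear()
        elif op == "befriend_carol":
            # Carol sees whatever exists at this point in time.
            seen_by_carol |= photos
    return seen_by_carol


# Order in which Bob issued the operations.
issued = ["delete_photos", "befriend_carol"]
# Order in which a weakly consistent replica might apply them.
reordered = ["befriend_carol", "delete_photos"]

if __name__ == "__main__":
    print(photos_seen_by_carol(issued))     # set()
    print(photos_seen_by_carol(reordered))  # {'bob_and_alice.jpg'}
```

A consistency model that preserves the order of one user's operations rules out the second outcome; a weaker model leaves the application to cope with it.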

(27)

A Dialogue in the Wild

Engineer: we are afraid of any kind of synchronization

Scientist: what kind of guarantee do you want to get?

Engineer: let’s build something simple. Relax your consistency models.

We want the systems to be eventually consistent

Scientist: this is an interesting problem

(28)

Example: Amazon’s Outage

Weak consistency models can lead

(29)
(30)

Elasticity Matters

• Resource demands often unknown in advance
    • Driven by application popularity
• Goal: enablement of organic growth
    • Add- (and pay-) as-you-grow
• Economies of scale
    • Pool multiple datasets and services in huge DCs
    • Better use of shared resources (personnel, real

(31)

Cloud Computing

• Computing resources delivered over a network
• Infrastructure issues abstracted away
• ***-as-a-Service

(32)
(33)

Designing the Air Flows

(34)

Power Efficiency - Surprising Facts

• “At Facebook's Prineville, OR, facility, ambient air flows into the building, passing first through a series of filters to remove bugs, dust, and other contaminants.”
• “Previous estimates suggested that electricity consumption in massive server farms would double between 2005 and 2010. Instead, the number rose by 56% worldwide, and merely 36% in the US.”
• “The most efficient data centers now hover at temperatures closer to 80 degrees Fahrenheit, and instead of sweaters, the technicians walk around in shorts.”

(35)

Summary

• Design for scale
• Design for fault-tolerance
• Know what you design for

(36)

Further Reading

• Lessons of Scale at Facebook
