
An Open Source Memory-Centric Distributed Storage System




(1)

Haoyuan Li, Tachyon Nexus


[email protected]

September 30, 2015 @ Strata and Hadoop World NYC 2015

An Open Source Memory-Centric Distributed Storage System

(2)

Outline

•  Open Source

•  Introduction to Tachyon

•  New Features

(3)


(4)

History

•  Started at UC Berkeley AMPLab

–  From summer 2012

–  Same lab produced Apache Spark and Apache Mesos

•  Open sourced

–  April 2013

–  Apache License 2.0

–  Latest Release: Version 0.7.1 (August 2015)

(5)

Contributors Growth

[Chart: contributors per release. Releases: v0.1 (Dec '12), v0.2 (Apr '13), v0.3 (Oct '13), v0.4 (Feb '14), v0.5 (Jul '14), v0.6 (Mar '15), v0.7 (Jul '15). Contributor counts shown: 1, 3, 15, 30, 46, 70.]

(6)

Contributors Growth

> 150 Contributors (3x increase over the last Strata NYC)

(7)

Contributors Growth

One of the Fastest Growing Big Data Open Source Projects

(8)

Thanks to Contributors and Users!

(9)

One Tachyon Production Deployment Example

•  Baidu (Dominant Search Engine in China, ~ 50 Billion USD Market Cap)

•  Framework: SparkSQL

•  Under Storage: Baidu’s File System

•  Storage Media: MEM + HDD

•  100+ nodes deployment

•  1PB+ managed space

(10)

Outline


(11)

Tachyon is an Open Source Memory-centric Distributed Storage System

(12)
(13)

Performance Trend: Memory is Fast

•  RAM throughput increasing exponentially

•  Disk throughput increasing slowly

(14)

Price Trend: Memory is Cheaper

(15)
(16)

Is the

(17)

Missing a Solution

(18)

A Use Case Example with Spark

•  Fast, in-memory data processing framework

– Keep one in-memory copy inside JVM

– Track lineage of operations used to derive data

– Upon failure, use lineage to recompute data

[Diagram: lineage tracking across a chain of map, filter, join, and reduce operations]
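
The lineage idea on this slide can be illustrated with a small sketch. It is a toy model, not Spark's API: `LineageDataset`, `lose_cache`, and the other names are hypothetical, and the data lives in a plain Python list. Each dataset keeps one cached copy plus a reference to its parent and the operation that produced it, so a lost cache can be rebuilt by replaying the operations.

```python
from typing import Callable, List, Optional


class LineageDataset:
    """Toy dataset that remembers how it was derived (hypothetical, not Spark's RDD API)."""

    def __init__(self,
                 source: Optional[List[int]] = None,
                 parent: Optional["LineageDataset"] = None,
                 transform: Optional[Callable[[List[int]], List[int]]] = None):
        self._source = source        # raw input, only set on the root dataset
        self._parent = parent        # dataset this one was derived from
        self._transform = transform  # operation applied to the parent's rows
        self._cache: Optional[List[int]] = None  # the single in-memory copy

    def map(self, fn: Callable[[int], int]) -> "LineageDataset":
        return LineageDataset(parent=self,
                              transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred: Callable[[int], bool]) -> "LineageDataset":
        return LineageDataset(parent=self,
                              transform=lambda rows: [r for r in rows if pred(r)])

    def collect(self) -> List[int]:
        # Recompute from lineage only when the cached copy is missing.
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source)
            else:
                self._cache = self._transform(self._parent.collect())
        return list(self._cache)

    def lose_cache(self) -> None:
        """Simulate a failure that wipes the in-memory copy."""
        self._cache = None


root = LineageDataset(source=[1, 2, 3, 4, 5, 6])
even_squares = root.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

before_failure = even_squares.collect()
even_squares.lose_cache()                # the cached copy is gone
after_failure = even_squares.collect()   # rebuilt by replaying map and filter
```

Recomputation trades extra CPU work for not having to keep a second copy of the data, which is the property the slide attributes to lineage.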

(19)

Issue 1

Data sharing is the bottleneck in the analytics pipeline: slow writes to disk

[Diagram: Spark Job1 and Spark Job2 each hold blocks 1 and 3 in their own Spark memory block manager (storage engine & execution engine in the same process) and share data through HDFS / Amazon S3, which stores blocks 1-4.]

(20)

Issue 1

Data sharing is the bottleneck in the analytics pipeline: slow writes to disk

[Diagram: a Spark job (blocks 1 and 3 in its Spark memory block manager; storage engine & execution engine in the same process) and a Hadoop MR job on YARN share data through HDFS / Amazon S3, which stores blocks 1-4.]

(21)

Issue 1 resolved with Tachyon

Memory-speed data sharing among jobs in different frameworks

[Diagram: a Spark job and a Hadoop MR job on YARN (execution engine & storage engine in the same process, fast writes) share in-memory block 1 through Tachyon; HDFS / Amazon S3 stores blocks 1-4 on disk.]

(22)

Issue 2

Cache loss when process crashes

[Diagram: a Spark task caches blocks 1 and 3 in its Spark memory block manager (execution engine & storage engine in the same process); HDFS / Amazon S3 stores blocks 1-4.]

(23)

Issue 2

Cache loss when process crashes

[Diagram: the process crashes while blocks 1 and 3 are cached in its Spark memory block manager (execution engine & storage engine in the same process); HDFS / Amazon S3 stores blocks 1-4.]

(24)

Issue 2

Cache loss when process crashes

[Diagram: after the crash of the process (execution engine & storage engine in the same process), only the copies of blocks 1-4 in HDFS / Amazon S3 remain.]

(25)

Issue 2 resolved with Tachyon

Keep in-memory data safe, even when the process crashes

[Diagram: a Spark task with its Spark memory block manager (execution engine & storage engine in the same process) works against Tachyon, which holds blocks 1, 3, and 4 in memory; HDFS / Amazon S3 stores blocks 1-4.]

(26)

Issue 2 resolved with Tachyon

Keep in-memory data safe, even when the process crashes

[Diagram: the process (execution engine & storage engine in the same process) crashes, but Tachyon still holds blocks 1, 3, and 4 in memory; HDFS / Amazon S3 stores blocks 1-4 on disk.]
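
The effect described on these slides can be sketched as a small simulation. This is a toy model under stated assumptions, not Tachyon's client API: `UnderStorage` stands in for HDFS / Amazon S3, `SharedBlockStore` stands in for an in-memory store that runs outside the compute process, and a "crash" is modeled by creating a fresh `Job` object. All of these names are hypothetical.

```python
class UnderStorage:
    """Slow, durable storage (stand-in for HDFS / Amazon S3) that counts reads."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)
        self.reads = 0

    def read(self, block_id):
        self.reads += 1
        return self._blocks[block_id]


class SharedBlockStore:
    """In-memory store living outside the job process (hypothetical stand-in)."""

    def __init__(self):
        self._blocks = {}

    def get(self, block_id):
        return self._blocks.get(block_id)

    def put(self, block_id, data):
        self._blocks[block_id] = data


class Job:
    """Toy compute process; its in-process cache disappears when it is replaced."""

    def __init__(self, under_storage, shared_store=None):
        self.under_storage = under_storage
        self.shared_store = shared_store
        self.in_process_cache = {}

    def read(self, block_id):
        if self.shared_store is not None:
            # Cache lives outside the process, so it outlives this Job object.
            data = self.shared_store.get(block_id)
            if data is None:
                data = self.under_storage.read(block_id)
                self.shared_store.put(block_id, data)
            return data
        # Cache lives inside the process and is lost together with it.
        if block_id not in self.in_process_cache:
            self.in_process_cache[block_id] = self.under_storage.read(block_id)
        return self.in_process_cache[block_id]


def under_storage_reads_across_restart(use_shared_store):
    under = UnderStorage({"block 1": b"one", "block 3": b"three"})
    store = SharedBlockStore() if use_shared_store else None
    job = Job(under, store)
    for block_id in ("block 1", "block 3"):
        job.read(block_id)       # warm the cache
    job = Job(under, store)      # simulated crash and restart
    for block_id in ("block 1", "block 3"):
        job.read(block_id)
    return under.reads


reads_in_process = under_storage_reads_across_restart(use_shared_store=False)
reads_shared = under_storage_reads_across_restart(use_shared_store=True)
```

In this model the restarted job has to go back to slow storage for every block when the cache was in-process, while the out-of-process store still serves them after the restart.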

(27)

Issue 3

In-memory Data Duplication & Java Garbage Collection

[Diagram: Spark Job1 and Spark Job2 each cache blocks 1 and 3 in their own Spark memory block manager (execution engine & storage engine in the same process); HDFS / Amazon S3 stores blocks 1-4.]

(28)

Issue 3 resolved with Tachyon

No in-memory data duplication, much less GC

[Diagram: Spark Job1 and Spark Job2 (execution engine & storage engine in the same process; no duplication & GC) read in-memory block 1 from Tachyon; HDFS / Amazon S3 stores blocks 1-4 on disk.]

(29)

Previously Mentioned

•  A memory-centric storage architecture

(30)
(31)
(32)
(33)


(34)

1) Eco-system: Enable new workload in any storage;

(35)

2) Tachyon running in production environment, both

(36)

Use Case: Baidu

•  Framework: SparkSQL

•  Under Storage: Baidu’s File System

•  Storage Media: MEM + HDD

•  100+ nodes deployment

•  1PB+ managed space

(37)

Use Case: a SAAS Company

•  Framework: Impala

•  Under Storage: S3

•  Storage Media: MEM + SSD

(38)

Use Case: an Oil Company

•  Framework: Spark

•  Under Storage: GlusterFS

•  Storage Media: MEM only

(39)

Use Case: a SAAS Company

•  Framework: Spark

•  Under Storage: S3

•  Storage Media: SSD only

(40)

What if data size exceeds memory capacity?

(41)

3) Tiered Storage: Tachyon Manages More Than DRAM

[Diagram: storage tiers MEM, SSD, HDD, ordered from faster (top) to higher capacity (bottom).]


(42)

Configurable Storage Tiers

MEM only

MEM + HDD

(43)

4) Pluggable Data Management Policy

Evict stale data to lower tier

Promote hot data to upper tier
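
The two rules above can be sketched as a small tiered store. This is an illustration under assumptions, not Tachyon's implementation or configuration: the slide does not name a specific policy, so least-recently-used (LRU) eviction is swapped in here, capacity is counted in blocks rather than bytes, and `Tier`, `TieredStore`, and `tier_of` are hypothetical names.

```python
from collections import OrderedDict


class Tier:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity     # max number of blocks (simplification)
        self.blocks = OrderedDict()  # block_id -> data, least recently used first


class TieredStore:
    """Toy tiered store: LRU eviction downward, promotion to the top on access."""

    def __init__(self, tiers):
        self.tiers = tiers  # ordered fastest first, e.g. MEM, SSD, HDD

    def _insert(self, level, block_id, data):
        tier = self.tiers[level]
        tier.blocks[block_id] = data
        tier.blocks.move_to_end(block_id)  # mark as most recently used
        while len(tier.blocks) > tier.capacity:
            # Evict the stalest block to the next lower tier, if there is one.
            victim_id, victim = tier.blocks.popitem(last=False)
            if level + 1 < len(self.tiers):
                self._insert(level + 1, victim_id, victim)
            # Otherwise the block is dropped (assumed to still be in under storage).

    def put(self, block_id, data):
        for tier in self.tiers:
            tier.blocks.pop(block_id, None)  # avoid duplicate copies across tiers
        self._insert(0, block_id, data)

    def get(self, block_id):
        for tier in self.tiers:
            if block_id in tier.blocks:
                data = tier.blocks.pop(block_id)
                self._insert(0, block_id, data)  # promote hot data to the top tier
                return data
        return None

    def tier_of(self, block_id):
        for tier in self.tiers:
            if block_id in tier.blocks:
                return tier.name
        return None


store = TieredStore([Tier("MEM", capacity=2), Tier("HDD", capacity=4)])
for block_id in ("a", "b", "c"):
    store.put(block_id, block_id.encode())

tier_before_access = store.tier_of("a")  # "a" was the stalest block in MEM
store.get("a")                           # access promotes it
tier_after_access = store.tier_of("a")
```

Because eviction is isolated in `_insert`, a different policy could be substituted there without touching `put` or `get`, which is one way to read "pluggable" on the slide.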

(44)
(45)
(46)
(47)

More Features

•  7) Remote Write Support

•  8) Easy deployment with Mesos and Yarn

•  9) Initial Security Support

•  10) One Command Cluster Deployment

•  11) Metrics Reporting for Clients, Workers,

(48)
(49)
(50)

Outline


(51)

Memory-Centric Distributed Storage

(52)

•  Team consists of Tachyon creators, top contributors

•  Series A ($7.5 million) from Andreessen Horowitz


•  Committed to Tachyon Open Source


(53)
(54)

Strata NYC 2015

•  Visit us at booth #P18.

•  Check out other Tachyon related talks.

–  First-ever scalable, distributed deep learning architecture using Spark and Tachyon

•  Christopher Nguyen (Adatao, Inc.), Vu Pham (Adatao, Inc.)

•  2:05pm–2:45pm Thursday, 10/01/2015

–  Faster time to insight using Spark, Tachyon, and Zeppelin

•  Nirmal Ranganathan (Rackspace Hosting)

(55)

•  Try Tachyon: http://tachyon-project.org


•  Develop Tachyon: https://github.com/amplab/tachyon


•  Meet Friends: http://www.meetup.com/Tachyon


•  Get News: http://goo.gl/mwB2sX

•  Tachyon Nexus: http://www.tachyonnexus.com
