• No results found

Tachyon: Reliable File Sharing at Memory- Speed Across Cluster Frameworks

N/A
N/A
Protected

Academic year: 2021

Share "Tachyon: Reliable File Sharing at Memory- Speed Across Cluster Frameworks"

Copied!
36
0
0

Loading.... (view fulltext now)

Full text

(1)

Tachyon: Reliable File Sharing at Memory-

Speed Across Cluster Frameworks

Haoyuan Li

UC Berkeley

(2)

Outline

Outline | Motivation| Design | Results| Status| Future

• Motivation

• System Design

• Evaluation Results

• Release Status

• Future Directions

(3)

Outline| Motivation | Design | Results| Status| Future

Memory is King

(4)

Memory Trend

Outline| Motivation | Design | Results| Status| Future

• RAM throughput increasing exponentially

(5)

Disk Trend

Outline| Motivation | Design | Results| Status| Future

• Disk throughput increasing slowly

(6)

Consequence

Outline| Motivation | Design | Results| Status| Future

• Memory locality key to achieve

– Interactive queries

– Fast query response

(7)

Current Big Data Eco-system

Outline| Motivation | Design | Results| Status| Future

• Many frameworks already leverage memory

– e.g. Spark, Shark, and other projects

• File sharing among jobs replicated to disk

– Replication enables fault-tolerance

• Problems

– Disk scan is slow for read.

– Synchronous disk replication for write is even slower.

(8)

Tachyon Project

Outline| Motivation | Design | Results| Status| Future

• Reliable file sharing at memory-speed across

cluster frameworks/jobs

• Challenge

– How to achieve reliable file sharing without

replication?

(9)

Idea

Outline| Motivation | Design | Results| Status| Future

Re-computation (Lineage) based storage

using memory aggressively.

1. One copy of data in memory (Fast)

2. Upon failure, re-compute data using

lineage (Fault tolerant)

(10)

Stack

Outline| Motivation | Design | Results| Status| Future

(11)

System Architecture

Outline| Motivation | Design | Results| Status| Future

(12)

Lineage

Outline| Motivation | Design | Results| Status| Future

(13)

Lineage Information

Outline| Motivation | Design | Results| Status| Future

• Binary program

• Configuration

• Input Files List

• Output Files List

• Dependency Type

(14)

Fault Recovery Time

Outline| Motivation | Design | Results| Status| Future

Re-computation Cost?

(15)

Example

Outline| Motivation | Design | Results| Status| Future

(16)

Asynchronous Checkpoint

Outline| Motivation | Design | Results| Status| Future

1. Better than using existing solutions even

under failure.

2. Bounded recovery time (Naïve and Snapshot

asynchronous checkpointing).

(17)

Master Fault Tolerance

Outline| Motivation | Design | Results| Status| Future

• Multiple masters

– Use ZooKeeper to elect a leader

• After crash workers contact new leader

– Update the state of leader with contents of caches

(18)

Implementation Details

Outline| Motivation | Design | Results| Status| Future

• 15,000+ lines of JAVA

• Thrift for data transport

• Underlayer file system supports HDFS, S3,

localFS, GlusterFS

• Maven, Jenkins

(19)

Sequential Read using Spark

Outline| Motivation | Design | Results | Status| Future

Flat Datacenter

Storage

Theoretical Maximum

Disk Throughput

(20)

Sequential Write using Spark

Outline| Motivation | Design | Results | Status| Future

Flat Datacenter

Storage

Theoretical Maximum

Disk Throughput

(21)

Realistic Workflow using Spark

Outline| Motivation | Design | Results | Status| Future

(22)

Realistic Workflow Under Failure

Outline| Motivation | Design | Results | Status| Future

(23)

Conviva Spark Query (I/O intensive)

Outline| Motivation | Design | Results | Status| Future

More than

75x speedup

Tachyon

outperforms

Spark cache

because of

JAVA GC

(24)

Conviva Spark Query (less I/O intensive)

Outline| Motivation | Design | Results | Status| Future

12x speedup

GC kicks

in earlier

for Spark

cache

(25)

Alpha Status

Outline| Motivation | Design | Results | Status | Future

• Releases

– Developer Preview:

V0.2.1 (4/25/2013)

– Contributions from:

(26)

Alpha Status

Outline| Motivation | Design | Results | Status | Future

• First read of files cached in-memory

• Writes go synchronously to HDFS (No lineage

information in Developer Preview release)

• MapReduce and Spark can run without any code

change (ser/de becomes the new bottleneck)

(27)

Current Features

Outline| Motivation | Design | Results | Status | Future

• Java-like file API

• Compatible with Hadoop

• Master fault tolerance

• Native support for raw tables

• WhiteList, PinList

• Command line interaction

• Web user interface

(28)

Spark without Tachyon

Outline| Motivation | Design | Results | Status | Future

val file = sc.textFile(“hdfs://ip:port/path”)

(29)

Spark with Tachyon

Outline| Motivation | Design | Results | Status | Future

val file = sc.textFile(“tachyon:// ip:port/path”)

(30)

Shark without Tachyon

Outline| Motivation | Design | Results | Status | Future

CREATE TABLE orders_cached AS SELECT * FROM orders;

(31)

Shark with Tachyon

Outline| Motivation | Design | Results | Status | Future

CREATE TABLE orders_tachyon AS SELECT * FROM orders;

(32)

Experiments on Shark

Outline| Motivation | Design | Results | Status | Future

• Shark (from 0.7) can store tables in Tachyon

with fast columnar Ser/De

20 GB data / 5 machines Spark Cache Tachyon

Table Full Scan 1.4 sec 1.5 sec

GroupBys (10 GB Shark Memory) 50 – 90 sec 45 – 50 sec GroupBys (15 GB Shark Memory) 44 – 48 sec 37 – 45 sec

(33)

Experiments on Shark

Outline| Motivation | Design | Results | Status | Future

• Shark (from 0.7) can store tables in Tachyon

with fast columnar Ser/De

20 GB data / 5 machines Spark Cache Tachyon

Table Full Scan 1.4 sec 1.5 sec

GroupBys (10 GB Shark Memory) 50 – 90 sec 45 – 50 sec GroupBys (15 GB Shark Memory) 44 – 48 sec 37 – 45 sec

4 * 100 GB TPC-H data / 17 machines Spark Cache Tachyon

TPC-H Q1 65.68 sec 24.75 sec

TPC-H Q2 438.49 sec 139.25 sec

TPC-H Q3 467.79 sec 55.99 sec

TPC-H Q4 457.50 sec 111.65 sec

(34)

Future

Outline| Motivation | Design | Results | Status | Future

• Efficient Ser/De support

• Fair sharing for memory

• Full support for lineage

• Next release is coming soon

(35)

Acknowledgment

Outline| Motivation | Design | Results | Status | Future

Research Team: Haoyuan Li, Ali Ghodsi, Matei

Zaharia, Eric Baldeschwieler , Scott Shenker, Ion

Stoica

Code Contributors: Haoyuan Li, Calvin Jia, Bill

Zhao, Mark Hamstra, Rong Gu, Hobin Yoon,

Vamsi Chitters, Reynold Xin, Srinivas Parayya,

Dilip Joseph

(36)

Questions?

http://tachyon-project.org

https://github.com/amplab/tachyon

References

Related documents

“as a result oF extended omnichannel capabilities on the back end, a uniFied supply chain enables retailers to FulFill orders across the enterprise and to deliver more oF

Upon completion of the engagement and receipt by ParenteBeard LLC (ParenteBeard) of this signed release letter by the Authorized Third Party User , the report will be provided

Other questions of the study were “Text messaging during work effect your speed of work and concentration” (Mean=4.52), Productive use of cell phone by you is in the benefit

For cancellation 21 days prior to arrival the hotel will charge 30% of the total amount. For cancellation from 20 to 9 days prior to arrival the ho- tel will charge 50% of the

The main alternative approaches to assessing learning in galleries and museums are those developed by the Research Centre for Museums and Galleries, which focuses on generic

and Dufour effects on the boundary layer flow due to natural convection heat and mass transfer over a downward-pointing vertical cone and truncated cone in a porous medium satu-

One motivation for conducting this experiment is the emergence of low cost, capable SDR components that when connected to a suitable antenna and Low Noise Amplifier (LNA)

16 EMC ViPR Analytics Pack for VMware vCenter Operations Management Suite 1.1.0 Installation and Configuration Guide... After