Network Coding for Distributed Storage

Academic year: 2021


Alex Dimakis USC


Overview

• Motivation

  • Data centers

  • Mobile distributed storage for D2D

• Specific storage problems

  • Fundamental tradeoff between repair communication and storage.

  • Systematic Repair (open problem)


Motivation: Data centers

•  Warehouse-sized computing and storage facilities. Cost in the hundreds of millions.

•  Large-scale distributed storage: thousands of servers. Petabytes of disk space.

•  Internet Data centers are the next computing platform: Web search,


Massive distributed data storage

•  Numerous disk failures per day.

•  Must introduce redundancy in stored information.

•  Replication or erasure coding?

•  Coding can give orders of magnitude more reliability

•  But problems in creating and maintaining an encoded

Distributed caching in mobiles

Infrastructure slow to deploy and upgrade

Delivery with opportunistic contacts [7DS, Haggle, …]

•  Extends coverage and capacity using free D2D bandwidth

•  Scales as network gets dense [Grossglauser/Tse02]


5/5/10

• The video you want to watch is very likely to be downloaded by people nearby in the next day

• Storage in phones is increasing more than anything else

• Cache the popular content


MDS erasure codes

[Figure: a file or data object is split into k=2 blocks, A and B.]

•  (3,2) MDS code (single parity), k=2, n=3: stores A, B, A+B. Used in RAID 5.

•  (4,2) MDS code, k=2, n=4: stores A, B, A+B, A+2B. Tolerates any 2 failures. Used in RAID 6.
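The "any 2 of 4" property of the (4,2) code A, B, A+B, A+2B can be checked directly. The sketch below is illustrative only: it assumes arithmetic modulo the prime 257 (the slide does not name a field), and `encode` and `decode` are hypothetical helper names.

```python
from itertools import combinations

P = 257  # assumed prime field size; the slide does not specify a field

# One generator row per node of the (4,2) code: A, B, A+B, A+2B.
G = [(1, 0), (0, 1), (1, 1), (1, 2)]

def encode(a, b):
    """Return the four stored symbols for data symbols a and b."""
    return [(ga * a + gb * b) % P for ga, gb in G]

def decode(i, yi, j, yj):
    """Recover (a, b) from the symbols held by nodes i and j (Cramer's rule)."""
    (p, q), (r, s) = G[i], G[j]
    det_inv = pow((p * s - q * r) % P, -1, P)  # needs Python 3.8+
    a = (s * yi - q * yj) * det_inv % P
    b = (p * yj - r * yi) * det_inv % P
    return a, b

stored = encode(42, 99)
# Try every pair of surviving nodes; each pair should give back (42, 99).
recovered = {decode(i, stored[i], j, stored[j])
             for i, j in combinations(range(4), 2)}
```

All six pairs of nodes yield the same data, which is what "tolerates any 2 failures" means for this code.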

Erasure codes are reliable

[Figure: a file or data object split into blocks A, B and stored either with replication (A, A, B, B) or with a (4,2) MDS erasure code (A, B, A+B, A+2B; any 2 suffice to recover).]

Replication: Pr[failure]=0.43 vs MDS erasure code: Pr[failure]=0.31

Erasure coding introduces redundancy in an optimal way. Very useful in practice, e.g. Reed-Solomon codes, Fountain codes (LT and Raptor), …

Current storage architectures still use replication. (Gmail makes 21 copies(!))

Can we improve storage efficiency?


New open problems

Issues:

•  Communication

•  Update complexity

•  Repair communication

[Figure: nodes A and B, a new node marked "?", and the network traffic between them.]

Code Repair: Problem statement

[Figure: nodes a, b, c, d, each storing 1mb, and a newcomer e.]

• Assume we have a (4,2) MDS code and one node leaves the system

• How much data does a newcomer (e) have to download to construct a new encoded packet?

• Repairing the code in distributed environments.


Code Repair: first thoughts

[Figure: nodes a, b, a+b, a+2b, each storing 1mb, and a newcomer e.]

• Downloading 2mb definitely works.

• But newcomer (e) is downloading 2mb to store only 1mb!

• Q: Is it possible to download less data?

•  It is possible to download 1.5mb!

“When coding is used, creating new fragments is not a trivial task. The problem is that to create a new fragment we must have access to the entire data object


Reducing repair bandwidth

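[Figure: a (4,2) MDS code with two packets per node and 1mb per node: (a1, a2), (b1, b2), (a1+b1, a2+b2), (a1+2b1, a2+2b2). The newcomer downloads one combined packet from each of three nodes: b1+b2 (coefficients 1, 1), a1+b1+2a2+2b2 (coefficients 1, 2) and a1+2b1+3a2+6b2 (coefficients 1, 3), and forms its own stored packets from them.]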


Repair Bandwidth for MDS

Theorem 1: For (n,k)-MDS codes, if each node is storing α bits and downloads β from each existing node:

$$\alpha_{MDS} = \frac{M}{k}, \qquad \beta_{MDS} = \frac{M}{k} \cdot \frac{1}{n-k}$$

Proof by reduction to a flow on an (infinite) graph.
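As a quick check of Theorem 1 against the earlier (4,2) example, the sketch below evaluates the two formulas with exact fractions. It assumes the newcomer contacts all n−1 surviving nodes; `mds_repair` is a hypothetical helper name.

```python
from fractions import Fraction

def mds_repair(n, k, M):
    """Per-node storage, per-helper download and total repair download
    for an (n,k)-MDS code, assuming all n-1 surviving nodes help."""
    alpha = Fraction(M, k)       # alpha_MDS = M/k
    beta = alpha / (n - k)       # beta_MDS = (M/k) * 1/(n-k)
    return alpha, beta, (n - 1) * beta

# (4,2) code storing a 2mb file: 1mb per node.
alpha, beta, total = mds_repair(n=4, k=2, M=2)
```

For the (4,2) code this gives β = 1/2 mb per helper and 1.5mb in total, compared with 2mb for downloading the whole file.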


Proof sketch: Information flow graph

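[Figure: information flow graph. The source S feeds nodes a, b, c, d, each with storage α = 1mb; the newcomer e downloads β from each of three surviving nodes; data collectors connect to pairs of nodes.]

$1 + 2\beta \ge 2 \;\Rightarrow\; \beta \ge 1/2$ mb. Total download ≥ 1.5mb.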

Proof sketch: reduction to multicasting

[Figure: information flow graph with source S, storage nodes a, b, c, d, a newcomer e downloading β from each of three nodes, and several data collectors.]

Repairing a code = multicasting on the information flow graph. Repair is possible iff the minimum of the min-cuts is at least the file size M.

(Ahlswede et al., Koetter & Medard, Ho et al.)


Overview

• Motivation - Distributed storage in data centers

• The code repair problem

• Minimizing repair bandwidth

• Fundamental tradeoff between repair bandwidth and storage.

Regenerating codes

[Figure: storage nodes labeled a through g, annotated with M/k, α, β and d.]

Repair bandwidth can be greatly reduced if we allow


Minimizing repair bandwidth

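[Figure: information flow graph in which every node stores α and each newcomer downloads β from each of d nodes.]

$$\min \; \beta d \quad \text{s.t.} \quad \mathrm{MinCut}(DC_i) \ge M \;\; \forall i, \qquad d \in \{k, k+1, \dots, n-1\}, \qquad \beta d \ge \alpha$$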


Ingredient 1: bounding the flow

Lemma: for any (potentially infinite) graph G(α,β,d), any data collector DC has flow at least

$$\mathrm{MinCut}(DC) \ge \sum_{i=0}^{k-1} \min\{(d-i)\beta, \alpha\}$$

Proof: sort topologically, count. The bound is tight since it is satisfied with equality for this graph.
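The bound in the lemma is easy to evaluate numerically. The sketch below is an illustration with a hypothetical function name, `min_cut_bound`. It applies the bound to the (4,2) example from the earlier slides, taking α = 1mb, d = 3 helpers and file size M = 2mb.

```python
from fractions import Fraction

def min_cut_bound(alpha, beta, d, k):
    """Evaluate sum_{i=0}^{k-1} min((d - i) * beta, alpha)."""
    return sum(min((d - i) * beta, alpha) for i in range(k))

# beta = 1/2 mb: the bound equals the 2mb file size, so repair is feasible.
bound = min_cut_bound(Fraction(1), Fraction(1, 2), d=3, k=2)

# beta = 2/5 mb: the bound falls below 2mb, so this beta is too small.
too_small = min_cut_bound(Fraction(1), Fraction(2, 5), d=3, k=2)
```

With β = 1/2 the two terms are min(3/2, 1) and min(1, 1), which sum to exactly M = 2; with β = 2/5 the second term drops to 4/5 and the total is 9/5.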


Ingredient 2: just relax

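[Figure: information flow graph in which every node stores α and each newcomer downloads β from each of d nodes.]

$$\min \; \beta d \quad \text{s.t.} \quad \sum_{i=0}^{k-1} \min\{(d-i)\beta, \alpha\} \ge M, \qquad d \in \{k, k+1, \dots, n-1\}, \qquad \beta d \ge \alpha$$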

Relax the integer constraint

Show that integer and relaxed problem attain optimum at the same point


Minimum repair bandwidth


Theorem 2: The minimum repair bandwidth


Numerical example

• File size M=20mb , k=20, n=25

• Reed-Solomon : Store α=1mb , repair βd=20mb

• MinStorage-RC : Store α=1mb , repair βd=4.8mb

• MinBandwidth RC : Store α=1.65mb , repair βd=1.65mb
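The regenerating-code figures above can be reproduced from the closed forms for the two extreme points of the tradeoff (the values f(0) and f(k−1) in Theorem 3). The sketch below assumes the newcomer contacts d = n−1 = 24 nodes, which the slide does not state but which matches its numbers; `msr_point` and `mbr_point` are hypothetical helper names.

```python
from fractions import Fraction

def msr_point(M, k, d):
    """Minimum-storage point: (alpha, total repair download gamma)."""
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma

def mbr_point(M, k, d):
    """Minimum-bandwidth point: alpha equals gamma."""
    gamma = Fraction(2 * M * d, 2 * k * d - k * k + k)
    return gamma, gamma

M, k, n = 20, 20, 25
d = n - 1                   # assumption: all surviving nodes help
msr = msr_point(M, k, d)    # alpha = 1, gamma = 24/5 = 4.8
mbr = mbr_point(M, k, d)    # alpha = gamma = 48/29, about 1.655
```

The Reed-Solomon row corresponds to downloading the whole file, i.e. M = 20mb.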


Storage-Communication tradeoff

Theorem 3: for any (n,k) code, where each node stores α bits, repairs from d existing nodes and downloads dβ=γ bits, the feasible region is a piecewise linear function described as follows:

$$\alpha_{\min} = \begin{cases} \dfrac{M}{k}, & \gamma \in [f(0), \infty), \\[2mm] \dfrac{M - g(i)\gamma}{k - i}, & \gamma \in [f(i), f(i-1)), \end{cases}$$

$$f(i) := \frac{2Md}{(2k - i - 1)i + 2k(d - k + 1)}, \qquad g(i) := \frac{(2d - 2k + i + 1)i}{2d}$$
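A minimal sketch of the tradeoff curve as a function, evaluated on the small (4,2) example with M = 2mb and d = 3. It assumes the index i runs over 1, …, k−1, and the name `alpha_min` is hypothetical.

```python
from fractions import Fraction

def f(i, M, k, d):
    return Fraction(2 * M * d, (2 * k - i - 1) * i + 2 * k * (d - k + 1))

def g(i, k, d):
    return Fraction((2 * d - 2 * k + i + 1) * i, 2 * d)

def alpha_min(gamma, M, k, d):
    """Smallest per-node storage for repair download gamma, or None if
    gamma is below f(k-1) and no code exists."""
    if gamma >= f(0, M, k, d):
        return Fraction(M, k)
    for i in range(1, k):  # assumed index range for the linear pieces
        if f(i, M, k, d) <= gamma < f(i - 1, M, k, d):
            return (M - g(i, k, d) * gamma) / (k - i)
    return None

M, k, d = 2, 2, 3
at_msr = alpha_min(Fraction(3, 2), M, k, d)   # gamma = f(0) = 3/2
at_mbr = alpha_min(Fraction(6, 5), M, k, d)   # gamma = f(1) = 6/5
infeasible = alpha_min(Fraction(1), M, k, d)  # below f(k-1)
```

At γ = f(0) = 3/2 the function returns α = M/k = 1 (the minimum-storage point), and at γ = f(1) = 6/5 it returns α = 6/5 = γ (the minimum-bandwidth point).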


Storage-Communication tradeoff

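[Figure: tradeoff curve with axes α (storage per node) and βd (repair bandwidth); the Min-Storage Regenerating code and the Min-Bandwidth Regenerating code are marked on the curve.]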

Open Problem: Systematic repair

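[Figure: nodes a, b, c, d, each storing 1mb; the newcomer must store e=a.]

• From Theorem 1, a (4,2) MDS code can be repaired by downloading

$$\alpha_{MDS} = \frac{M}{k}, \qquad \beta_{MDS} = \frac{M}{k} \cdot \frac{1}{n-k}$$

• What if we require perfect reconstruction?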


Repair vs Systematic Repair

[Figure: nodes x1, x2, …, xn, each storing α; a newcomer downloads β from each of d nodes and must reproduce x1 ("x1?"); a data collector connects to k nodes.]

• Repair = Multicasting

• Systematic repair = Multicasting with intermediate nodes having (overlapping) requests.

• Cut arguments might not be tight


Systematic Repair-(4,2) example

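[Figure: a (4,2) code with two symbols per node: (x1, x2), (x3, x4), (x1+x3, x2+x4), (x1+2x3, 2x2+3x4). The first node is lost ("x1? x2?"). The newcomer downloads x3+x4 from the second node, x1+x2+x3+x4 from the third node, and, using coefficients 2^{-1} and 3^{-1}, the packet 2^{-1}x1 + 2·3^{-1}x2 + x3 + x4 from the fourth node.]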
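The repair in this example can be traced step by step. In all three downloads the unwanted symbols appear only as the sum x3+x4, so one subtraction removes them and leaves two equations in x1 and x2. The sketch below is illustrative: it assumes arithmetic modulo the prime 257 (the slide does not name a field), and `store` and `repair_node1` are hypothetical names.

```python
P = 257  # assumed prime field; not specified on the slide

def inv(a):
    """Multiplicative inverse modulo P (Python 3.8+)."""
    return pow(a, -1, P)

def store(x1, x2, x3, x4):
    """The four nodes of the (4,2) example; each holds two symbols."""
    return [
        (x1 % P, x2 % P),                            # systematic node 1
        (x3 % P, x4 % P),                            # systematic node 2
        ((x1 + x3) % P, (x2 + x4) % P),              # parity node 3
        ((x1 + 2 * x3) % P, (2 * x2 + 3 * x4) % P),  # parity node 4
    ]

def repair_node1(nodes):
    """Rebuild (x1, x2) exactly from one symbol per surviving node."""
    _, n2, n3, n4 = nodes
    p2 = (n2[0] + n2[1]) % P                    # x3 + x4
    p3 = (n3[0] + n3[1]) % P                    # x1 + x2 + x3 + x4
    p4 = (inv(2) * n4[0] + inv(3) * n4[1]) % P  # x1/2 + 2*x2/3 + x3 + x4
    u = (p3 - p2) % P                           # x1 + x2
    v = (p4 - p2) % P                           # x1/2 + 2*x2/3
    # Solve x1 + x2 = u, a*x1 + b*x2 = v by Cramer's rule.
    a, b = inv(2), 2 * inv(3) % P
    det_inv = inv((b - a) % P)
    x1 = (b * u - v) * det_inv % P
    x2 = (v - a * u) * det_inv % P
    return x1, x2

nodes = store(10, 20, 30, 40)
rebuilt = repair_node1(nodes)
```

The newcomer downloads three symbols instead of the four needed to decode the whole file, which is the same 1.5mb versus 2mb saving as before, and it ends up with an exact copy of the lost node.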

What is known about systematic repair

• For (n,2), systematic repair can match the cutset bound. [WD ISIT’09]

• (5,3) MSR systematic code exists (Cullina, D, Ho, Allerton’09)

• For k/n <= 1/2, systematic repair can match the cutset bound [Rashmi, Shah, Kumar, Ramchandran (2010)] [Suh, Ramchandran (2010)]

•  What can be done for high rates?

Given an error-correcting code find the repair coefficients that reduce communication (over a field)

Given some channel matrices find the beamforming matrices that maximize the DoF

(Cadambe and Jafar, Suh and Tse) (Papailiopoulos &D, working paper)


•  Network codes designed for distributed storage (Regenerating codes) greatly reduce the communication required to maintain the desired redundancy.

•  Nodes cache different content in a distributed way

•  Which content to cache

•  How much to store?

•  How to find peers that have the desired content

•  Incentives for people to donate storage/bandwidth


How much to store

• Two files, each of size 1.

• Fix a total redundancy 2


How much to store

• Coding helps

• But finding the best


Problem Description

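$$\max_{x_1, \dots, x_n} \; \mathrm{Prob}\left[\sum_{i=1}^{n} x_i \mathbf{1}_i \ge 1\right] \quad \text{s.t.} \quad \sum_{i=1}^{n} x_i \le T$$

Here $x_i$ is the amount stored on node i and $\mathbf{1}_i$ indicates whether node i is available.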

Can be generalized to other models of node availability.


•  Symmetric allocations can be suboptimal

   –  Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, the nonsymmetric allocation performs better than the optimal symmetric allocation

•  Finding the optimal symmetric allocation is also nontrivial
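The claim about n = 5, T = 12/5, p = 0.9 can be checked by brute force over all availability patterns. The sketch below assumes each node is available independently with probability p and that the file has size 1. The nonsymmetric allocation (3/5, 3/5, 2/5, 2/5, 2/5) is a hypothetical example chosen for illustration and is not necessarily the one shown on the slide; `recovery_probability` is likewise a hypothetical name.

```python
from fractions import Fraction
from itertools import product

def recovery_probability(alloc, p):
    """P[total stored on available nodes >= 1] under i.i.d. availability p."""
    total = Fraction(0)
    for avail in product((0, 1), repeat=len(alloc)):
        if sum(x for x, up in zip(alloc, avail) if up) >= 1:
            ups = sum(avail)
            total += p**ups * (1 - p)**(len(alloc) - ups)
    return total

n, T, p = 5, Fraction(12, 5), Fraction(9, 10)

# Symmetric allocations: spread T evenly over m nodes, nothing on the rest.
symmetric = {m: recovery_probability([T / m] * m + [Fraction(0)] * (n - m), p)
             for m in range(1, n + 1)}
best_symmetric = max(symmetric.values())

# Hypothetical nonsymmetric allocation using the same budget T.
nonsym = [Fraction(3, 5), Fraction(3, 5),
          Fraction(2, 5), Fraction(2, 5), Fraction(2, 5)]
nonsym_prob = recovery_probability(nonsym, p)
```

Under these assumptions the best symmetric allocation (3/5 on each of four nodes) recovers the file with probability 0.9963, while the nonsymmetric example reaches 0.99711.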

Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman; also see S. Jain, M. Demmer, R. Patra, and K. Fall, SIGCOMM’05


Distributed storage allocations

Leong, D, Ho, Netcod 2009, Globecom submitted

Results can be obtained for different access models. For the i.i.d. model, maximal spreading x = T/n was shown to have asymptotically zero gap from optimality if Tp > 1.


Open Problems

•  Cut-Set bounds tight? Linear codes sufficient?

•  What is the limit of interference alignment techniques?

•  Repairing codes in small fields?

•  Existing codes used in storage (e.g. EvenOdd Code, B-Code, etc.)?

•  Dealing with bit-errors (security)? (Dikaliotis, Ho, D, ISIT’10)

•  What is the role of (non-trivial) network topologies?

•  Allocations for multiple objects?


Conclusions

• We proposed a theoretical framework for analyzing encoded information representations

• Repair reduces to network coding, and flow arguments completely characterize what is possible.

• We identified and characterized a tradeoff between repair bandwidth and storage for any storage system.

• Numerous interesting questions in coding for data centers: repair/updates/disk IO vs network bandwidth.

• Systematic, deterministic, small finite field constructions are very
