Alex Dimakis USC
Overview
• Motivation
  • Data centers
  • Mobile distributed storage for D2D
• Specific storage problems
  • Fundamental tradeoff between repair communication and storage.
  • Systematic Repair (open problem)
Motivation: Data centers
• Warehouse-sized computing and storage facilities. Cost in the hundreds of millions.
• Large-scale distributed storage: thousands of servers. Petabytes of disk space.
• Internet Data centers are the next computing platform: Web search,
Massive distributed data storage
• Numerous disk failures per day.
• Must introduce redundancy in stored information.
• Replication or erasure coding?
• Coding can give orders of magnitude more reliability
• But problems in creating and maintaining an encoded
• Infrastructure slow to deploy and upgrade
• Delivery with opportunistic contacts [7DS, Haggle, …]
• Extends coverage and capacity using free D2D bandwidth
• Scales as network gets dense [Grossglauser/Tse02]
Distributed caching in mobiles
5/5/10
• The video you want to watch is very likely to be downloaded by people nearby in the next day
• Storage in phones is increasing more than anything else
• Cache the popular content
MDS erasure codes
[Figure: a file or data object split into k=2 blocks, A and B.]
• (3,2) MDS code (single parity), n=3: stores A, B, A+B. Used in RAID 5.
• (4,2) MDS code, n=4: stores A, B, A+B, A+2B. Tolerates any 2 failures. Used in RAID 6.
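The "any 2 suffice" property of the (4,2) code can be checked directly. The following is a minimal sketch, assuming arithmetic over the prime field GF(257); the slide does not name a field, and the helper names `encode` and `decode` are illustrative.

```python
from itertools import combinations

P = 257  # assumed prime field size, larger than any coefficient used

# Coefficient vectors (on A and B) of the four stored symbols of the (4,2) code.
CODE = {"A": (1, 0), "B": (0, 1), "A+B": (1, 1), "A+2B": (1, 2)}

def encode(A, B):
    """Return the four stored symbols for data blocks A and B."""
    return {name: (ca * A + cb * B) % P for name, (ca, cb) in CODE.items()}

def decode(sym1, val1, sym2, val2):
    """Recover (A, B) from any two stored symbols using Cramer's rule mod P."""
    (a, b), (c, d) = CODE[sym1], CODE[sym2]
    det = (a * d - b * c) % P
    if det == 0:
        raise ValueError("these two symbols do not determine A and B")
    det_inv = pow(det, -1, P)  # modular inverse, Python 3.8+
    A = (val1 * d - b * val2) * det_inv % P
    B = (a * val2 - c * val1) * det_inv % P
    return A, B

stored = encode(42, 200)
recovered = [decode(s, stored[s], t, stored[t]) for s, t in combinations(CODE, 2)]
```

Every one of the six pairs in `recovered` returns the original blocks, which is what lets the code tolerate any 2 failures.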
Erasure codes are reliable
[Figure: a file or data object split into A and B. Replication stores A, A, B, B; the (4,2) MDS erasure code stores A, B, A+B, A+2B (any 2 suffice to recover).]
Replication: Pr[failure]=0.43 vs MDS erasure code: Pr[failure]=0.31
Erasure coding introduces redundancy in an optimal way. Very useful in practice, e.g. Reed-Solomon codes, Fountain codes (LT and Raptor), …
Current storage architectures still use replication. (Gmail makes 21 copies(!))
Can we improve storage efficiency?
New open problems
Issues:
• Communication
• Update complexity
• Repair communication
[Figure: stored blocks A and B and an unknown ("?") block; label: Network traffic.]
Code Repair: Problem statement
• Assume we have a (4,2) MDS code and one node leaves the system
• How much data does a newcomer (e) have to download, to construct a new encoded packet?
• Repairing the code in distributed environments.
[Figure: nodes a, b, c, d, 1mb each; newcomer e with unknown ("?") downloads from the surviving nodes.]
Code Repair: first thoughts
[Figure: nodes a, b, a+b, a+2b, 1mb each; newcomer e.]
• Downloading 2mb definitely works.
• But newcomer (e) is downloading 2mb, to store only 1mb!
• Q: Is it possible to download less data?
• It is possible to download 1.5mb!
“When coding is used, creating new fragments is not a trivial task. The problem is that to create a new fragment we must have access to the entire data object”
Reducing repair bandwidth
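[Figure: each node stores 1mb as two packets: (a1, a2), (b1, b2), (a1+b1, a2+b2), (a1+2b1, a2+2b2).]
The newcomer downloads one combined packet from each of three nodes:
• b1+b2 (coefficients 1, 1)
• a1+b1+2a2+2b2 (coefficients 1, 2)
• a1+2b1+3a2+6b2 (coefficients 1, 3)
and forms its own two packets from them.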
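The 1.5mb repair can be sanity-checked by tracking coefficient vectors over (a1, a2, b1, b2). The sketch below assumes arithmetic over GF(7) and assumes the newcomer stores p1+p2 and p1+p3; the slide gives neither the field nor how the three downloads are mixed, so both are illustrative choices.

```python
from itertools import combinations

P = 7  # assumed prime field size; the slide does not name a field

def rank_mod_p(rows, p=P):
    """Rank of a matrix over GF(p) by Gauss-Jordan elimination."""
    rows = [[v % p for v in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [v * inv % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(v - factor * w) % p for v, w in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def combine(coeffs, packets, p=P):
    """Linear combination of packets (coefficient vectors) over GF(p)."""
    return tuple(sum(c * v for c, v in zip(coeffs, col)) % p for col in zip(*packets))

# Coefficient vectors over (a1, a2, b1, b2) for the three surviving nodes.
survivors = {
    "b": [(0, 0, 1, 0), (0, 0, 0, 1)],
    "a+b": [(1, 0, 1, 0), (0, 1, 0, 1)],
    "a+2b": [(1, 0, 2, 0), (0, 1, 0, 2)],
}

# One downloaded packet per survivor, with the coefficients shown on the slide.
p1 = combine((1, 1), survivors["b"])      # b1 + b2
p2 = combine((1, 2), survivors["a+b"])    # a1 + b1 + 2a2 + 2b2
p3 = combine((1, 3), survivors["a+2b"])   # a1 + 2b1 + 3a2 + 6b2

# Hypothetical mixing at the newcomer: it stores p1 + p2 and p1 + p3.
newcomer = [combine((1, 1), [p1, p2]), combine((1, 1), [p1, p3])]

# The repaired system is still (4,2) MDS if every pair of nodes has rank 4.
nodes = dict(survivors, e=newcomer)
mds_ok = all(rank_mod_p(nodes[u] + nodes[v]) == 4 for u, v in combinations(nodes, 2))
```

Here `mds_ok` is true: the newcomer downloaded three half-size packets (1.5mb) and any two of the four nodes still recover the file. The new node does not store (a1, a2) itself, so this is a repair of the code rather than an exact copy of the lost node.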
Repair Bandwidth for MDS
• Theorem 1: For (n,k)-MDS codes, if each node is storing α bits and a newcomer downloads β bits from each existing node, then
$$\alpha_{MDS} = \frac{M}{k}, \qquad \beta_{MDS} = \frac{M}{k}\cdot\frac{1}{n-k}$$
• Proof by reduction to a flow on an (infinite) graph.
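As a quick check, Theorem 1 reproduces the 1.5mb figure of the (4,2) example. This is a small sketch, assuming the newcomer contacts all n-1 surviving nodes; the function name `mds_repair_point` is illustrative.

```python
from fractions import Fraction

def mds_repair_point(M, n, k):
    """Per-node storage alpha and per-node download beta from Theorem 1.

    M, n, k are integers; M is the file size in mb.
    """
    alpha = Fraction(M, k)
    beta = alpha / (n - k)
    return alpha, beta

# (4,2) MDS code storing a 2mb file, 1mb per node.
alpha, beta = mds_repair_point(M=2, n=4, k=2)
total_download = (4 - 1) * beta  # assumption: download from all n-1 survivors
```

With these inputs, `alpha` is 1, `beta` is 1/2 and `total_download` is 3/2, matching the slides.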
Proof sketch: Information flow graph
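[Figure: information flow graph with source S, storage nodes a, b, c, d (each storing α = 1mb), newcomer e downloading β from each of three surviving nodes, and data collectors attached by ∞-capacity edges.]
Cut condition for a data collector: 1 + 2β ≥ 2, so β ≥ 1/2 mb.
Total download ≥ 1.5mb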
Proof sketch: reduction to multicasting
[Figure: information flow graph with source S, storage nodes a, b, c, d, newcomer e downloading β from each of three nodes, and several data collectors.]
Repairing a code = multicasting on the information flow graph. Repair is feasible iff the minimum of the min-cuts is at least the file size M.
(Ahlswede et al., Koetter & Medard, Ho et al.)
Overview
• Motivation - Distributed storage in data centers
• The code repair problem
• Minimizing repair bandwidth
• Fundamental tradeoff between repair bandwidth and storage.
Regenerating codes
[Figure: storage nodes a to g, with stored amounts M/k and α, and repair links carrying β from d nodes.]
Repair bandwidth can be greatly reduced if we allow
Minimizing repair bandwidth
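[Figure: information flow graph in which every node stores α and each newcomer downloads β from each of d nodes.]
$$\min\ \beta d \quad \text{s.t.}\quad \mathrm{MinCut}(DC_i) \ge M\ \ \forall i,\qquad d \in \{k, k+1, \ldots, n-1\},\qquad \beta d \ge \alpha$$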
Ingredient 1: bounding the flow
Lemma: for any (potentially infinite) graph G(α,β,d), any data collector DC_j has flow at least
$$\mathrm{MinCut}(DC_j) \ge \sum_{i=0}^{k-1} \min\{(d-i)\beta,\ \alpha\}$$
Proof: sort topologically, count. The bound is tight since it is satisfied with equality for this graph.
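The bound in the lemma is easy to evaluate numerically. The sketch below uses the (4,2) example with M = 2mb, α = 1mb and d = 3 as a check; the function name `mincut_lower_bound` is illustrative.

```python
from fractions import Fraction

def mincut_lower_bound(alpha, beta, d, k):
    """Sum over i = 0..k-1 of min((d - i) * beta, alpha), as in the lemma."""
    return sum(min((d - i) * beta, alpha) for i in range(k))

M = 2  # file size in mb for the (4,2) example
enough = mincut_lower_bound(Fraction(1), Fraction(1, 2), d=3, k=2)
too_little = mincut_lower_bound(Fraction(1), Fraction(2, 5), d=3, k=2)
```

With β = 1/2 the bound equals M, so every data collector can recover the file; with β = 2/5 it drops to 9/5, below M, which agrees with the earlier conclusion that β ≥ 1/2 mb is needed.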
Ingredient 2: just relax
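[Figure: information flow graph in which every node stores α and each newcomer downloads β from each of d nodes.]
$$\min\ \beta d \quad \text{s.t.}\quad \sum_{i=0}^{k-1} \min\{(d-i)\beta,\ \alpha\} \ge M,\qquad d \in \{k, k+1, \ldots, n-1\},\qquad \beta d \ge \alpha$$
Relax the integer constraint
Show that integer and relaxed problem attain optimum at the same point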
Minimum repair bandwidth
Theorem 2: The minimum repair bandwidth
Numerical example
• File size M=20mb, k=20, n=25
• Reed-Solomon: Store α=1mb, repair βd=20mb
• MinStorage-RC: Store α=1mb, repair βd=4.8mb
• MinBandwidth-RC: Store α=1.65mb, repair βd=1.65mb
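These numbers can be reproduced from the two end points of the tradeoff, f(0) and f(k-1) in Theorem 3. The sketch below assumes the newcomer contacts d = n-1 = 24 nodes, which the slide does not state explicitly; the function names are illustrative.

```python
from fractions import Fraction

def min_storage_point(M, k, d):
    """Min-storage point: alpha = M/k, repair bandwidth gamma = f(0)."""
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma

def min_bandwidth_point(M, k, d):
    """Min-bandwidth point: alpha = gamma = f(k-1)."""
    gamma = Fraction(2 * M * d, 2 * k * d - k * k + k)
    return gamma, gamma

M, k, n = 20, 20, 25
d = n - 1  # assumption: repair from all surviving nodes

msr_alpha, msr_gamma = min_storage_point(M, k, d)
mbr_alpha, mbr_gamma = min_bandwidth_point(M, k, d)
```

This gives a min-storage repair bandwidth of 24/5 = 4.8mb and a min-bandwidth point of 48/29, roughly 1.655mb, which the slide reports as 1.65mb. Reed-Solomon repair downloads the whole file, 20mb, by comparison.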
Storage-Communication tradeoff
Theorem 3: for any (n,k) code, where each node stores α bits, repairs from d existing nodes and downloads dβ=γ bits, the feasible region is a piecewise linear function described as follows:
$$\alpha_{\min}(\gamma) = \begin{cases} \dfrac{M}{k}, & \gamma \in [f(0), \infty) \\[2mm] \dfrac{M - g(i)\gamma}{k-i}, & \gamma \in [f(i), f(i-1)),\ i = 1, \ldots, k-1 \end{cases}$$
$$f(i) := \frac{2Md}{(2k-i-1)i + 2k(d-k+1)}, \qquad g(i) := \frac{(2d-2k+i+1)i}{2d}$$
[Figure: storage α versus repair bandwidth βd, with the Min-Storage Regenerating code and the Min-Bandwidth Regenerating code at the two ends of the tradeoff curve.]
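The tradeoff curve of Theorem 3 can be evaluated point by point. The following is a sketch under the assumption that the second branch applies for i = 1, …, k-1 and that nothing is feasible below f(k-1); integer M, k, d are expected, and the function names are illustrative.

```python
from fractions import Fraction

def f(i, M, k, d):
    """Breakpoint f(i) of the tradeoff curve."""
    return Fraction(2 * M * d, (2 * k - i - 1) * i + 2 * k * (d - k + 1))

def g(i, k, d):
    """Slope coefficient g(i) of the tradeoff curve."""
    return Fraction((2 * d - 2 * k + i + 1) * i, 2 * d)

def alpha_min(gamma, M, k, d):
    """Minimum storage per node for repair bandwidth gamma, or None if infeasible."""
    gamma = Fraction(gamma)
    if gamma >= f(0, M, k, d):
        return Fraction(M, k)
    for i in range(1, k):
        if f(i, M, k, d) <= gamma < f(i - 1, M, k, d):
            return (M - g(i, k, d) * gamma) / (k - i)
    return None  # gamma is below f(k-1): no code exists

# The numerical example: M = 20mb, k = 20, d = 24.
at_min_storage = alpha_min(Fraction(24, 5), 20, 20, 24)
at_min_bandwidth = alpha_min(Fraction(48, 29), 20, 20, 24)
infeasible = alpha_min(1, 20, 20, 24)
```

At γ = 4.8mb the curve gives α = 1mb, and at γ = 48/29 it gives α = γ, so the two extreme points of the plot match the numerical example.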
Open Problem: Systematic repair
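• From Theorem 1, a (4,2) MDS code can be repaired by downloading
$$\alpha_{MDS} = \frac{M}{k}, \qquad \beta_{MDS} = \frac{M}{k}\cdot\frac{1}{n-k}$$
• What if we require perfect reconstruction?
[Figure: nodes a, b, c, d, 1mb each; the newcomer must store e = a exactly, with unknown ("?") downloads from the surviving nodes.]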
Repair vs Systematic Repair
[Figure: information flow graph with nodes x1, x2, …, xn, each storing α and downloading β from each of d nodes, and data collectors each connecting to k nodes.]
• Repair = Multicasting
• Systematic repair = Multicasting with intermediate nodes having (overlapping) requests.
• Cut arguments might not be tight
Systematic Repair-(4,2) example
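[Figure: four nodes storing (x1, x2), (x3, x4), (x1+x3, x2+x4), (x1+2x3, 2x2+3x4). The first node is lost and x1, x2 must be reconstructed exactly.]
Downloads, one packet per surviving node:
• x3+x4
• x1+x2+x3+x4
• $2^{-1}x_1 + 2\cdot 3^{-1}x_2 + x_3 + x_4$ (coefficients $2^{-1}$ and $3^{-1}$)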
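The example works because x3 and x4 appear only as the sum x3+x4 in all three downloads, so subtracting the first download leaves two equations in x1 and x2. The sketch below checks this exhaustively, assuming arithmetic over GF(7); the slide only says "over a field", and the function names are illustrative.

```python
from itertools import product

P = 7  # assumed prime field size

def inv(v):
    """Multiplicative inverse mod P (Python 3.8+)."""
    return pow(v, -1, P)

def store(x1, x2, x3, x4):
    """Contents of the four nodes in the (4,2) example."""
    return [
        (x1 % P, x2 % P),
        (x3 % P, x4 % P),
        ((x1 + x3) % P, (x2 + x4) % P),
        ((x1 + 2 * x3) % P, (2 * x2 + 3 * x4) % P),
    ]

def repair_first_node(node2, node3, node4):
    """Rebuild (x1, x2) from one combined packet per surviving node."""
    d2 = (node2[0] + node2[1]) % P                      # x3 + x4
    d3 = (node3[0] + node3[1]) % P                      # x1 + x2 + x3 + x4
    d4 = (inv(2) * node4[0] + inv(3) * node4[1]) % P    # 2^-1 x1 + 2*3^-1 x2 + x3 + x4
    u = (d3 - d2) % P                                   # x1 + x2
    v = (d4 - d2) % P                                   # c1*x1 + c2*x2
    c1, c2 = inv(2), 2 * inv(3) % P
    det_inv = inv((c2 - c1) % P)
    x1 = (c2 * u - v) * det_inv % P
    x2 = (v - c1 * u) * det_inv % P
    return x1, x2

all_recovered = all(
    repair_first_node(*store(*x)[1:]) == x[:2]
    for x in product(range(P), repeat=4)
)
```

Each survivor sends a single symbol, so the newcomer downloads three symbols instead of the four needed to decode the whole file, and it ends up with exactly (x1, x2).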
What is known about systematic repair
• For (n,2), systematic repair can match the cutset bound. [WD ISIT’09]
• (5,3) MSR systematic code exists (Cullina, D, Ho, Allerton’09)
• For k/n <= 1/2, systematic repair can match the cutset bound [Rashmi, Shah, Kumar, Ramchandran (2010)] [Suh, Ramchandran (2010)]
• What can be done for high rates?
Given an error-correcting code find the repair coefficients that reduce communication (over a field)
Given some channel matrices find the beamforming matrices that maximize the DoF
(Cadambe and Jafar, Suh and Tse) (Papailiopoulos & D, working paper)
• Network codes designed for distributed storage (Regenerating codes) greatly reduce the communication required to maintain the desired redundancy.
• Nodes cache different content in a distributed way
• Which content to cache?
• How much to store?
• How to find peers that have the desired content
• Incentives for people to donate storage/bandwidth
How much to store
• Two files, each of size 1.
• Fix a total redundancy of 2
• Coding helps
• But finding the best
Problem Description
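$$\max_{x_1,\ldots,x_n}\ \Pr\Big[\sum_{i=1}^{n} x_i \mathbf{1}_i \ge 1\Big] \qquad \text{s.t.}\quad \sum_{i=1}^{n} x_i \le T$$
Here x_i is the amount stored on node i, T is the total storage budget, and 1_i indicates that node i is available.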
Can be generalized to other models of node availability.
• Symmetric allocations can be suboptimal
– Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, a nonsymmetric allocation performs better than the optimal symmetric allocation†
• Finding the optimal symmetric allocation is also nontrivial
†Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman; also see S. Jain, M. Demmer, R. Patra, and K. Fall, SIGCOMM’05
Leong, D, Ho, Netcod 2009, Globecom submitted
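The n = 5, T = 12/5, p = 0.9 claim can be checked by brute force over all availability patterns. The sketch below assumes each node is available independently with probability p, takes "symmetric" to mean T/m on each of m nodes and zero elsewhere, and uses (3/5, 3/5, 2/5, 2/5, 2/5) as one illustrative nonsymmetric allocation; that particular allocation is not taken from the slides.

```python
from fractions import Fraction
from itertools import product

def recovery_probability(x, p):
    """Pr[sum of x_i over available nodes >= 1], nodes available i.i.d. with prob p."""
    total = Fraction(0)
    for avail in product((0, 1), repeat=len(x)):
        if sum(xi for xi, a in zip(x, avail) if a) >= 1:
            prob = Fraction(1)
            for a in avail:
                prob *= p if a else 1 - p
            total += prob
    return total

def best_symmetric(n, T, p):
    """Best allocation that puts T/m on each of m nodes, over m = 1..n."""
    candidates = []
    for m in range(1, n + 1):
        x = [T / m] * m + [Fraction(0)] * (n - m)
        candidates.append((recovery_probability(x, p), m))
    return max(candidates)

n, T, p = 5, Fraction(12, 5), Fraction(9, 10)
sym_prob, sym_m = best_symmetric(n, T, p)

# Hypothetical nonsymmetric allocation with the same budget T.
nonsym = [Fraction(3, 5), Fraction(3, 5), Fraction(2, 5), Fraction(2, 5), Fraction(2, 5)]
nonsym_prob = recovery_probability(nonsym, p)
```

Under these assumptions the best symmetric allocation spreads over 4 nodes and succeeds with probability 0.9963, while the nonsymmetric allocation reaches 0.99711. Exact fractions are used so that the threshold test against 1 is not affected by rounding.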
Distributed storage allocations
Results can be obtained for different access models. For the i.i.d. model, maximal spreading x = T/n was shown to have asymptotically zero gap from optimality if Tp > 1.
Open Problems
• Cut-Set bounds tight? Linear codes sufficient?
• What is the limit of interference alignment techniques?
• Repairing codes in small fields?
• Existing codes used in storage (e.g. EvenOdd Code, B-Code, etc.)?
• Dealing with bit-errors (security)? (Dikaliotis, Ho, D, ISIT’10)
• What is the role of (non-trivial) network topologies?
• Allocations for multiple objects?
Conclusions
• We proposed a theoretical framework for analyzing encoded information representations
• Repair reduces to network coding, and flow arguments completely characterize what is possible.
• We identified and characterized a tradeoff between repair bandwidth and storage for any storage system.
• Numerous interesting questions in coding for data centers: repair/updates/disk IO vs network bandwidth.
• Systematic, deterministic, small finite field constructions are very