• No results found

Decentralized Deduplication in SAN Cluster File Systems

N/A
N/A
Protected

Academic year: 2021

Share "Decentralized Deduplication in SAN Cluster File Systems"

Copied!
117
0
0

Loading.... (view fulltext now)

Full text

(1)

Decentralized Deduplication

in SAN Cluster File Systems

Austin T. Clements

Irfan Ahmad Murali Vilayannur Jinyuan Li

VMware, Inc. MIT CSAIL

(2)

Storage Area Networks

1.3 TB

(3)

Storage Area Networks

1.3 TB

(4)

Storage Area Networks

1.3 TB

(5)

Storage Area Networks

1.3 TB

(6)

Storage Area Networks

1.3 TB

(7)

Storage Area Networks

1.3 TB 237 GB

(8)

Storage Area Networks

1.3 TB 237 GB

5X reduction

(9)

Deduplication

(10)

Deduplication

(11)

Deduplication

(12)

Deduplication

(13)

Deduplication

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

(14)

Deduplication

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

(15)

Deduplication

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

(16)

Deduplication

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

(17)

The Problem

Slow— Lots of IO in the write path. Can’t cache the index.

Very Slow— Writes require allocation and thus coordination. No hope of disk locality.

Hopelessly Slow— Multi-host lock contention on shared index.

(18)

The Problem

Slow— Lots of IO in the write path. Can’t cache the index.

Very Slow— Writes require allocation and thus coordination. No hope of disk locality.

Hopelessly Slow— Multi-host lock contention on shared index.

(19)

The Problem

Slow— Lots of IO in the write path. Can’t cache the index.

Very Slow— Writes require allocation and thus coordination. No hope of disk locality.

Hopelessly Slow— Multi-host lock contention on shared index.

(20)

Three-Stage Deduplication

Decentralized Deduplication

Out-of-band deduplication of live, primary storage Process duplicates efficiently, in large batches Minimize contention on the index

Resilient to stale index information

Unique blocks remain mutable and sequential

= No overhead for blocks that don’t benefit from deduplication

(21)

Three-Stage Deduplication

DeDe

Out-of-band deduplication of live, primary storage Process duplicates efficiently, in large batches Minimize contention on the index

Resilient to stale index information

Unique blocks remain mutable and sequential

= No overhead for blocks that don’t benefit from deduplication

(22)

Three-Stage Deduplication

DeDe

Out-of-band deduplication of live, primary storage Process duplicates efficiently, in large batches Minimize contention on the index

Resilient to stale index information

Unique blocks remain mutable and sequential

= No overhead for blocks that don’t benefit from deduplication

(23)

Three-Stage Deduplication

DeDe

Out-of-band deduplication of live, primary storage Process duplicates efficiently, in large batches Minimize contention on the index

Resilient to stale index information

Unique blocks remain mutable and sequential

= No overhead for blocks that don’t benefit from deduplication

(24)

Three-Stage Deduplication

DeDe

(25)

Three-Stage Deduplication

DeDe

(26)

Three-Stage Deduplication

DeDe

(27)

Three-Stage Deduplication

DeDe

(28)

Three-Stage Deduplication

DeDe f4bea9.. f2a4d2.. 0e7a26.. 15ba2b.. d5e341.. bc6887.. ab7373.. 4ee207.. 9b1f28.. 9b575b.. 94c9c9.. 288bc7..

(29)

Three-Stage Deduplication

DeDe f4bea9.. f2a4d2.. 0e7a26.. 15ba2b.. d5e341.. bc6887.. ab7373.. 4ee207.. 9b1f28.. 9b575b.. 94c9c9.. 288bc7..

(30)

Three-Stage Deduplication

DeDe 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. f4bea9.. f2a4d2.. 0e7a26.. 15ba2b.. d5e341.. bc6887.. ab7373.. 4ee207.. 9b1f28.. 9b575b.. 94c9c9.. 288bc7..

(31)

Three-Stage Deduplication

DeDe 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. f4bea9.. f2a4d2.. 0e7a26.. 15ba2b.. d5e341.. bc6887.. ab7373.. 4ee207.. 9b1f28.. 9b575b.. 94c9c9.. 288bc7..

(32)

Three-Stage Deduplication

DeDe 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. ab7373.. 4ee207.. 9b1f28.. 9b575b.. 94c9c9.. 288bc7..

(33)

Three-Stage Deduplication

DeDe 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. ab7373.. 4ee207.. 9b1f28.. 9b575b.. 94c9c9.. 288bc7..

(34)

Three-Stage Deduplication

DeDe 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. ab7373.. 4ee207.. 9b1f28.. 9b575b.. 94c9c9.. 288bc7..

(35)

Three-Stage Deduplication

DeDe 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

(36)

Three-Stage Deduplication

DeDe 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

(37)

Three-Stage Deduplication

DeDe 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. ? ? ? ?

(38)

Three-Stage Deduplication

DeDe 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. ? ? ? ?

(39)

Three-Stage Deduplication

DeDe 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. ? ? ? ?

(40)

Implementation

(41)

Implementation

(42)

Implementation

(43)

Implementation

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

(44)

Implementation

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

(45)

VMFS

Compare-and-share

(46)

VMFS

Compare-and-share

?

(47)

VMFS

Compare-and-share

(48)

VMFS

Compare-and-share

DeDe finds duplicates. VMFS eliminates them.

(49)

Write Monitor

A lightweight kernel module monitors writes, computes hashes It buffers the write log in userspace before writing it to disk Safe to buffer the log because index is resilient

150 MB of regular writes 1 MB sequential log write

(50)

Write Monitor

A lightweight kernel module monitors writes, computes hashes

It buffers the write log in userspace before writing it to disk Safe to buffer the log because index is resilient

150 MB of regular writes 1 MB sequential log write

(51)

Write Monitor

A lightweight kernel module monitors writes, computes hashes It buffers the write log in userspace before writing it to disk

Safe to buffer the log because index is resilient

150 MB of regular writes 1 MB sequential log write

(52)

Write Monitor

A lightweight kernel module monitors writes, computes hashes It buffers the write log in userspace before writing it to disk

Safe to buffer the log because index is resilient

150 MB of regular writes 1 MB sequential log write

(53)

Write Monitor

A lightweight kernel module monitors writes, computes hashes It buffers the write log in userspace before writing it to disk Safe to buffer the log because index is resilient

150 MB of regular writes 1 MB sequential log write

(54)

Write Monitor

A lightweight kernel module monitors writes, computes hashes It buffers the write log in userspace before writing it to disk Safe to buffer the log because index is resilient

150 MB of regular writes 1 MB sequential log write

(55)

The Index

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9..

Map from hashes to block locators, list sorted by hash

Unique blocks are located in files and remain mutable A virtual arena stores COW references to all shared blocks

(56)

The Index

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

Map from hashes to block locators, list sorted by hash

Unique blocks are located in files and remain mutable A virtual arena stores COW references to all shared blocks

(57)

The Index

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable

A virtual arena stores COW references to all shared blocks

(58)

The Index

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable

A virtual arena stores COW references to all shared blocks

(59)

The Index

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable

A virtual arena stores COW references to all shared blocks

(60)

The Index

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable A virtual arena stores COW references to all shared blocks

(61)

The Index

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable A virtual arena stores COW references to all shared blocks

(62)

The Index

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

Map from hashes to block locators, list sorted by hash Unique blocks are located in files and remain mutable A virtual arena stores COW references to all shared blocks

(63)

Indexing and Duplicate Elimination

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

(64)

Indexing and Duplicate Elimination

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

(65)

Indexing and Duplicate Elimination

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

(66)

Indexing and Duplicate Elimination

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared 12067c.. 6cd412.. c277d6..

(67)

Indexing and Duplicate Elimination

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared 12067c.. 6cd412.. c277d6..

(68)

Indexing and Duplicate Elimination

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared 12067c.. 6cd412.. c277d6..

(69)

Indexing and Duplicate Elimination

0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared 12067c.. 6cd412.. c277d6..

(70)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared 6cd412.. c277d6..

(71)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared 6cd412.. c277d6..

(72)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared 6cd412.. c277d6..

(73)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared c277d6..

(74)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared c277d6..

(75)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared c277d6..

(76)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

?

c277d6..

(77)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared c277d6..

(78)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared c277d6..

(79)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

(80)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

(81)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

(82)

Indexing and Duplicate Elimination

12067c.. 0e7a26.. 15ba2b.. 6cd412.. ab7373.. bc6887.. c277d6.. d5e341.. f2a4d2.. f4bea9.. Unique Shared

(83)

Phew.

(84)

Evaluation

How much space does DeDe save? How much overhead does DeDe introduce?

How fast can DeDe deduplicate?

(85)

Evaluation

How much space does DeDe save?

How much overhead does DeDe introduce? How fast can DeDe deduplicate?

(86)

Evaluation

How much space does DeDe save? How much overhead does DeDe introduce?

How fast can DeDe deduplicate?

(87)

Evaluation

How much space does DeDe save? How much overhead does DeDe introduce?

How fast can DeDe deduplicate?

(88)

Space Savings: VDI Cluster

Corporate Virtual Desktop Infrastructure cluster Desktop XP VM’s

6–12 months of active use Originally cloned from small number of base images

...                            113 VM’s    1.9 TB 1.3 TB

(89)

Space Savings: VDI Cluster

Corporate Virtual Desktop Infrastructure cluster Desktop XP VM’s

6–12 months of active use Originally cloned from small number of base images

...                            113 VM’s    1.9 TB 1.3 TB

(90)

Space Savings: VDI Cluster

Corporate Virtual Desktop Infrastructure cluster Desktop XP VM’s

6–12 months of active use Originally cloned from small number of base images

...                            113 VM’s    1.9 TB 1.3 TB

(91)

Space Savings: VDI Cluster

1.3 TB

(92)

Space Savings: VDI Cluster

1.3 TB 237 GB

(93)

Space Savings: VDI Cluster

1.3 TB 237 GB 173 GB unique 61 GB shared

(94)

Space Savings: VDI Cluster

1.3 TB 237 GB 173 GB unique 61 GB shared 2.7 GB

(95)

Space Savings: VDI Cluster

1.3 TB 237 GB 173 GB unique 61 GB shared 2.7 GB 1.3 GB index file 194 MB v. arena 1.1 GB FS metadata

(96)

Runtime Effects

Write monitoring Disk array caching

EMC CLARiiON CX3-40

(97)

Runtime Effects

Write monitoring Disk array caching

EMC CLARiiON CX3-40

(98)

Runtime Effects

Write monitoring Disk array caching

EMC CLARiiON CX3-40

(99)

Runtime Overhead: Write Monitoring

Worst-case benchmark: 100% sequential write IO, No computation

Baseline Write Monitor

CPU 33% 220% Bandwidth (MB/s) 233 233 Latency (ms) 8.6 8.6 (See paper for database application benchmark)

(100)

Runtime Overhead: Write Monitoring

Worst-case benchmark: 100% sequential write IO, No computation

Baseline Write Monitor

CPU 33% 220% Bandwidth (MB/s) 233 233 Latency (ms) 8.6 8.6 (See paper for database application benchmark)

(101)

Runtime Overhead: Write Monitoring

Worst-case benchmark: 100% sequential write IO, No computation

Baseline Write Monitor

CPU 33% 220% Bandwidth (MB/s) 233 233 Latency (ms) 8.6 8.6

(See paper for database application benchmark)

(102)

Runtime Overhead: Write Monitoring

Worst-case benchmark: 100% sequential write IO, No computation

Baseline Write Monitor

CPU 33% 220% Bandwidth (MB/s) 233 233 Latency (ms) 8.6 8.6 (See paper for database application benchmark)

(103)

Runtime Gains: Disk Array Caching

Reduced storage footprint Better caching Less IO

(104)

Runtime Gains: Disk Array Caching

Reduced storage footprint Better caching Less IO

0 10 20 30 40 50 60 70 80 90 2 4 6 8 10 12 14 16 18 20

Average boot time (secs)

# VMs booting concurrently VDI "boot storm"

(105)

Runtime Gains: Disk Array Caching

Reduced storage footprint Better caching Less IO

0 10 20 30 40 50 60 70 80 90 2 4 6 8 10 12 14 16 18 20

Average boot time (secs)

# VMs booting concurrently VDI "boot storm" Full copies

(106)

Runtime Gains: Disk Array Caching

Reduced storage footprint Better caching Less IO

0 10 20 30 40 50 60 70 80 90 2 4 6 8 10 12 14 16 18 20

Average boot time (secs)

# VMs booting concurrently VDI "boot storm" Full copies

Deduplicated

(107)

Out-of-band Deduplication Rate

Index scan 6.6 GB/sec COW sharing 2.6 MB/sec (It’s a prototype!)

Virtually no cost for unique blocks

9 GB ofnew shared blocks per hour

(And provisioning can be special-cased)

(108)

Out-of-band Deduplication Rate

Index scan 6.6 GB/sec

COW sharing

2.6 MB/sec (It’s a prototype!)

Virtually no cost for unique blocks

9 GB ofnew shared blocks per hour

(And provisioning can be special-cased)

(109)

Out-of-band Deduplication Rate

Index scan 6.6 GB/sec

COW sharing 2.6 MB/sec

(It’s a prototype!)

Virtually no cost for unique blocks

9 GB ofnew shared blocks per hour

(And provisioning can be special-cased)

(110)

Out-of-band Deduplication Rate

Index scan 6.6 GB/sec

COW sharing 2.6 MB/sec

(It’s a prototype!)

Virtually no cost for unique blocks

9 GB ofnew shared blocks per hour

(And provisioning can be special-cased)

(111)

Out-of-band Deduplication Rate

Index scan 6.6 GB/sec

COW sharing 2.6 MB/sec

(It’s a prototype!)

Virtually no cost for unique blocks

9 GB ofnew shared blocks per hour

(And provisioning can be special-cased)

(112)

Related Work

Centralized archival Venti

Data Domain Foundation

Centralized primary storage NetApp ASIS

Microsoft Single Instance Store Distributed

Farsite

SAN with Coordinator DDE

(113)

Conclusion

Decentralized, out-of-band, live file system deduplication.

Deduplication is effective.

Deduplication is hard.

Three-stage deduplication has only modest performance overhead. Thank you.

(114)

Conclusion

Decentralized, out-of-band, live file system deduplication.

Deduplication is effective.

Deduplication is hard.

Three-stage deduplication has only modest performance overhead. Thank you.

(115)

Conclusion

Decentralized, out-of-band, live file system deduplication.

Deduplication is effective.

Deduplication is hard.

Three-stage deduplication has only modest performance overhead. Thank you.

(116)

Conclusion

Decentralized, out-of-band, live file system deduplication.

Deduplication is effective.

Deduplication is hard.

Three-stage deduplication has only modest performance overhead.

Thank you.

(117)

Conclusion

Decentralized, out-of-band, live file system deduplication.

Deduplication is effective.

Deduplication is hard.

Three-stage deduplication has only modest performance overhead. Thank you.

References

Related documents

The scope of testing can likewise be appropriately divided among different machines; for example, a performance test and mechanical running test can be performed on the

In Chapter 3 of this thesis, the story of Tamar will be read by means of an African Feminist approach that will employ the experience of Zulu women’s oppression in the

Twelve(12) Aeromagnetic maps on a scale of 1:100,000, covering the middle Benue Trough were digitized and processed using computer techniques which include; map merging,

Using Social Network Analysis tools we discuss the results according to (i) the structural, technological and geographical dimensions of knowledge flows, (ii) the influence of

The liability of Boston Scientific under this warranty shall be limited to: (a) replacement with a functionally equivalent Lead, Extension, Splitter, or Connector; or (b) full

Sources have revealed that FM radio broadcasting is very popular especially in developed areas including Europe and United States because higher sound fidelity

A new letter of assignment and copy of press card (or other materials in lieu of a press card) will be required each time.. Can I be approved automatically for IPCC conferences

The letters transcribed are two drafts of the same letter sent from William Williams Stringfield to President Andrew Johnson in June of 1866!. They detail String- field’s request