Trends in Enterprise Backup Deduplication

Academic year: 2021

(1)

Trends in Enterprise Backup Deduplication

1 © Copyright 2009 EMC Corporation. All rights reserved.

Shankar Balasubramanian

Architect, EMC

(2)

Outline

Protection Storage

Deduplication Basics

(3)

Data Protection Issues

Tapes are archaic

– Unreliable

– Inflexible (needs to be streamed)

– Requires manual intervention for Disaster Recovery

– Restore performance is abysmal (no random access)

Backup is too slow

– Traditional architecture moves all the data repeatedly

Full restore is too slow

– Primary storage not that good at writes, esp. small files

– Must restore all data before any is available

Introducing disk-based protection storage based on deduplication

(4)
(5)

Protection Storage vs. Primary Storage

Workload

– Primary storage: continuous random accesses, mostly reads, lots of metadata accesses

– Protection storage: large batches of accesses, mostly writes, few metadata accesses

Cost

– Primary storage: dollars / IOPS

– Protection storage: dollars / TB

Performance

– Primary storage: latency and throughput, IOPS

– Protection storage: sequential throughput

(6)

Protection Storage Features

Protection Storage

Cost

Global Compression™ for Data Reduction

Inline Deduplication

Local compression (Lempel-Ziv, GZ, GZ fast style)

Performance

Excellent sequential throughput

Only fair random access and small file performance

(7)

Deduplication Backup Storage for D2D+DR

[Figure: Backup Clients, Backup/media servers, Onsite Retention Storage (Backup, Retention/Restore), Replication over WAN, Offsite Disaster Recovery Storage (DR)]



Deduplicating storage systems take role of tape libraries

– Plug-and-play with standard backup software



Replication of reduced data for WAN Vaulting

– Recover locally or remotely



Appliance packaging with options:

– Storage system with controller, firmware and disks

(8)
(9)

Deduplication techniques

Regular Storage Array: 1:1

LZ Compression (whitespace reduction): ~2:1

Single Instance Storage (file level): ~3:1

Fixed Block (fixed blocks, snapshots): ~3:1

Variable Segment (backup target, variable segment): ~20:1

Deduplication significantly reduces

– Replication WAN bandwidth

– Power
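The gap between fixed-block and variable-segment approaches can be illustrated with a small sketch. It is not the product's algorithm: the window size, the bit mask, the SHA-1 based boundary test, and the toy segment sizes (about 64 bytes rather than kilobytes) are all assumptions chosen to keep the example short. It inserts one byte at the front of a data set and measures how many pieces of the edited copy are already known from the original.

```python
import hashlib
import random


def fixed_blocks(data: bytes, size: int = 64) -> list[bytes]:
    """Cut at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_segments(data: bytes, window: int = 16, mask: int = 0x3F) -> list[bytes]:
    """Cut wherever a hash of the preceding `window` bytes matches a bit pattern.

    Toy scale: the average segment is about 64 bytes, and no minimum or
    maximum segment size is enforced (a real system would enforce both).
    """
    segments, start = [], 0
    for end in range(window, len(data) + 1):
        digest = hashlib.sha1(data[end - window:end]).digest()
        if (int.from_bytes(digest[:4], "big") & mask) == 0:
            segments.append(data[start:end])
            start = end
    if start < len(data):
        segments.append(data[start:])
    return segments


def shared_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of pieces in `new` whose fingerprint already exists in `old`."""
    known = {hashlib.sha1(piece).digest() for piece in old}
    hits = sum(1 for piece in new if hashlib.sha1(piece).digest() in known)
    return hits / len(new)


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(20000))
edited = b"X" + original  # one inserted byte shifts every later offset

fixed_shared = shared_fraction(fixed_blocks(original), fixed_blocks(edited))
variable_shared = shared_fraction(variable_segments(original), variable_segments(edited))
print(f"fixed blocks reused:      {fixed_shared:.0%}")
print(f"variable segments reused: {variable_shared:.0%}")
```

Because the boundaries follow the content, the variable segments realign right after the inserted byte, while every fixed block downstream of the edit changes.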

(10)

How Deduplication Works

[Figure: data stream from file system / VTL / backup / archive software, covering the 1st full backup, 1st incremental, and 2nd full backup]

– Unique variable segments (4 KB-12 KB): A B C D E F

– Redundant data segments
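As a minimal sketch of the idea in this figure, the segment labels can be run through a store that keeps each unique segment once and records an ordered list of fingerprints for restore. Treating the labels as one flat stream in the order they appear in the figure text is an assumption, as are the names `dedup`, `store`, and `recipe` and the choice of SHA-1 as the fingerprint.

```python
import hashlib


def dedup(stream):
    """Store each unique segment once; keep an ordered recipe for restore."""
    store = {}    # fingerprint -> segment bytes (unique segments only)
    recipe = []   # fingerprints in stream order
    for segment in stream:
        data = segment.encode()
        fingerprint = hashlib.sha1(data).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = data      # new segment: keep it
        recipe.append(fingerprint)         # duplicate or not: reference it
    return store, recipe


# Segment labels as they appear in the figure text (assumed to be one stream).
stream = list("ABCFDEABCBDABE")
store, recipe = dedup(stream)
restored = "".join(store[fp].decode() for fp in recipe)

print(len(stream), "segments in the stream,", len(store), "stored")
```

The recipe is what makes the dropped duplicates recoverable: restore walks the fingerprints in order and looks each one up in the store.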

(11)

Segment: data (from a backup stream) is sliced into segments

Fingerprint: fingerprints for segments are computed

Filter: fingerprints are compared to fingerprints in the summary vector and cache

1. If the fingerprint is new, continue

2. If the fingerprint is a duplicate, reference it, then drop the duplicate segment

Compress: groups of new segments are compressed using lz, gz, or gzfast

Write: segments and metadata are written to containers; containers are written to disk
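The five steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the Data Domain implementation: fixed 8-byte slicing stands in for variable segmenting, a plain dictionary stands in for the summary vector and cache, `zlib` stands in for the lz/gz/gzfast options, and the container layout and the names `ingest`, `flush`, `read_segment`, and `CONTAINER_SEGMENTS` are hypothetical.

```python
import hashlib
import zlib

CONTAINER_SEGMENTS = 4  # hypothetical: new segments grouped per container


def segment(stream: bytes, size: int = 8) -> list[bytes]:
    # Step 1 (simplified): fixed-size slicing stands in for variable segmenting.
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def flush(pending, index, containers):
    """Steps 4 and 5: compress a group of new segments and write a container."""
    fingerprints = [fp for fp, _ in pending]
    lengths = [len(seg) for _, seg in pending]
    payload = zlib.compress(b"".join(seg for _, seg in pending))
    containers.append({"fingerprints": fingerprints, "lengths": lengths, "payload": payload})
    container_id = len(containers) - 1
    for slot, fp in enumerate(fingerprints):
        index[fp] = (container_id, slot)   # metadata: where each segment lives
    pending.clear()


def ingest(stream, index, containers):
    """Run one backup stream through the pipeline; return its restore recipe."""
    recipe, pending = [], []
    for seg in segment(stream):
        fp = hashlib.sha1(seg).digest()    # step 2: fingerprint
        if fp not in index:                # step 3: filter
            index[fp] = None               # placeholder until the container is written
            pending.append((fp, seg))
            if len(pending) == CONTAINER_SEGMENTS:
                flush(pending, index, containers)
        recipe.append(fp)                  # duplicates are only referenced
    if pending:
        flush(pending, index, containers)
    return recipe


def read_segment(fp, index, containers):
    container_id, slot = index[fp]
    container = containers[container_id]
    raw = zlib.decompress(container["payload"])
    start = sum(container["lengths"][:slot])
    return raw[start:start + container["lengths"][slot]]


index, containers = {}, []
backup1 = b"AAAAAAAABBBBBBBBCCCCCCCC"
backup2 = b"AAAAAAAABBBBBBBBDDDDDDDD"
recipe1 = ingest(backup1, index, containers)
recipe2 = ingest(backup2, index, containers)
restored2 = b"".join(read_segment(fp, index, containers) for fp in recipe2)
```

The second backup adds only one new segment to the store; its other two segments are references to data written by the first backup.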

[Figure: Deduplication work flow. A data stream passes through steps 1-5 in the Data Domain system, ending with containers written to disk.]

(12)

Compression Effect

[Figure: data stored (TB) versus weeks in use for a 1 TB data set backed up with 1 full + 6 incrementals per week. Labels: Traditional Storage Capacity, Capacity Optimized, Disk Savings.]

First full backup: 2-4x data reduction

File-level incrementals: 5-10x
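The shape of the two curves can be reproduced with simple arithmetic. The sketch below is a hypothetical model: the slide gives the 1 TB data set, the weekly schedule, and the reduction ranges for the first full backup and for incrementals, but it does not state the incremental size or the reduction achieved by later full backups, so `incr_tb` and `later_full_x` are illustrative inputs, not measured values.

```python
def stored_tb(weeks, full_tb, incr_tb, first_full_x, incr_x, later_full_x):
    """Return (logical TB backed up, physical TB stored) after `weeks` weeks.

    Each week is 1 full backup plus 6 incrementals. The reduction factors
    are inputs; nothing here is derived from a real system.
    """
    logical = 0.0
    physical = 0.0
    for week in range(weeks):
        full_x = first_full_x if week == 0 else later_full_x
        logical += full_tb + 6 * incr_tb
        physical += full_tb / full_x + 6 * incr_tb / incr_x
    return logical, physical


# Illustrative inputs only: incr_tb and later_full_x are assumptions.
logical, physical = stored_tb(
    weeks=2, full_tb=1.0, incr_tb=0.1,
    first_full_x=2, incr_x=5, later_full_x=20,
)
print(f"logical {logical:.2f} TB, physical {physical:.2f} TB")
```

Under this model the logical total grows by the same amount every week, while the physical total grows slowly after the first full backup, which is the widening gap the chart labels as disk savings.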

(13)

CPU-centric Deduplication: SISL

(Stream-informed Segment Layout)

(14)
(15)

Deduplication at backup / recovery speeds

Deduplication is done by looking for matching fingerprints

– 1 TB of physical storage will have 125 million fingerprints

– Storing all the fingerprints in memory would need too much memory

– Caching the fingerprints will not work because fingerprints are random

– Reading the fingerprints directly from disk would need too many disks to be read in parallel
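The 125 million figure is consistent with an average segment size of 8 KB, the midpoint of the 4 KB-12 KB range given earlier in the deck. The arithmetic below assumes decimal units (1 TB = 10^12 bytes, 8 KB = 8,000 bytes) and a 20-byte fingerprint, which is the size of a SHA-1 digest; the deck does not state the fingerprint size, so the memory estimate is only indicative.

```python
TB = 10**12                  # decimal terabyte (assumption)
AVG_SEGMENT_BYTES = 8_000    # assumption: midpoint of the 4 KB-12 KB range
FINGERPRINT_BYTES = 20       # assumption: SHA-1 sized fingerprint


def fingerprints_per_tb(avg_segment_bytes: int = AVG_SEGMENT_BYTES) -> int:
    """Number of segments, and therefore fingerprints, in 1 TB of stored data."""
    return TB // avg_segment_bytes


def index_bytes_per_tb(fingerprint_bytes: int = FINGERPRINT_BYTES) -> int:
    """RAM needed to hold every fingerprint for 1 TB, ignoring index overhead."""
    return fingerprints_per_tb() * fingerprint_bytes


print(fingerprints_per_tb())          # 125000000
print(index_bytes_per_tb() / 10**9)   # 2.5 (GB of raw fingerprints per TB stored)
```

At that rate the raw fingerprints alone scale linearly with stored capacity, before counting any hash table overhead or the pointers to where each segment lives.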

DD answer: SISL

– Stream-informed Segment Layout; includes:

Summary Vector in RAM says if segment is new

(16)

Set bits in SV in RAM for each segment stored

Summary Vector (Bloom Filter) for new segments
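A Bloom filter of this kind can be sketched in a few lines. The class name `SummaryVector`, the bit-array size, the number of hash functions, and the use of salted SHA-256 to derive bit positions are assumptions made for illustration; the deck does not describe these parameters.

```python
import hashlib


class SummaryVector:
    """Minimal Bloom filter: 'no' is definite, 'yes' means 'maybe stored'."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits          # must be a multiple of 8 here
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, fingerprint: bytes):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fingerprint: bytes) -> None:
        """Set the bits for a segment that has just been stored."""
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        """False means the segment is definitely new; True needs a real lookup."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(fingerprint)
        )


sv = SummaryVector()
stored_fp = hashlib.sha1(b"segment A").digest()
other_fp = hashlib.sha1(b"segment B").digest()
sv.add(stored_fp)
print(sv.might_contain(stored_fp))   # True
print(sv.might_contain(other_fp))    # False unless a false positive occurs
```

The useful property is the one-sided error: a clear bit proves the segment has never been stored, so new segments can skip the on-disk index lookup entirely.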

(17)

Segment Localities

[Figure: DDFS log structure. Each locality pairs metadata (abcd, efgh, ijkl, ..., stuv) with the corresponding segment data (A B C D, E F G H, I J K L, ..., S T U V).]

Localities

– Stream-informed storage units

– Neighboring unique segments stored together

– Fingerprints and segments stored together with other metadata

One seek can retrieve hundreds into RAM

– Fast caching for fingerprint lookup
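The effect of localities on lookup cost can be sketched with a small cache that loads a whole group of fingerprints on each miss. Everything here is a simplified assumption: the class name `LocalityCache`, the in-memory `where` map standing in for the on-disk index, the LRU policy, and the four-fingerprint groups borrowed from the figure labels (real localities would hold hundreds).

```python
from collections import OrderedDict


class LocalityCache:
    """On a miss, read the whole locality so its neighbors hit in RAM."""

    def __init__(self, localities, capacity: int = 2):
        self.localities = localities      # fingerprint groups as laid out on disk
        # Stand-in for the on-disk index: fingerprint -> locality number.
        self.where = {fp: i for i, group in enumerate(localities) for fp in group}
        self.cached = OrderedDict()       # locality number -> set of fingerprints
        self.capacity = capacity
        self.disk_reads = 0

    def is_duplicate(self, fp) -> bool:
        hit = next((loc for loc, group in self.cached.items() if fp in group), None)
        if hit is not None:
            self.cached.move_to_end(hit)  # keep recently used localities
            return True
        loc = self.where.get(fp)
        if loc is None:
            return False                  # in the real design the summary vector answers this
        self.disk_reads += 1              # one seek loads the whole locality
        self.cached[loc] = set(self.localities[loc])
        if len(self.cached) > self.capacity:
            self.cached.popitem(last=False)
        return True


cache = LocalityCache([["a", "b", "c", "d"], ["e", "f", "g", "h"]])
results = [cache.is_duplicate(fp) for fp in "abcdefgh"]
print(all(results), cache.disk_reads)     # eight lookups, two disk reads
```

Because a backup stream tends to repeat segments in the same order as the previous backup, one read per locality serves the lookups for all of its neighbors.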

(18)
(19)

Data Invulnerability Architecture

Designed from the ground up for data protection

– File system simplicity & resiliency

Five lines of defense against data loss

– End-to-end verification

– Fault avoidance and containment

– Continuous fault detection and healing

(20)

End-to-End Verification at Backup Time

DDOS tests recoverability asynchronously after backups

– File system consistency

– Data integrity on disk

Primary storage can’t verify after write

– It would be too slow

– Primary storage discovers problems during restore

(21)

Fault Avoidance and Containment

Custom log-structured file system architecture

New data never overwrites good data

Previous backups are not at risk

Fewer complex data structures mean fewer bugs

No bitmaps and link counts to corrupt

NVRAM for fast, safe restart

DD-RAID does no partial-stripe writes

(22)

Continuous Fault Detection and Healing

DD-RAID 6

– Protection against

  • Two disk failures

  • Disk read errors during reconstruction

  • Operator pulling the wrong disk

– Verifies data integrity and stripe coherency after writes

On-the-fly Error Detection and Correction

– All on-disk structures covered by strong checksums

– Data correctness verified on every disk read

– Data errors corrected automatically from
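Checksum verification on every read can be sketched as follows. This shows detection only; the correction step mentioned on the slide is not modeled. The block layout (digest followed by payload), the choice of SHA-256 as the strong checksum, and the names `write_block`, `read_block`, and `ChecksumError` are assumptions for illustration.

```python
import hashlib

DIGEST_BYTES = 32  # SHA-256 digest length


class ChecksumError(Exception):
    """Raised when stored data no longer matches its checksum."""


def write_block(payload: bytes) -> bytes:
    """Prefix the payload with its checksum before it goes to disk."""
    return hashlib.sha256(payload).digest() + payload


def read_block(stored: bytes) -> bytes:
    """Verify the checksum on every read; refuse to return corrupt data."""
    expected, payload = stored[:DIGEST_BYTES], stored[DIGEST_BYTES:]
    if hashlib.sha256(payload).digest() != expected:
        raise ChecksumError("stored block failed verification")
    return payload


block = write_block(b"segment data")
print(read_block(block))                  # b'segment data'

corrupted = bytearray(block)
corrupted[-1] ^= 0x01                     # flip one bit in the payload
try:
    read_block(bytes(corrupted))
except ChecksumError as err:
    print("detected:", err)
```

A detected mismatch is the trigger for repair: the caller knows not to trust the block and can fall back to redundant data instead of returning silent corruption.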

(23)

File System Recoverability

Self-describing data format

– Metadata structures rebuildable from Locality log

File system check (FSCK), if needed, is fast

– Checks and repairs done on de-duplicated data

  • Run checks on 4 TB of data, not 80 TB

– No overwrite means it’s safe to bring system back on line

(24)

Summary

Data Protection needs to change

– Eliminate tape; backup should be fast; restore should be easy

Data Domain core technology can eliminate tape

– Backup to disk at the cost of tape

– Keep data safe despite reduced copies
