presentation one
Optimizing Your Backup System
The backup system is one of the most expensive and troublesome systems in your data center, and yet it’s rarely configured for optimum performance and maximum utilization of resources. This hour kicks off with an explanation of some overlooked features of typical commercial backup software products. Covering real-life examples of what to look for in your backup system to gauge whether or not it is optimized (including success rates, partial backups, consecutive failures, throughput rates and media utilization), Curtis drives home his points with customer data pulled from actual customers’ backup systems. This session will also explore the importance of using disk in your backup system.
also covered
• The three rules of encryption: key management, key management, key management
• Role-based administration
• Separation of powers
• Background checks and other techniques
NEXTGEN BACKUP SEMINAR PRESENTATION DOWNLOAD
Optimizing Your Backup System
Backup School 2008
W. Curtis Preston VP Data Protection
Optimizing Your
Backup System
Things you might not know
about your backup system and what to do about them
A Little About Me
z When I started as “backup guy” at $35B company in
1993:
• Tape Drive: QIC 80 (80 MB capacity)
• Tape Drive: Exabyte 8200 (2.5 GB & 256KB/s)
• Biggest Server: 4 GB (’93), 100 GB (’96)
• Entire Data Center: 200 GB (’93), 400 GB (’96)
• My TIVO now has 5 times the storage my data center did!
z Consulting in backup & recovery since ’96
z Author of O’Reilly’s Backup & Recovery &
Using SANs and NAS
z Webmaster of BackupCentral.com
A Little bit about where I work
z GlassHouse is an independent professional
services firm specializing in IT infrastructure
z This is important so you understand where I’m
coming from
z We don’t make or resell any hardware or
software
z No reason to promote or bash any product
z The information you will hear today is based on
real experiences with hundreds of companies, including the largest companies in the world
Optimizing Your Backup System
z Your true success rate
z Partial backups
z Consecutive failures
z Tape throughput rates
z Media utilization
Real Customer Data
z The following slides use real customer data
(anonymized) gathered using our own backup assessment tool that collects & parses backup data and puts it in a database
z This allows us to collect and compare PIs across
multiple backup servers and customers
z First time this data is being presented publicly
• 112 customers
• Average of 128K jobs per customer
Average Customer Success Rate
z Average success rate of 78% (average of
averages)
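The "average of averages" qualifier matters: averaging each customer's own success rate gives a small customer the same weight as a large one, so the result can differ noticeably from the pooled rate across all jobs. A minimal sketch of the difference, using made-up job counts (the customer names and numbers are hypothetical, not taken from the assessment data):

```python
# Hypothetical per-customer job counts; NOT taken from the assessment data.
customers = {
    "small_shop": {"jobs": 1_000, "successful": 500},
    "mid_size": {"jobs": 20_000, "successful": 16_000},
    "enterprise": {"jobs": 500_000, "successful": 470_000},
}


def average_of_averages(data):
    """Mean of each customer's own success rate (every customer weighs the same)."""
    rates = [c["successful"] / c["jobs"] for c in data.values()]
    return sum(rates) / len(rates)


def pooled_success_rate(data):
    """Successful jobs divided by all jobs (large customers dominate)."""
    total_jobs = sum(c["jobs"] for c in data.values())
    total_ok = sum(c["successful"] for c in data.values())
    return total_ok / total_jobs


avg_of_avgs = average_of_averages(customers)
pooled = pooled_success_rate(customers)
print(f"average of averages: {avg_of_avgs:.1%}")
print(f"pooled success rate: {pooled:.1%}")
```

With these sample numbers the pooled rate is much higher than the average of averages, because the one large customer with a good success rate dominates the job count.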
Customers by success rate
[Chart: customers by success rate; axis 0.00% to 100.00%; success-rate buckets <10, <20, <30, <40, <50, <60, <70, <80, <90, <95, <96, <97, <98, <99, <100]
Success Rate by Industry
[Chart: success rate (0 to 100) by industry: Airlines, Biotech/Pharm, Communications, Software/Hardware, Entertainment, Finance/Banking, Food, Government, Healthcare, Insurance, Manufacturing, Oil/Gas, Retail]
So what?
z Success rate of 91% sounds good, right?
z A lot of customers assume that a success rate in
the 90s is a good thing. Let’s examine why it’s not.
z That’s 1.2M failed backups!
z Overall success rate doesn’t tell the whole
story. What about:
• Client-level success rates
Client-level success rates
z Must zoom in on an individual customer for
this level of detail
z Stats of this customer
• 35 backup servers
• 10,000 clients
• 477,217 backup jobs
• 92% successful
• 5% partial
• 3% failed
Success Rate by Backup Server
• Same view by backup server
• Some with rather high failure rates
• But, still. Nothing too terrible, right?
Number of Jobs   Failed    Partial   Successful
1,597            0%        0%        100%
1,053            0%        0%        100%
7,708            0.01%     0.04%     99.95%
14,323           0.05%     0.02%     99.93%
10,257           0.27%     0.03%     99.70%
21,871           0.43%     0%        99.57%
8,906            0.51%     0.02%     99.47%
89,836           0.70%     0.05%     99.25%
100,501          1.10%     0.23%     98.67%
33,493           1.30%     0.13%     98.56%
35,759           3.95%     0.09%     95.96%
179              0%        5.03%     94.97%
144              0%        5.56%     94.44%
32,219           4.03%     2.66%     93.31%
406              7.39%     0.49%     92.12%
6,295            1.29%     11.25%    87.47%
6,002            0.50%     13.05%    86.45%
5,013            1.32%     14.90%    83.78%
2,937            0.31%     16.96%    82.74%
78               1.28%     16.67%    82.05%
3,489            2.98%     17.23%    79.79%
2,627            2.85%     19.38%    77.77%
458              13.76%    10.04%    76.20%
109              2.75%     22.02%    75.23%
23,378           11.79%    13.84%    74.37%
3,053            1.54%     26.56%    71.90%
3,789            6.55%     24.52%    68.94%
47,679           13.55%    17.97%    68.48%
12,896           9.56%     22.14%    68.30%
80               1.25%     36.25%    62.50%
1,044            13.12%    34.39%    52.49%
38               10.53%    39.47%    50%
Clients by Failure %
z Failure percentage is
the total number of
failures divided by the total number of
backups for a given client
z Shown here is a
summary of the top 1000 (out of 10,000) clients when sorted by percentage of failures
z Getting worse, right?
Count   Failed %
57      90-100
61      80-90
162     70-80
13      60-70
88      50-60
99      40-50
210     30-40
310     25-30
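The failure percentage defined on this slide (failures divided by total backups, per client, then sorted) can be sketched as follows. The job records, client names and status labels are hypothetical, and this sketch assumes a partial backup is not counted as a failure:

```python
from collections import Counter

# Hypothetical job records as (client, status); the status labels are assumed.
jobs = [
    ("db01", "failed"), ("db01", "failed"), ("db01", "successful"), ("db01", "failed"),
    ("web01", "successful"), ("web01", "successful"), ("web01", "partial"), ("web01", "failed"),
    ("file01", "successful"), ("file01", "successful"),
]


def failure_percentages(records):
    """Return {client: failures / total backups * 100}."""
    totals, failures = Counter(), Counter()
    for client, status in records:
        totals[client] += 1
        if status == "failed":  # assumption: partials are tracked separately
            failures[client] += 1
    return {client: 100.0 * failures[client] / totals[client] for client in totals}


def worst_clients(percentages, top_n):
    """Sort clients by failure percentage, highest first, and keep the top N."""
    ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


pct = failure_percentages(jobs)
for client, failed_pct in worst_clients(pct, top_n=2):
    print(f"{client}: {failed_pct:.0f}% failed")
```

Running the same ranking with `top_n=1000` over a real job table would give the kind of list summarized in the table above.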
Consecutive Failures
z When two days go by without
a single successful backup for a client, that’s a consecutive failure.
z Shown here is a summary of
the top 1000 (out of 10,000) clients when sorted by
number of consecutive days of failure
z We collected 11 days of data
z Now things are getting ugly
Count   Consecutive Failures
31      10
5       11
15      9
32      8
57      7
108     6
258     5
499     4
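One way to find consecutive failures is to walk the collection window day by day and track the longest run of days on which a client had no successful backup. The sketch below assumes you can produce, per client, the set of days with at least one success; the client names and dates are hypothetical:

```python
from datetime import date, timedelta


def longest_failure_streak(successful_days, first_day, num_days):
    """Longest run of consecutive days with no successful backup for one client."""
    longest = current = 0
    for offset in range(num_days):
        day = first_day + timedelta(days=offset)
        if day in successful_days:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


# Hypothetical data: days on which each client had at least one successful backup.
start = date(2008, 1, 1)
window = 11  # the assessment collected 11 days of data
success_days = {
    "db01": {start + timedelta(days=d) for d in (0, 1, 7)},
    "web01": {start + timedelta(days=d) for d in range(window)},
    "file01": set(),
}

streaks = {
    client: longest_failure_streak(days, start, window)
    for client, days in success_days.items()
}
# Two or more days in a row without a success counts as a consecutive failure.
flagged = {client: n for client, n in streaks.items() if n >= 2}
print(flagged)
```

A client that never succeeded during the window shows up with a streak equal to the full 11 days, which is the worst case in the table above.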
Partial Backups
z Definition: a backup that
backs up some, but not all of the files it’s supposed to
z Valuable resources are
wasted
• Backing up database files in addition to using agent
• Constantly changing log files that no one cares about
z Important data gets missed
• New databases
• Applications that lock files with exclusive read lock!
z If you’re overlooking your
partials, you could be in for another surprise!
z Either exclude it or figure out
how to back it up properly
Partial backups mean partial
restores. People don’t like partial restores!
Success Rate Lessons
z Success rate isn’t everything
z Unless you’re 100% successful, you have to look beyond it
z Different levels
• By backup server
• By client
• By consecutive failures
z Don’t forget partial backups
z Consecutive failures is the single most surprising section of our
backup assessments
z The best way to gather this level of detail is a commercial data
protection management tool. If the tool is free, you’re getting
what you paid for.
z I can’t imagine maintaining a reasonably sized backup
Tape Utilization
z Again, let’s look at
a large customer’s data
z 535,286 tapes
with data on them
z Approximately
64% utilized
Real Money
z 1% increase in utilization = 5352 fewer tapes for this
customer
z That’s real money!
• $133,800 if LTO-1
• $187,320 if LTO-3
• $535,200 if LTO-4
z A 10% increase in utilization could save between $1.3M
and $5.3M on media alone
z Further savings in tape library size, Offsite Vaulting
contract
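The media savings above are straightforward arithmetic. In this sketch the per-tape prices are not stated on the slide; they are the values implied by dividing each total by 5,352 tapes, and the function names are made up for illustration:

```python
TAPES_WITH_DATA = 535_286  # from the slide

# Per-tape prices implied by the slide's totals (total / 5,352); assumptions.
PRICE_PER_TAPE = {"LTO-1": 25, "LTO-3": 35, "LTO-4": 100}


def tapes_saved(total_tapes, utilization_gain_pct):
    """Slide's simplification: each 1% gain in utilization frees 1% of the tapes."""
    return total_tapes * utilization_gain_pct // 100


def media_savings(total_tapes, utilization_gain_pct):
    """Dollar savings per media type for a given utilization gain."""
    saved = tapes_saved(total_tapes, utilization_gain_pct)
    return {media: saved * price for media, price in PRICE_PER_TAPE.items()}


one_pct = media_savings(TAPES_WITH_DATA, 1)
ten_pct = media_savings(TAPES_WITH_DATA, 10)
for media in PRICE_PER_TAPE:
    print(f"{media}: 1% gain saves ${one_pct[media]:,}, 10% gain saves ${ten_pct[media]:,}")
```

The 10% figures come out at roughly $1.3M for LTO-1 and $5.3M for LTO-4, matching the range quoted on the slide.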
Increasing Tape Usage
z All: Reduce the number of pools – especially for
offsite tape
z NBU: Do not allow multiple retention periods
per tape
z NW: Use Full/Non-Full pools and expire
Non-Fulls sooner than Fulls
z NBU/NW: Minimize number of MPX settings
z TSM: Use collocation groups instead of
node-level collocation. Spend what you need to in order to get expiration & reclamation done. Start reclamation of emptiest tapes first by slowly lowering your reclamation threshold.
Tape Drive Utilization
z Again, we must look at an individual customer
for this level of detail
z Stats from this customer
• Backup assessment of >100 locations
• 285 backup servers
• 777,422 Backup Jobs
• 71% Success Rate
Lots of tape drives!
z This customer has
deployed hundreds of tape drives over hundreds of locations
z 971 tape drives
• DLT 7000
• DLT 8000
• LTO-1
• LTO-2
• LTO-3
More is less!
[Chart: percentage (0% to 100%) by tape drive type: DLT-7000, DLT-8000, LTO-1, LTO-2, LTO-3]
z As tape drives get faster and faster, they are getting less
More is less!
z They weren’t streaming their
5 MB/s DLTs!
z LTO-3 =~ 40-150 MB/s
w/their compression ratio.
z They’re getting 13 MB/s
z Yes, faster drives are variable
speed, but they’re not like a CVT. They’re a multi-speed bike.
z Ever tried to pedal a
multi-speed bike up a hill in a gear that was too high?
z That’s your tape drive when
Here’s the problem
z The servers can’t get the data fast enough
z The spikes are LAN-free servers. Few others are >15-20
MB/s
z (For readability, 4 of the 285 servers were removed from
this graph, >299 MB/s)
[Chart: throughput by backup server; axis ticks 0 to 180]
More Real Money
z What if average server throughput was increased from 20
MB/s to 60 MB/s (easily done with GbE)?
z Reducing server count by 185 (100 locations) would save
this customer from $1M to $20M.
z Could increase even further with 10GbE
z What if average tape drive utilization was increased from
20% to 60% (easy)?
z Reducing tape drives count by 600 would save them from
$6M to $18M on tape drives alone.
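A back-of-the-envelope version of the consolidation math, assuming the same total load is simply spread over fewer, busier units (linear scaling is an assumption, and the helper name is made up). The per-drive price range is not stated on the slide; it is what $6M to $18M for 600 drives implies:

```python
import math


def units_needed(current_units, current_rate, target_rate):
    """Units needed to carry the same total load at a higher per-unit rate."""
    return math.ceil(current_units * current_rate / target_rate)


servers_now, drives_now = 285, 971  # counts from this customer's assessment

# Server throughput raised from 20 MB/s to 60 MB/s.
servers_after = units_needed(servers_now, current_rate=20, target_rate=60)
# Tape drive utilization raised from 20% to 60%.
drives_after = units_needed(drives_now, current_rate=20, target_rate=60)

print(f"servers: {servers_now} -> {servers_after} (remove {servers_now - servers_after})")
print(f"drives:  {drives_now} -> {drives_after} (remove {drives_now - drives_after})")

# Implied per-drive price range (assumption derived from the slide's totals).
DRIVE_PRICE_RANGE = (10_000, 30_000)
drive_savings = tuple(600 * price for price in DRIVE_PRICE_RANGE)
print(f"savings on 600 drives: ${drive_savings[0]:,} to ${drive_savings[1]:,}")
```

The straight division suggests removing about 190 servers and about 647 drives; the slide's figures of 185 and 600 are more conservative, presumably to leave some headroom.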
Get Better Plumbing
z Move what backups you can off the LAN
• Use LAN-free backups
• Use virtual full backups
z Increase LAN throughput any way you can
• TCP Offload Engine
• Updated TCP/IP stacks (e.g. Solaris 10)
• Jumbo Frames
• 10 GbE (600 MB/s+) backup network
Don’t buy another backup server, build a network for the
ones you have!
Backbone’s probably not ready, so build your own -- just
like we did before Fibre Channel and GbE were ready
10GbE switch with 24 GbE ports & 2 10GbE ports is
<$4,000. Even if you had to buy 24 $1K NICs, that’s under $25,000, which is less than the cost of most backup
Get a Better Toilet
z Tape can be great if you keep it happy, and
that’s getting harder every day
z Using disk as an intermediary staging device to
tape can make it much easier to stream your tape drive
z Storing all onsite backups on disk is now more
possible and affordable than ever before
z Before you replace your existing tape library
with yet another tape library, please seek
independent advice on the cost of alternative solutions.
Important NetBackup Features
z Synthetic Full/Cumulative Backups
• Possible to adopt an incremental-forever approach for
filesystem backups
z FlashBackup – for the million-file problem
z SharedDisk (formerly SSO for disk)
z Enhanced disk backups (6.5.1)
z Storage lifecycle policies (6.5.1)
Important TSM Features
z Expiration & Reclamation
• Some people turn them off or cripple them
z Collocation Groups
• Minimize number of tapes a given server will be on
z Active Data Pools
1. FILE type DISK pool or sequential TAPE pool specifying
pooltype=ACTIVEDATA
2. Update node's domain(s) specifying
ACTIVEDESTINATION=<active-data-pool>
Important NetWorker Features
z Max Sessions (7.3+)
• Hard limit to the number of sessions to a device
z Usage of multiple groups
• Previous versions did not handle multiple groups well
• Current versions allow you to specify whatever makes sense
for you
z Saveset consolidation
• Possible to adopt an incremental-forever approach for
filesystem backups
Using Disk
Disk Backup Targets
z Easier to stream tape drives when copying from
backups sent to disk
z A good D2D2T system should easily be able to
stream any tape drive (not all can do this)
z One reason is that randomly distributed data on
source disks is serialized on backup disks
z Recoveries are also easier/faster if at least one
copy of all backups is left on disk
All your friends are doing it!
[Chart: Yes / No responses]
z 2007 survey of 163 respondents by the Enterprise Strategy Group
z 64% of respondents said they will implement disk backup by end of 2008
z 33% of respondents believed that deduplication was the key to making this
Disk Staging
z Backup to disk, copy/migrate to tape
z All else is the same
z Requires enough disk for one night’s backups (e.g. 14%: 1/28th
or 4% for full, 10% for incrementals if you do full backup once a month and incrementals daily)
z Helps backups, not restores
z Restores still come from tape
z Still requires shipping tape
z “Those who cannot remember the past are
condemned to repeat it.”*
Disk Backups
z Store all onsite backups on disk
z Offsite backups can be disk or tape
z Requires bigger shift in thinking & procedures
z Operational restores come from disk
z Requires enough disk to hold all backups (e.g. 2000%:
400% for 4 monthly fulls, 100 days of 15% incrementals = 1500%, 15 10% differentials = 150%)
z Requires deduplication to be as affordable as tape – next
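The disk sizing rules of thumb on the Disk Staging and Disk Backups slides reduce to simple arithmetic. A rough sketch, using the slides' example percentages as defaults (the function and parameter names are made up for illustration, and all results are percentages of the protected data):

```python
def staging_disk_pct(days_between_fulls=28, daily_incremental_pct=10):
    """Disk for one night's backups when staging to tape.

    With a full every N days, about 1/N of the data gets a full backup each
    night, plus that night's incrementals.
    """
    nightly_full_pct = 100 / days_between_fulls
    return nightly_full_pct + daily_incremental_pct


def all_backups_disk_pct(fulls=4, incr_days=100, incr_pct=15, diffs=15, diff_pct=10):
    """Disk to hold every retained onsite backup."""
    return fulls * 100 + incr_days * incr_pct + diffs * diff_pct


staging = staging_disk_pct()
everything = all_backups_disk_pct()
print(f"disk staging: about {staging:.0f}% of primary data")
print(f"all onsite backups on disk: about {everything}% of primary data")
```

The staging case comes to about 14%. For the all-on-disk case, 100 days of 15% incrementals is 1500%, so the itemized total is 2050%, in line with the roughly 2000% quoted on the slide. That difference of two orders of magnitude is why the slide ties disk backups to deduplication.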
Resources from our sponsors
q Whitepaper: Deduplication Storage for Nearline Applications
q Best Practices Guide: Backup and Recovery for Microsoft Exchange Best Practices with Data Domain
q Storage Research Report: Why Deduplication Technology is Causing a Paradigm Shift in Storage Tiering
q The Growing Importance of Data De-Duplication
q ESG – Understanding the Power of Data De-Duplication
q Cool Vendors in Data Protection
q Regulatory Compliance: How Digital Data Protection Helps
q Data Protection and Recovery – The Why, The How, and Who to Go To
q Download this eGuide, featuring articles from Storage magazine to learn how data deduplication works and how its products differ.
q Download this Podcast for an insightful Q&A session with Curtis Preston, Vice President of Data Protection Services at GlassHouse Technologies to learn all about data deduplication.
q Comparing Deduplication Approaches: Technology Considerations for Enterprise Environments
q The Forrester Wave: Enterprise Open Systems Virtual Tape Libraries, Q1 2008
q TCO Comparison Report: Reducing Costs in the Data Center with Deduplication
q Overview: Backup & Recovery Top 10 Reasons to Upgrade
q Symantec Backup Exec System Recovery 8: The Gold Standard in Complete Windows System Recovery
Don’t forget to download the other presentations from NEXTGEN BACKUP SCHOOL
presentation two
Deduplication
This session delves into the most talked about new technology in years: deduplication. Curtis will explain the basics of deduplication, why it should work for you and how it should work for you. Learn the difference between inline and post-process dedupe, forward and reverse referencing, hashing and delta differentials and source and target dedupe. Discuss which types are most appropriate for which types of data centers, and learn how deduplication affects (or doesn’t affect) the most important thing of all: restores.
presentation three
Protecting Stored Data, Backing Up Virtual Infrastructure & Remote Data
This session will start by explaining the challenges of backing up and recovering virtual machines residing in a virtual infrastructure such as VMware, Microsoft’s Virtual Server or Virtual Iron. Curtis will cover the pros and cons of several backup techniques that can be used to back up any virtual machines, including VM-based backup and console-based backups. Curtis will conclude this hour by explaining several options that are specific to VMware, including VMware Consolidated Backup (VCB) and commercial tools aimed at this market. This session will examine the various threats to your stored data (including your SAN, your backup system, your people and your tape) and what to do to protect against each type of threat.