tel: +43-1-5248737-40 web: www.armstrongconsulting.com email: [email protected]
WHITE PAPER
Business Continuity Planning
or how to take advantage of new technologies to improve business continuity and disaster recovery
_________________________________________________________________________
Document ID: 127848
Created: 27 March 2008
Version: 1.0
Author: Dr. Rainer Sabelka
Table of contents
1. MANAGEMENT SUMMARY
2. BUSINESS CONTINUITY PLANNING
2.1 Introduction or “the problem with tapes”
2.2 Measuring the quality of a BCP strategy
2.3 The Strategy
2.4 Backup versioning
2.5 Project Organization
2.6 Project Costs
3. ABOUT THE AUTHOR
1. Management Summary
BCP (Business Continuity Planning) and the associated field of DR (Disaster Recovery) are recognized by most companies today as top-management responsibilities. The ability of a company to continue to serve its customers in the face of problems (whether minor issues or major disasters) is essential to its long-term survival.
Recent compliance measures like Sarbanes-Oxley and Basel II have reinforced the priority of establishing a functioning business continuity plan.
Traditionally BCP has focused on data backup, which is still necessary, but no longer sufficient. Having only a backup of your data is of course better than nothing, but your business processes depend on more than just that data – the systems (servers, applications and configurations) which use it are also required to provide continuity.
So, for effective BCP, you need to ensure that both the data and the systems which can use it remain available, even in the event of disaster. Recent advances in technology now make it possible to keep real-time or near-real-time backups of both your data and your systems at a disaster-recovery site. In the event of disaster, these systems are immediately available to provide your company with uninterrupted business continuity.
The two new key technologies which we take advantage of are virtualization (for capturing whole systems) and economical high-bandwidth networks (for transporting those systems offsite).
Armstrong Consulting can support your company during the definition of your BCP requirements, the development of your plan and putting the plan into practice. In particular, we can show you how to accomplish best-practices BCP within an acceptable budget.
2. Business Continuity Planning
As introduced in the management summary, the cornerstones of our BCP strategy are virtualization and broadband networks, to capture running systems complete with data, servers and configurations and replicate them to an offsite disaster-recovery site. We will deploy mostly open-source software, because openness and avoidance of vendor lock-in are also key components of continuity. We will show you how to monitor the availability of your disaster recovery infrastructure in real-time. And lastly, we will show you how you can not only provide near-real-time backups of your systems, but also maintain multiple, versioned snapshots of your systems which can be stepped back to if required.
2.1 Introduction or “the problem with tapes”
If you’ve got this far, we may not need to explain the problem with backup to tape. But, for completeness, let’s look quickly at why tapes are no longer appropriate for backups.
• Firstly, the capacity of tapes has not kept up with the capacity of disks, with the result that to back up the disks in your organization, you probably already need a tape robot, which is large, expensive and proprietary.
• Secondly, the growth of the disk capacity of your organization is accelerating, and tape-based systems will have a hard time keeping up with this growth.
• Thirdly, tapes are not particularly reliable. Tape is a fragile medium subject to environmental damage, so you can never be sure that you’ll be able to restore its contents after it’s been stored for any length of time.
• Fourthly and perhaps most importantly, tapes require human interaction to get them offsite. Only an offsite backup is suitable as a disaster-recovery backup. Unfortunately, humans are expensive, prone to errors and often leave for vacations, so we’d like to keep them out of our BCP processes wherever possible.
“Two out of five enterprises which experience a disaster will go out of business within five years. Enterprises can improve these odds – but only if they take the necessary measures before and after the disaster”
The only real alternative to tape is backup to disk. Luckily, that has become fast, reliable and cheaper than tape, so that’s what we’ll use.
2.2 Measuring the quality of a BCP strategy
Two metrics are commonly used in measuring the quality of a BCP strategy. One is the RPO, or Recovery Point Objective, which measures the freshness of the backup you have to work with after a disaster. The other is the RTO, or Recovery Time Objective, which measures the amount of time it takes before your company can function again after a disaster. The goal of a BCP strategy is to make the RPO as recent as possible and keep the RTO as short as possible. In the following sections, we will show you how our strategy of replicating complete, live systems with data, server and configuration produces optimal values for both the RPO and RTO criteria.
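As a rough illustration of the two metrics, the following Python sketch computes the recovery point and recovery time actually achieved in a hypothetical incident and compares them with the objectives. The function names, the timestamps and the objective values are assumptions made purely for illustration; they are not taken from a real deployment.

```python
from datetime import datetime, timedelta


def achieved_recovery_point(last_replication: datetime, disaster: datetime) -> timedelta:
    """Age of the newest usable backup at the moment of the disaster."""
    return disaster - last_replication


def achieved_recovery_time(disaster: datetime, service_restored: datetime) -> timedelta:
    """Elapsed time between the disaster and the resumption of service."""
    return service_restored - disaster


def meets_objectives(recovery_point: timedelta, recovery_time: timedelta,
                     rpo: timedelta, rto: timedelta) -> bool:
    """True if both the recovery point and the recovery time are within target."""
    return recovery_point <= rpo and recovery_time <= rto


# Hypothetical incident (all values are illustrative assumptions)
last_replication = datetime(2008, 3, 27, 9, 45)
disaster = datetime(2008, 3, 27, 10, 0)
service_restored = datetime(2008, 3, 27, 12, 30)

recovery_point = achieved_recovery_point(last_replication, disaster)
recovery_time = achieved_recovery_time(disaster, service_restored)
within_target = meets_objectives(
    recovery_point, recovery_time,
    rpo=timedelta(hours=1), rto=timedelta(hours=4),
)
print(recovery_point, recovery_time, within_target)
```

In this example the newest backup is 15 minutes old and service resumes after two and a half hours, so both (assumed) objectives are met.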
2.3 The Strategy
The BCP strategy we employ can be summarized as consisting of the following steps:
• Prepare by virtualizing as many of your physical systems as possible.
• Back up all systems (virtual and physical) to a local disk array.
• Replicate that disk array (via the standard rsync protocol) to an offsite disk array (at the disaster-recovery site).
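The replication step could be driven by a small wrapper such as the Python sketch below. It only assembles an rsync command line and prints it. The host name `dr.example.com`, the module name `dr-image`, the local path `/backup/working-image` and the bandwidth limit are hypothetical placeholders, and the choice of options is an assumption rather than the configuration of any particular installation.

```python
import subprocess


def build_rsync_command(source_dir, dr_host, module, bwlimit_kbytes=None):
    """Assemble an rsync command that mirrors source_dir to an rsync daemon module."""
    # A trailing slash tells rsync to copy the directory's contents, not the directory itself.
    source = source_dir.rstrip("/") + "/"
    command = [
        "rsync",
        "--archive",  # preserve permissions, ownership, timestamps and symlinks
        "--delete",   # remove files at the DR site that no longer exist locally
        "--partial",  # keep partially transferred files so large images can resume
    ]
    if bwlimit_kbytes is not None:
        # Optional cap on the transfer rate, in the KB/s units rsync expects.
        command.append(f"--bwlimit={bwlimit_kbytes}")
    command += [source, f"rsync://{dr_host}/{module}/"]
    return command


def replicate(command):
    """Run the replication; raises CalledProcessError if rsync reports a failure."""
    subprocess.run(command, check=True)


# Hypothetical values, for illustration only
cmd = build_rsync_command("/backup/working-image", "dr.example.com", "dr-image",
                          bwlimit_kbytes=1000)
print(" ".join(cmd))
```

Calling `replicate(cmd)` from a scheduler would perform the actual transfer; because rsync only sends the differences between the two images, repeated runs stay incremental.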
The above steps can be represented graphically as follows:

[Figure: BCP architecture. Primary Site: mission-critical systems and data (i.e. the set of all things which are needed in case of disaster), consisting of physical (non-virtual) servers, VMs (virtualized servers), files, databases and other backups, are brought together by file copy, backup and incremental backup into a working image (files, VMs and backups) with retention backups, served by an rsync client and server. The rsync protocol synchronizes the DR image with the working image (bandwidth available = 10Mbps = 108GB uncompressed per 24h day). Disaster Recovery Site: an rsync server holds the DR image; physical servers and VMware servers boot from, restore from and access data from it, using working-copy VMs (clones). Continuous and automatic monitoring covers the setup.]

2.4 Backup versioning
As with the tape-rotation schemes used in the past, maintaining multiple, time-lapsed versions of backups can be very useful for dealing with a specific category of disasters, namely those where corrupted, changed or accidentally deleted data goes unnoticed until it is reflected in the backup copies. Note that as backups approach real-time (and that’s the goal), this problem becomes more likely to occur, because accidentally changed data is immediately reflected in the backup copy. To deal with this, we provide versioning on the backups. Typically, we maintain up to 8 backup snapshots (one current copy, one 1-day-old copy, one 2-day-old copy, one 4-day-old copy, one 8-day-old copy, etc.). Using copy-on-write technology, these snapshots are stored space-efficiently: a data block consumes additional space only when it changes between versions.
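One way to picture this schedule is as a list of target ages that doubles from one snapshot to the next. The Python sketch below generates those ages and, from a set of existing snapshots, keeps the one closest to each target. The function names and the nearest-match pruning rule are assumptions made for illustration; the text above does not specify how snapshots are selected for deletion.

```python
def target_ages_in_days(max_snapshots=8):
    """Ages of the snapshots to retain: 0 (current), then 1, 2, 4, 8, ... days."""
    return [0] + [2 ** i for i in range(max_snapshots - 1)]


def snapshots_to_keep(existing_ages, max_snapshots=8):
    """For each target age, keep the existing snapshot whose age is closest to it."""
    keep = set()
    for target in target_ages_in_days(max_snapshots):
        if existing_ages:
            keep.add(min(existing_ages, key=lambda age: abs(age - target)))
    return sorted(keep)


print(target_ages_in_days())

# Hypothetical situation: one snapshot per day, the oldest being 10 days old
daily_snapshots = list(range(0, 11))
print(snapshots_to_keep(daily_snapshots))
```

With eight snapshots the oldest target age is 64 days, so the scheme reaches back about two months while storing only eight versions.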
2.5 Project Organization
BCP is usually considered a business-critical, in-house project. As external consultants, our role is restricted to supporting your IT staff with our experience during the planning, analysis and implementation phases. We can provide specific skills, such as specifying and setting up VMware servers, P2V (the virtualization of physical resources), and the installation and configuration of the storage servers and networks. We have a set of tools which we recommend (for instance, VMware for virtualization, open-source tools for storage, replication and system monitoring), but in many cases, our customers have standards which they wish to adhere to and these can be integrated into the BCP processes.
2.6 Project Costs
Our experience has shown that the BCP systems currently used by larger companies (tape robots, redundant disaster-recovery infrastructures etc.) are often very expensive to maintain. This means that the BCP strategy described in this document (based on commodity hardware and largely open-source software) can often bring immediate cost advantages, and the project costs can be amortized within two years.
The cost components are as follows:
• Project staff costs. These costs are composed of your internal costs for the staff members involved and the cost of external consultants like Armstrong Consulting. The external costs depend on the level of involvement of the consultants in the phases of the project (planning, implementing, monitoring and maintenance).
• Additional server hardware. Since virtualization allows you to consolidate physical servers, you may actually reduce costs during the virtualization process. However, some additional servers (or server upgrades) will initially be required for your primary site and for the disaster recovery site. Virtualization server requirements differ from standard physical server requirements (more memory, more processors, more redundant components etc.).
• Additional storage hardware. Since this strategy depends on large disk arrays instead of tapes for backup, you may need to purchase additional storage arrays. The emphasis of these arrays is on capacity, reliability and sequential access speed, so large RAID-6 arrays of high-capacity SATA drives can be deployed economically (cost: approx. EUR 8000 for 20TB of RAID-6 storage from Dell or HP). One array will be required on site and one offsite (disaster recovery site).
• Additional software:
o Virtualization: depending on how you choose to implement virtualization, you may require licenses for software such as VMware Virtual Infrastructure. A number of open-source or free virtualization solutions may also be deployed effectively. The rest of the software used (for replication and versioning) is open source and has no additional license costs.
o Backup: for backup of non-virtualized resources (such as databases, file systems, workstations etc.), you will need backup software. You may already have licenses in place which you can continue to use (if your software is backup-to-disk capable), or you may require new backup software.
• Network costs (installation and maintenance of a VPN for continuous offsite transfer of backed-up systems). Based on current network economics, the capacity of this network will need to be between 4Mbps and 10Mbps and will cost between EUR 200 and EUR 1500 per month.
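The relationship between link speed and daily replication volume can be checked with simple arithmetic. The Python sketch below converts a link speed into the amount of uncompressed data it can carry in 24 hours. It assumes that the link is fully used around the clock, that there is no protocol overhead and that units are decimal (1 GB = 10^9 bytes); the function name is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_capacity_gb(link_mbps):
    """Uncompressed data volume (in GB) a link of link_mbps can carry in 24 hours."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1_000_000_000


for mbps in (4, 10):
    print(f"{mbps} Mbps carries about {daily_capacity_gb(mbps):.1f} GB per day")
```

At 10Mbps this gives 108GB per day, and at 4Mbps about 43GB per day. The volume of data that changes each day therefore needs to stay below these values for the offsite copy to keep up.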
3. About the author
Dr. Rainer Sabelka holds a Ph.D. in software engineering (Softwaretechnik) from the Technische Universität Wien. In addition, he has 15 years of industry experience developing software and advising international corporations such as Coca-Cola on infrastructure planning.
4. Contacts
Dr. Sabelka and his team at Armstrong Consulting would be pleased to help you analyze your BCP requirements, create a BCP plan and support you with the successful implementation of that plan. You can contact him at:
Armstrong Consulting GesmbH Zieglergasse 1/14, 1070 Wien. Tel: +43-1-5248737-40