Disaster Recovery Best Practices
WWT Educational Webcast
Ed Levens, World Wide Technology
David L. Jones, EMC
Questions are Encouraged
You can ask questions during the
presentation by using the link provided
in the Webcast Viewer.
Your Success Drives Ours
□ Relentless Focus on People, Process & Partnerships
□ Strong Partner Relationships
□ Over 1,000 Talented Employees
□ Proven Processes
□ Nearly $3 Billion in Revenues
□ Strong Credit Line - $350MM +
□ Key Contract Vehicles: VHA, HPG, ITES-2H, GSA, SEWP
Our Focus
Technology: Solution
Unified Communications: Integrated voice, video and data networks can lower costs and provide employees with productivity benefits.
Security: Adaptive threat response that stops network threats before they stop your business.
Mobility: Maintain your competitive advantage through the freedom and flexibility of wireless networks.
Data Center: Intelligent storage architectures can help reduce expenses; increase agility for changing priorities;
Disaster Recovery Best Practices
David L. Jones, EMC
Agenda
Today's Reality
IT Business Continuance and Disaster Recovery Considerations
Technology Choices
EMC RecoverPoint
Questions?
Unfortunately, disasters do happen…
Of all the organizations surveyed…
55% had an incident that disabled their primary data center
– 60% of these had a regional backup site that was also disabled by the incident
When systems go down, the losses add up
Types of Disasters
Type of Disaster: Example
Nature / Man-Made: Katrina / 9/11
Sudden / Time to Prepare: Earthquake / Hurricane
Building / Local Area / Region: Fire / Power Outage / Flood
Most Frequent Impacts to IT Availability
Disasters represent a fraction of Environmental issues
[Chart: share of impacts to IT availability by category: Server, Application Software, Client Application Software, Network S/W]
Dilbert Does Disaster Recovery…
Definitions
Business continuance / COOP describes the processes and procedures an organization puts in place to ensure that essential functions can continue during and after a disaster
Disaster recovery is the process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster
High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period
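The high availability definition can be made concrete with a small calculation. The sketch below assumes a one-year measurement period and a hypothetical helper name; it only converts an availability target into the downtime that target allows.

```python
# Hypothetical helper: convert an availability target into allowed downtime.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime (in hours) permitted by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability -> {hours:.2f} hours of downtime per year")
```

A 99.9% target over a year leaves roughly 8.76 hours of downtime; each additional "nine" cuts that by a factor of ten.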
Continuity of Operations Policy (COOP)
It is the policy of the United States to have in place a comprehensive and effective program to ensure continuity of essential Federal functions under all circumstances.
As a baseline of preparedness for the full range of potential emergencies, all Federal agencies shall have in place a viable COOP capability which ensures the performance of their essential functions during any emergency or situation that may disrupt normal operations.
Agenda
Today's Reality
IT Business Continuance and Disaster Recovery Considerations
Technology Choices
EMC RecoverPoint
Questions?
Business Continuance – EMC / WWT Approach
Build on our understanding of our customers, their business / mission, and their critical processes and objectives
Capitalize on our long pedigree in designing, building and managing business/mission-critical systems for the Data Center
Technology
Business Continuance
IT Considerations
Management buy-in and commitment is critical
Know the regulations specific to your agency or organization
Conduct a risk assessment and identify critical priorities
Determine response for different disaster scenarios
Establish clearly defined roles & responsibilities for personnel
Establish effective communication channels
Maintain necessary resources, tools, and supplies
Testing! Testing! and more Testing!
Disaster recovery must be included as part of every process
IT Considerations
Disaster recovery must become part of the IT mindset, not an afterthought
High availability and disaster recovery go hand in hand
Define architectures that build disaster recovery in from the beginning
Application Development
Infrastructure Design
QA, QC, Test and Development
Make use of industry recognized processes and architectures
ITIL, MOF, MSA / WSSRA, etc…
Recovery of applications without user interruption is nirvana but
IT Considerations
Recovery Point Objective (RPO) – The last saved data that the restarted application will reflect following the recovery. Also, a measure of the amount of time for which work may be lost in the event of an unplanned outage at the primary site.
Periodic tape backup vs. continuous disk-to-disk replication
Synchronous vs. Asynchronous
Recovery Time Objective (RTO) – The time that will pass before an infrastructure is available. In order to reduce RTO, data must be online and available at another site.
Distance – Data must be recovered on undamaged hardware outside the disaster zone. Required distance between primary and recovery sites should be based on likely regional threats.
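The RPO and RTO definitions above can be turned into a simple check. This is a minimal sketch, not a sizing tool: the class, the field names and the sample figures are illustrative assumptions. It treats the interval between recovery points as the worst-case data loss and compares a plan against its objectives.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    """Hypothetical description of one protection scheme."""
    copy_interval_min: float  # time between recovery points (0 for synchronous)
    restore_time_min: float   # estimated time to bring the service back online


def worst_case_data_loss_min(plan: RecoveryPlan) -> float:
    """Work done since the last recovery point is lost, so the interval
    between recovery points bounds the data loss."""
    return plan.copy_interval_min


def meets_objectives(plan: RecoveryPlan, rpo_min: float, rto_min: float) -> bool:
    """True when the plan satisfies both the RPO and the RTO."""
    return (worst_case_data_loss_min(plan) <= rpo_min
            and plan.restore_time_min <= rto_min)


if __name__ == "__main__":
    # Sample figures are assumptions chosen only to show the comparison.
    nightly_tape = RecoveryPlan(copy_interval_min=24 * 60, restore_time_min=48 * 60)
    async_replication = RecoveryPlan(copy_interval_min=5, restore_time_min=120)
    for name, plan in (("nightly tape", nightly_tape),
                       ("async replication", async_replication)):
        print(name, meets_objectives(plan, rpo_min=60, rto_min=240))
```

With an RPO of one hour and an RTO of four hours, the nightly tape plan fails on both counts while the replication plan passes.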
Agenda
Today's Reality
IT Business Continuance and Disaster Recovery Considerations
Technology Choices
EMC RecoverPoint
Questions?
Business Requirements should Drive Technology Options
Business Considerations: RTO, RPO, Isolation, Protection GAP
Infrastructure Alternatives: Cold Site (RTO=Days), Warm Site, Hot Site, Active-Active (RTO=Zero)
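One way to read this slide is as a lookup from the business RTO to an infrastructure alternative. The sketch below is illustrative only: the slide fixes just the two endpoints (Cold Site at days, Active-Active at zero), so the Hot Site and Warm Site thresholds are assumptions that each organization would set for itself.

```python
# (maximum tolerable RTO in hours, suggested alternative)
# Only "Active-Active = zero" and "Cold Site = days" come from the slide;
# the 4-hour and 24-hour cut-offs are illustrative assumptions.
SITE_TIERS = [
    (0, "Active-Active"),
    (4, "Hot Site"),
    (24, "Warm Site"),
]


def suggest_site(rto_hours: float) -> str:
    """Return the least demanding alternative that still meets the RTO."""
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    for limit_hours, tier in SITE_TIERS:
        if rto_hours <= limit_hours:
            return tier
    return "Cold Site"  # an RTO measured in days


if __name__ == "__main__":
    for rto in (0, 2, 12, 72):
        print(f"RTO {rto}h -> {suggest_site(rto)}")
```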
Data Center Design and Architecture
Data Center design should be a high priority to ensure all the aspects of power, cooling, access and security have been core to the design
The distance between data centers will change the options that you have for the deployment of a disaster recovery strategy for all the services IT provides
Cold Site, Hot Site, Bunkers, Fully Active / Active
This is a business decision first
Make effective use of and leverage your existing facilities
Leveraging disaster recovery assets can provide maximum value BUT can also extend time to recovery or RTO
This choice will impact the technology decisions and options that are
Reference Architectures
Virtual and Physical Considerations
Server, Storage and Network Virtualization can maximize resources and streamline operations and disaster recovery
Server virtualization is mature and there are many choices
VMware
Microsoft HyperV
Citrix / Xen
Cisco “California”
Storage virtualization is mature but not as widely deployed
EMC Invista
HDS Array based
NetApp VSeries
Other
Network virtualization is a developing technology
Virtual and Physical Considerations
Disaster recovery considerations for virtualized environments
Physical to Virtual
Virtual to Physical
Physical to Physical
Virtual to Virtual
Consolidated disaster recovery using virtualization technologies can maximize resources
“DR in a box”
Maximum utilization of disaster recovery resources
Virtualization can present management challenges
Virtual to Physical Mappings
Management infrastructure must provide visibility
Server
Understanding Data Consistency
Applications and data are interrelated (Federated)
All data movement must be stopped/started at the same point in time
To restart applications you must have all the data, not parts of it
Recovery requires dependent-write consistency across all volumes and systems
[Diagram: Order Entry, CRM and SCM applications, each with its own DB]
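Dependent-write consistency can be illustrated with a deliberately conservative model. The sketch assumes every write across all volumes carries a global sequence number and that each write may depend on any earlier one, so a copy is restartable only if it holds an unbroken prefix of that sequence. The function name and the numbering scheme are assumptions made for illustration.

```python
from typing import Set


def is_dependent_write_consistent(replicated_seqs: Set[int]) -> bool:
    """Writes are numbered 1..N in the order the applications issued them.

    Under the conservative assumption that any write may depend on every
    earlier write, a copy is consistent only when it contains writes
    1..k with no gaps.
    """
    if not replicated_seqs:
        return True  # an empty copy has no broken dependencies
    newest = max(replicated_seqs)
    return replicated_seqs == set(range(1, newest + 1))


if __name__ == "__main__":
    # 1 = log record (volume A), 2 = data page (volume B), 3 = commit (volume A)
    print(is_dependent_write_consistent({1, 2}))  # both volumes stopped after write 2
    print(is_dependent_write_consistent({1, 3}))  # volume B stopped early: commit without data
```

The second case is the failure the slide warns about: the volumes were not stopped at the same point in time, so the copy contains a commit whose data never arrived.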
Infrastructure Services
Without disaster-recovery-enabled infrastructure, most other disaster recovery efforts will fail
Core services like Networks, DNS, Directory Services, etc… are required for all of the other processes that run in the Data Center
VPN and remote access services can be your best ally in the event of disaster and must be core to your plans
Management infrastructure will play a role in conducting root cause analysis ONLY if it is available
In most cases infrastructure services are COTS based and have been designed to provide availability using a geographically distributed scale out model
Vendor selection and partnership is key in this area because most
Applications
Applications are very rarely standalone
Multi-tiered applications (WEB, App Server, Database) will almost always require all tiers to operate
Most applications will not work if the required infrastructure is not also part of the plan
Data consistency between the tiers makes recovery much easier and more timely
Network based or Software based load balancing is the most common method for making WEB and Application tiers resilient
Applications that require persistent data storage may have additional requirements
Applications – An example via email
Email IS NOT a standalone application
An enterprise class email implementation will usually consist of at least the following:
Main email data servers
SMTP (Inbound and outbound mail)
Integration point with a directory server
Blackberry, Blueberry, Strawberry, you get the point…
WEB based email front end
Real Time Collaboration – SharePoint, DB system, IM, etc…
Multiple Infrastructure touch points – DNS, WINS, VPN, etc…
External Vendors – Cellular provider
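Because email depends on so many other services, the order in which components are recovered matters. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to derive a recovery order from a dependency map. The component names echo the list above, but the specific dependencies are illustrative assumptions, not a statement about any particular email product.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each service maps to the services that must be running before it.
# These edges are assumptions made for illustration.
DEPENDS_ON: Dict[str, Set[str]] = {
    "DNS": set(),
    "Directory server": {"DNS"},
    "Email data servers": {"Directory server"},
    "SMTP": {"DNS", "Email data servers"},
    "Web email front end": {"Email data servers"},
    "VPN": {"DNS", "Directory server"},
}


def recovery_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return services in an order where every dependency comes first.

    Raises graphlib.CycleError if the map contains a circular dependency.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, service in enumerate(recovery_order(DEPENDS_ON), start=1):
        print(step, service)
```

Services with no dependency between them (for example SMTP and VPN) may appear in either order; only the prerequisite relationships are guaranteed.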
Databases
Different types of databases require different kinds of disaster recovery solutions
Read only / Data warehousing
Transactional
Most common types of disaster recovery solutions in the database space are
Oracle GRID/RAC based or scale out implementations - Clustering
Storage replication with application tie in
Database-level replication
Most disaster recovery solutions for databases require a tight integration with the application tier solution in order to ensure transaction level recovery
Storage / Data Protection
Daily backup: Daily recovery points from tape or disk
Snapshots: More frequent disk-based recovery points
Continuous Data Protection: All recovery points
Any point in time
Significant points in time
Database checkpoint
Pre-app patch
Post-app patch
Quarterly close
Any user-configurable event
Storage / Data Protection
Creating remote and local copies of your data is a must for disaster recovery
The replication of storage data is a complex process that requires knowledge of what is being stored, detailed performance analysis and network impact analysis
Synchronous vs. Asynchronous
It’s all about distance
Adaptive solutions can provide dynamic RPO
Application level consistency is paramount
Many types of storage replication technologies exist
Array Based – Usually locks you into storage array choices
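Why synchronous replication is "all about distance" can be shown with a rough latency estimate. The sketch assumes light travels through optical fiber at about 200,000 km/s (roughly two thirds of its speed in a vacuum) and that a synchronous write waits for one round trip to the remote site. It ignores equipment delay and indirect cable routes, so real figures will be higher.

```python
FIBER_KM_PER_MS = 200.0  # approximate signal speed in optical fiber


def sync_round_trip_ms(distance_km: float) -> float:
    """Minimum latency added to every synchronous write by distance alone."""
    if distance_km < 0:
        raise ValueError("distance cannot be negative")
    return 2 * distance_km / FIBER_KM_PER_MS


def max_sync_distance_km(latency_budget_ms: float) -> float:
    """Farthest site that fits a given per-write latency budget."""
    if latency_budget_ms < 0:
        raise ValueError("latency budget cannot be negative")
    return latency_budget_ms * FIBER_KM_PER_MS / 2


if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km} km -> at least {sync_round_trip_ms(km):.1f} ms per write")
```

At 100 km the floor is about 1 ms per write; at 1,000 km it is about 10 ms, which is why longer distances usually push designs toward asynchronous replication and a non-zero RPO.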
Storage / Data Protection
A data replication solution that allows the flexibility of applying different RPO policies to both storage and in turn applications is key
Ability to prioritize RPO application by application
Create tiered model based on business requirements
Data Backup is here to stay and having a robust backup AND restore environment is crucial
Tape
Backup to Disk (VTL & CDP)
Offsite storage of backup data
Data Security
Data protection can reside on many tiers; consolidating its management is key
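The tiered RPO model mentioned on this slide can be sketched as a pair of lookup tables. Tier names, RPO values and the application-to-tier assignments below are hypothetical; in practice they come from the business requirements.

```python
from typing import Dict, Iterable, List

# RPO per tier, in minutes. Values are illustrative assumptions.
TIER_RPO_MIN: Dict[str, int] = {
    "tier1": 0,         # no data loss tolerated
    "tier2": 15,
    "tier3": 24 * 60,   # daily recovery point is acceptable
}

# Hypothetical assignment of applications to tiers.
APP_TIER: Dict[str, str] = {
    "Order Entry": "tier1",
    "E-mail": "tier2",
    "Data warehouse": "tier3",
}


def rpo_for(app: str) -> int:
    """Look up the RPO (minutes) that applies to one application."""
    return TIER_RPO_MIN[APP_TIER[app]]


def replication_priority(apps: Iterable[str]) -> List[str]:
    """Order applications from the tightest RPO to the loosest."""
    return sorted(apps, key=rpo_for)


if __name__ == "__main__":
    for app in replication_priority(APP_TIER):
        print(f"{app}: RPO {rpo_for(app)} min")
```

Keeping the tier definitions separate from the application assignments lets a policy change (for example tightening tier 2) apply to every application in that tier at once.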
Vendor Choice is Critical
Disaster recovery IS complex
Disaster recovery spans internal IT organizations and specific technology disciplines
Management buy-in is critical for success
Disaster recovery involves many internal and external partners
Partnering with vendors is key as are the partnerships between your vendors!
Agenda
Today's Reality
IT Business Continuance and Disaster Recovery Considerations
Technology Choices
EMC RecoverPoint
Questions?
Data Replication Pain Points in Heterogeneous Environments
Application response time
Application platform support
Application-consistent recovery
Corruption protection
Disaster-recovery testing
Communications cost
Existing infrastructure
[Diagram: local site and remote site, each running Oracle, Exchange and SQL on a SAN]
RecoverPoint Concurrent Local and Remote (CLR) Data Protection
[Diagram: PRODUCTION SITE and DISASTER RECOVERY SITE connected by a replication data flow over SAN, SAN/WAN and SAN. Labels: cluster active node, cluster passive node, RecoverPoint appliances, tape backup manager, tape library, standby disaster recovery server, RecoverPoint replication services, local journal, storage groups and logs, remote journal, replicated storage groups and logs]
Performance architecture
– Out-of-band design leveraging intelligent host and fabric interfaces*
– Supports CLARiiON write splitting on CX3 and CX4 arrays
True CDP data protection for applications
– All writes stored in Journal with application bookmarks for recovery
– Supports Microsoft Volume Shadowcopy Service (VSS) and VDI APIs
Journaling for Application-Aware Recovery
Journal Includes Data Plus Metadata
Time/date
– Identifies the time image was saved
Bookmarks:
– System-generated group bookmarks
e.g., Volume Shadowcopy Service (VSS) backup
– User-generated bookmarks
– Other EMC product bookmarks
EMC Replication Manager
– System-event-generated bookmarks
– Microsoft SQL Server
Microsoft Virtual Device Interface (VDI) operations
– Microsoft Exchange
Microsoft VSS
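The idea of a journal that stores every write with a timestamp and optional bookmarks can be sketched generically. This is not RecoverPoint's implementation or API; the class and method names are hypothetical, and the model assumes writes are recorded in time order. It shows how an image for any point in time, or for a named bookmark, can be rebuilt by replaying the journal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JournalEntry:
    timestamp: float                # when the write was captured
    block: int                      # which block was written
    data: bytes                     # the data that was written
    bookmark: Optional[str] = None  # optional label, e.g. "pre-patch"


@dataclass
class Journal:
    """Toy write journal; entries are assumed to arrive in time order."""
    entries: List[JournalEntry] = field(default_factory=list)

    def record(self, timestamp: float, block: int, data: bytes,
               bookmark: Optional[str] = None) -> None:
        self.entries.append(JournalEntry(timestamp, block, data, bookmark))

    def image_at(self, timestamp: float) -> Dict[int, bytes]:
        """Replay writes up to and including the timestamp."""
        image: Dict[int, bytes] = {}
        for entry in self.entries:
            if entry.timestamp > timestamp:
                break
            image[entry.block] = entry.data
        return image

    def image_at_bookmark(self, name: str) -> Dict[int, bytes]:
        """Rebuild the image as of the first entry carrying the bookmark."""
        for entry in self.entries:
            if entry.bookmark == name:
                return self.image_at(entry.timestamp)
        raise KeyError(name)


if __name__ == "__main__":
    journal = Journal()
    journal.record(1.0, block=0, data=b"A")
    journal.record(2.0, block=1, data=b"B", bookmark="pre-patch")
    journal.record(3.0, block=0, data=b"C")
    print(journal.image_at_bookmark("pre-patch"))
    print(journal.image_at(3.0))
```

Recovering to the "pre-patch" bookmark yields the state before block 0 was overwritten, which is the kind of significant point in time the bookmarks are meant to mark.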
Grouping for a Consistent View
Allows application recovery to be tiered by service level
– Multiple volumes per group
– Mixed recovery point objectives within same infrastructure
Provides independent replication controls
– Recover by group, locally or remotely
– Start/stop by group
Enables grouping of optimization
– Importance
– Resource usage
– Recovery point and recovery time objectives
[Diagram: Groups 1 to 3 covering OE, CRM, E-mail and SCM, protected by CDP and/or CRR]
Grouping for Federated Environments
Each tier has different service level agreements
– Consistency groups per tier
– Operational recovery of tier
Parallel consistency across tiers
– Federated environments
– Recover to a known point for all applications
– Disaster recovery for tier or application
– Spans operating systems, applications, storage, and servers
Enables advanced functions
– Full environment cloning
– Application upgrade testing
[Diagram: consistency groups for tiers 1: Linux (Web OE), 2: Windows (CRM), 3: UNIX (SCM, Financials…)]
RecoverPoint/Cluster Enabler (RecoverPoint/CE)
Each named cluster group's associated devices reside in a single RecoverPoint consistency group of the same name
Supports Microsoft Cluster Server on Windows Server 2003 and Microsoft Failover Cluster on Windows Server 2008 Enterprise and Datacenter Editions
[Diagram: two RecoverPoint sites connected over a WAN, with a File Share Witness and RecoverPoint/CE installed; CG1: Devices for Cluster Group1]
VMware Infrastructure 3.5: Value and Innovations
Consolidate and contain servers
Optimize your infrastructure
Manage and secure desktops
Maximize continuity and uptime
Automate your virtual labs
[Diagram: three layers. 1 Virtualization: VMware Virtual Machine File System, Virtual SMP. 2 Virtual Infrastructure: Resource Management, Availability, Mobility, Security, with VirtualCenter, VMotion, Storage VMotion, High Availability, Consolidated Backup, Distributed Resource Scheduler (DRS), DPM. 3 Management and Automation: Infrastructure Optimization, Business Continuity, Desktop Management, Software Lifecycle, with Converter, VDI, ACE, Lab Manager, Workstation, Site Recovery Manager, Update Manager]
VMware Site Recovery Manager Integration
Simplifies and automates disaster recovery workflows
– Setup, testing, and failover
Makes disaster recovery a property of the virtual machine (VMware Distributed Resource Scheduler and High Availability)
[Diagram: APP/OS virtual machines at the PRODUCTION and RECOVERY sites]