Best Practices. World Wide Technology

(1)

Disaster Recovery Best Practices

WWT Educational Webcast

Ed Levens, World Wide Technology

David L. Jones, EMC

(2)

Questions are Encouraged

You can ask questions during the presentation by using the link provided in the Webcast Viewer.

(3)

Your Success Drives Ours

□ Relentless Focus on People, Process & Partnerships

□ Strong Partner Relationships

□ Over 1,000 Talented Employees

□ Proven Processes

□ Nearly $3 Billion in Revenues

□ Strong Credit Line - $350MM +

□ Key Contract Vehicles: VHA, HPG, ITES-2H, GSA, SEWP

(4)

Our Focus

Technology / Solution

Unified Communications: Integrated voice, video and data networks can lower costs and provide employees with productivity benefits.

Security: Adaptive threat response that stops network threats before they stop your business.

Mobility: Maintain your competitive advantage through the freedom and flexibility of wireless networks.

Data Center: Intelligent storage architectures can help reduce expenses; increase agility for changing priorities;

(5)

Disaster Recovery Best Practices

David L. Jones, EMC

(6)

Agenda

Today's Reality

IT Business Continuance and Disaster Recovery Considerations

Technology Choices

EMC RecoverPoint

Questions?

(7)

Unfortunately, disasters do happen…

(8)

Unfortunately, disasters do happen…

Of all the organizations surveyed…

55% had an incident that disabled their primary data center

– 60% of these had a regional backup site that was also disabled by the incident

When systems go down, the losses add up

(9)

Types of Disasters

Type of Disaster: Example

Nature / Man-Made: Katrina / 9/11

Sudden / Time to Prepare: Earthquake / Hurricane

Building / Local Area / Region: Fire / Power Outage / Flood

(10)

Most Frequent Impacts to IT Availability

Disasters represent a fraction of Environmental issues

[Chart: impacts by category. Server 30%; other categories shown: Application Software, Client Application Software, Network S/W, Environmental. Remaining percentage labels: 30%, 40%, 5%, 5%, 1 %, 15%, 5%]

(11)

Dilbert Does Disaster recovery …

(12)

Definitions

Business continuance / COOP describes the processes and procedures an organization puts in place to ensure that essential functions can continue during and after a disaster

Disaster recovery is the process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster

High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period

(13)

Continuity of Operations Policy (COOP)

It is the policy of the United States to have in place a comprehensive and effective program to ensure continuity of essential Federal functions under all circumstances.

As a baseline of preparedness for the full range of potential emergencies, all Federal agencies shall have in place a viable COOP capability which ensures the performance of their essential functions during any emergency or situation that may disrupt normal operations.

(14)

Agenda

Today's Reality

IT Business Continuance and Disaster Recovery Considerations

Technology Choices

EMC RecoverPoint

Questions?

(15)

Business Continuance – EMC / WWT Approach

Build on our understanding of our customers, their business / mission, and their critical processes and objectives

Capitalize on our long pedigree in designing, building and managing business/mission-critical systems for the Data Center

Technology

Business Continuance

(16)

IT Considerations

Management buy-in and commitment are critical

Know the regulations specific to your agency or organization

Conduct a risk assessment and identify critical priorities

Determine response for different disaster scenarios

Establish clearly defined roles & responsibilities for personnel

Establish effective communication channels

Maintain necessary resources, tools, and supplies

Testing! Testing! and more Testing!

Disaster recovery must be included as part of every process

(17)

IT Considerations

Disaster recovery must become part of the IT mindset, not an afterthought

High availability and disaster recovery go hand in hand

Define Architectures that build disaster recovery in from the beginning

Application Development

Infrastructure Design

QA, QC, Test and Development

Make use of industry recognized processes and architectures

ITIL, MOF, MSA / WSSRA, etc…

Recovery of applications without user interruption is nirvana but

(18)

IT Considerations

Recovery Point Objective (RPO) – The last saved data that the restarted application will reflect following the recovery. Also, a measure of the amount of time for which work may be lost in the event of an unplanned outage at the primary site.

Periodic tape backup vs. continuous disk-to-disk replication

Synchronous vs. Asynchronous

Recovery Time Objective (RTO) – The time that will pass before an infrastructure is available. In order to reduce RTO, data must be online and available at another site.

Distance – Data must be recovered on undamaged hardware outside the disaster zone. Required distance between primary and recovery sites should be based on likely regional threats.
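
The RPO and RTO definitions above can be turned into a quick comparison of protection schemes. The following is a minimal sketch, not a sizing tool: the `ProtectionScheme` class, its field names and every number in the example are illustrative assumptions, not figures from this webcast.

```python
from dataclasses import dataclass


@dataclass
class ProtectionScheme:
    """Hypothetical description of one way of protecting an application."""
    name: str
    copy_interval_s: float  # time between recovery points (0 = continuous)
    transfer_lag_s: float   # delay before a recovery point is safe at the recovery site
    restore_time_s: float   # time to bring the application up at the recovery site

    def worst_case_rpo_s(self) -> float:
        # Worst case: the outage hits just before the next recovery point
        # is taken and shipped, so a full interval plus the lag is lost.
        return self.copy_interval_s + self.transfer_lag_s

    def meets(self, rpo_target_s: float, rto_target_s: float) -> bool:
        return (self.worst_case_rpo_s() <= rpo_target_s
                and self.restore_time_s <= rto_target_s)


# Assumed example values, chosen only to contrast the two approaches.
nightly_tape = ProtectionScheme("periodic tape backup", 24 * 3600, 3600, 48 * 3600)
async_replica = ProtectionScheme("continuous disk-to-disk replication", 0, 30, 15 * 60)

for scheme in (nightly_tape, async_replica):
    ok = scheme.meets(rpo_target_s=5 * 60, rto_target_s=3600)
    print(f"{scheme.name}: worst-case RPO {scheme.worst_case_rpo_s():.0f}s, meets targets: {ok}")
```

With these assumed numbers, only the replicated copy satisfies a five-minute RPO and a one-hour RTO; real values have to come from measurement and testing.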

(19)

Agenda

Today's Reality

IT Business Continuance and Disaster Recovery Considerations

Technology Choices

EMC RecoverPoint

Questions?

(20)

Business Requirements should Drive Technology Options

Business Considerations: RTO, RPO, Protection GAP, Isolation

Infrastructure Alternatives: Cold Site (RTO=Days), Warm Site, Hot Site, Active-Active (RTO=Zero)
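
One way to read this slide is as a lookup from the required RTO to a class of recovery site. The sketch below assumes that reading. Only the two end points (Cold Site at RTO=Days, Active-Active at RTO=Zero) come from the slide; the 4-hour and 24-hour cut-offs and the function name `site_tier_for_rto` are hypothetical and should be replaced by your own business requirements.

```python
def site_tier_for_rto(rto_hours: float) -> str:
    """Pick the least demanding site class that can still meet the RTO."""
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    if rto_hours == 0:
        return "Active-Active"  # RTO=Zero on the slide
    if rto_hours < 4:
        return "Hot Site"       # assumed cut-off
    if rto_hours < 24:
        return "Warm Site"      # assumed cut-off
    return "Cold Site"          # RTO=Days on the slide


for rto in (0, 1, 12, 72):
    print(rto, "->", site_tier_for_rto(rto))
```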

(21)

Data Center Design and Architecture

Data Center design should be a high priority to ensure all the aspects of power, cooling, access and security have been core to the design

The distance between data centers will change the options that you have for the deployment of a disaster recovery strategy for all the services IT provides

Cold Site, Hot Site, Bunkers, Fully Active / Active

This is a business decision first

Make effective use of and leverage your existing facilities

Leveraging disaster recovery assets can provide maximum value BUT can also extend time to recovery or RTO

This choice will impact the technology decisions and options that are

(22)

Reference Architectures

(23)

Virtual and Physical Considerations

Server, Storage and Network Virtualization can maximize resources and streamline operations and disaster recovery

Server virtualization is mature and there are many choices

VMware

Microsoft Hyper-V

Citrix / Xen

Cisco “California”

Storage virtualization is mature but not as widely deployed

EMC Invista

HDS Array based

NetApp V-Series

Other

Network virtualization is a developing technology

(24)

Virtual and Physical Considerations

Disaster recovery considerations for virtualized environments

Physical to Virtual

Virtual to Physical

Physical to Physical

Virtual to Virtual

Consolidated disaster recovery using virtualization technologies can maximize resources

“DR in a box”

Maximum utilization of disaster recovery resources

Virtualization can present management challenges

Virtual to Physical Mappings

Management infrastructure must provide visibility

Server

(25)

Understanding Data Consistency

Applications and data are interrelated (Federated)

All data movement must be stopped/started at the same point in time

To restart applications you must have all the data, not parts of it

Recovery requires dependent-write consistency across all volumes and systems

[Diagram: Order Entry, CRM and SCM applications, each with its own DB]
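
Dependent-write consistency can be illustrated with a toy model: a later write may depend on an earlier one, so a recovery image is only usable up to the first write that is missing at the recovery site. The sketch below is an illustration of that idea, not any vendor's implementation. The global sequence numbers, the volume names and the helper names `consistent_cut` and `recoverable_writes` are assumptions, and the write log is assumed to be sorted by sequence number.

```python
def consistent_cut(write_log, replicated):
    """Return the highest sequence number N such that every write up to N
    is present at the recovery site (0 if even the first write is missing)."""
    cut = 0
    for seq, _volume in write_log:
        if seq not in replicated:
            break
        cut = seq
    return cut


def recoverable_writes(write_log, replicated):
    """Writes that may be applied without breaking dependent-write order."""
    cut = consistent_cut(write_log, replicated)
    return [(seq, vol) for seq, vol in write_log if seq <= cut]


# Hypothetical write order across three application databases.
write_log = [(1, "order_entry_db"), (2, "crm_db"), (3, "order_entry_db"), (4, "scm_db")]
replicated = {1, 2, 4}  # write 3 has not reached the recovery site yet

print(consistent_cut(write_log, replicated))
print(recoverable_writes(write_log, replicated))
```

Write 4 arrived, but it is discarded because write 3, which it may depend on, did not. This is why all volumes of related applications have to be treated as one unit.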

(26)

Infrastructure Services

Without disaster-recovery-enabled infrastructure, most other disaster recovery efforts will fail

Core services like Networks, DNS, Directory Services, etc… are required for all of the other processes that run in the Data Center

VPN and remote access services can be your best ally in the event of disaster and must be core to your plans

Management infrastructure will play a role in conducting root cause analysis ONLY if it is available

In most cases infrastructure services are COTS based and have been designed to provide availability using a geographically distributed scale out model

Vendor selection and partnership is key in this area because most

(27)

Applications

Applications are very rarely standalone

Multi-tiered applications (WEB, App Server, Database) will almost always require all tiers to operate

Most applications will not work if the required infrastructure is not also part of the plan

Data consistency between the tiers makes recovery much easier and more timely

Network based or Software based load balancing is the most common method for making WEB and Application tiers resilient

Applications that require persistent data storage may have additional requirements
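
Because every tier depends on other tiers and on infrastructure services, a recovery plan needs an order in which to bring things up. A minimal sketch of deriving that order with a topological sort follows. The dependency map is a hypothetical example, not a reference architecture, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be up before it.
depends_on = {
    "web tier": {"app server", "load balancer"},
    "app server": {"database", "directory services"},
    "database": {"storage replica"},
    "directory services": {"dns"},
    "load balancer": {"network"},
    "dns": {"network"},
    "storage replica": {"network"},
    "network": set(),
}

# static_order() yields each service only after all of its dependencies.
recovery_order = list(TopologicalSorter(depends_on).static_order())
print(recovery_order)
```

A cycle in the map raises `graphlib.CycleError`, which is itself a useful finding: two services that each need the other cannot be restarted cleanly.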

(28)

Applications – An example via email

Email IS NOT a standalone application

An enterprise class email implementation will usually consist of at least the following:

Main email data servers

SMTP (Inbound and outbound mail)

Integration point with a directory server

Blackberry, Blueberry, Strawberry, you get the point…

WEB based email front end

Real Time Collaboration – SharePoint, DB system, IM, etc…

Multiple Infrastructure touch points – DNS, WINS, VPN, etc…

External Vendors – Cellular provider

(29)

Databases

Different types of databases require different kinds of disaster recovery solutions

Read only / Data warehousing

Transactional

Most common types of disaster recovery solutions in the database space are

Oracle GRID/RAC based or scale out implementations - Clustering

Storage replication with application tie in

Database level replication

Most disaster recovery solutions for databases require a tight integration with the application tier solution in order to ensure transaction level recovery

(30)

Storage / Data Protection

Daily backup: Daily recovery points from tape or disk

Snapshots: More frequent disk-based recovery points

Continuous Data Protection: All recovery points (any point in time)

Significant points in time: Database checkpoint, Pre-app patch, Post-app patch, Quarterly close, Any user-configurable event

(31)

Storage / Data Protection

Creating remote and local copies of your data is a must for disaster recovery

The replication of storage data is a complex process that requires knowledge of what is being stored, detailed performance analysis and network impact analysis

Synchronous vs. Asynchronous

It’s all about distance

Adaptive solutions can provide dynamic RPO

Application level consistency is paramount

Many types of storage replication technologies exist

Array Based – Usually locks you into storage array choices
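
The "It’s all about distance" point for synchronous replication can be estimated from propagation delay alone. The sketch below assumes light travels roughly 200 km per millisecond in optical fiber and ignores switching, protocol and array overhead, so real numbers will be higher. The function name and the `round_trips` parameter are illustrative assumptions.

```python
FIBER_KM_PER_MS = 200.0  # light in optical fiber covers roughly 200 km per millisecond


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay added to each write that waits for a remote acknowledgement."""
    return round_trips * 2 * distance_km / FIBER_KM_PER_MS


for km in (10, 100, 1000):
    print(f"{km} km: +{sync_write_penalty_ms(km):.1f} ms per write")
```

Every synchronous write pays this penalty, which is why long distances usually push designs toward asynchronous replication and a non-zero RPO.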

(32)

Storage / Data Protection

A data replication solution that allows the flexibility of applying different RPO policies to both storage and in turn applications is key

Ability to prioritize RPO application by application

Create tiered model based on business requirements

Data Backup is here to stay and having a robust backup AND restore environment is crucial

Tape

Backup to Disk (VTL & CDP)

Offsite storage of backup data

Data Security

Data protection can reside on many tiers; consolidating its management is key

(33)

Vendor Choice is Critical

Disaster recovery IS complex

Disaster recovery spans internal IT organizations and specific technology disciplines

Management buy-in is critical for success

Disaster recovery involves many internal and external partners

Partnering with vendors is key as are the partnerships between your vendors!

(34)

Agenda

Today's Reality

IT Business Continuance and Disaster Recovery Considerations

Technology Choices

EMC RecoverPoint

Questions?

(35)

Data Replication Pain Points in Heterogeneous Environments

[Diagram: local site and remote site, each running Oracle, Exchange and SQL on a SAN]

Application platform support

Application-consistent recovery

Application response time

Corruption protection

Disaster-recovery testing

Communications cost

Existing infrastructure cost


(37)

RecoverPoint Concurrent Local and Remote (CLR) Data Protection

[Diagram: PRODUCTION SITE (cluster active node, tape backup manager, tape library, SAN, RecoverPoint appliances, local journal, storage groups and logs) connected over SAN/WAN by the replication data flow of RecoverPoint Replication Services to the DISASTER RECOVERY SITE (cluster passive node, standby disaster recovery server, SAN, RecoverPoint appliances, remote journal, replicated storage groups and logs)]

Performance architecture

– Out-of-band design leveraging intelligent host and fabric interfaces*

– Supports CLARiiON write splitting on CX3 and CX4 arrays

True CDP data protection for applications

– All writes stored in Journal with application bookmarks for recovery

– Supports Microsoft Volume Shadowcopy Service (VSS) and VDI APIs

(38)

Journaling for Application-Aware Recovery

Journal Includes Data Plus Metadata

Time/date

– Identifies the time image was saved

Bookmarks:

– System-generated group bookmarks

e.g., Volume Shadowcopy Service (VSS) backup

– User-generated bookmarks

– Other EMC product bookmarks

EMC Replication Manager

– System-event-generated bookmarks

– Microsoft SQL Server

Microsoft Virtual Device Interface (VDI) operations

– Microsoft Exchange

Microsoft VSS
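
The idea of a journal that stores every write with a timestamp, plus named bookmarks, can be sketched in a few lines. This is a conceptual toy model only: it is not RecoverPoint's data format or API, and the `Journal` class, its methods and the sample data are hypothetical. Entries are assumed to be appended in time order.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    entries: list = field(default_factory=list)    # (timestamp, block, data), in time order
    bookmarks: dict = field(default_factory=dict)  # bookmark name -> timestamp

    def write(self, ts, block, data):
        self.entries.append((ts, block, data))

    def bookmark(self, name, ts):
        self.bookmarks[name] = ts

    def image_at(self, ts):
        """Replay journaled writes up to and including ts."""
        image = {}
        for entry_ts, block, data in self.entries:
            if entry_ts > ts:
                break
            image[block] = data
        return image

    def image_at_bookmark(self, name):
        return self.image_at(self.bookmarks[name])


journal = Journal()
journal.write(1, block=0, data="orders v1")
journal.write(2, block=1, data="customers v1")
journal.bookmark("pre-app patch", ts=2)
journal.write(3, block=0, data="corrupted by patch")

print(journal.image_at_bookmark("pre-app patch"))
print(journal.image_at(3))
```

Recovering to the bookmark returns the image from just before the bad write, which is the practical value of application-aware bookmarks over timestamps alone.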

(39)

Grouping for a Consistent View

Allows application recovery to be tiered by service level

– Multiple volumes per group

– Mixed recovery point objectives within same infrastructure

Provides independent replication controls

– Recover by group, locally or remotely

– Start/stop by group

Enables grouping of optimization

– Importance

– Resource usage

– Recovery point and recovery time objectives

[Diagram: Groups 1 to 3 covering OE, CRM, E-mail and SCM, each protected by CDP and/or CRR]

(40)

Grouping for Federated Environments

Each tier has different service level agreements

– Consistency groups per tier

– Operational recovery of tier

Parallel consistency across tiers

– Federated environments

– Recover to a known point for all applications

– Disaster recovery for tier or application

– Spans operating systems, applications, storage, and servers

Enables advanced functions

– Full environment cloning

– Application upgrade testing

[Diagram: consistency groups for three tiers: 1: Linux (Web OE), 2: Windows (CRM), 3: UNIX (SCM, Financials…)]

(41)

RecoverPoint/Cluster Enabler (RecoverPoint/CE)

Each named cluster group’s associated devices reside in a single RecoverPoint consistency group of the same name

Supports Microsoft Cluster Server on Windows Server 2003 and Microsoft Failover Cluster on Windows Server 2008 Enterprise and Datacenter Editions

[Diagram: two sites with RecoverPoint connected over a WAN; File Share Witness with RecoverPoint/CE installed; CG1: Devices for Cluster Group1]

(42)

VMware Infrastructure 3.5: Value and Innovations

Consolidate and contain servers

Optimize your infrastructure

Manage and secure desktops

Maximize continuity and uptime

Automate your virtual labs

3 Management and Automation (Infrastructure Optimization, Business Continuity, Desktop Management, Software Lifecycle): Converter, VDI, ACE, Lab Manager, Workstation, Site Recovery Manager, Update Manager

2 Virtual Infrastructure (Resource Management, Availability, Mobility, Security): VMotion, High Availability, Consolidated Backup, Distributed Resource Scheduler (DRS), Storage VMotion, DPM, VirtualCenter

1 Virtualization: VMware Virtual Machine File System, Virtual SMP

(43)

VMware Site Recovery Manager Integration

Simplifies and automates disaster recovery workflows

– Setup, testing, and failover

[Diagram: PRODUCTION and RECOVERY sites]

Makes disaster recovery a property of the virtual machine (VMware Distributed Resource
