IT Disaster Recovery Policy

Academic year: 2021


To be used in the event of individual system outages or total loss of infrastructure.

Total loss: resulting from natural disasters, fire, intentional damage or equipment/hardware failure.

The college is housed across 4 sites; in the event of a disaster, the 2 main sites, Vernon Street (VS) and Blenheim Walk (BW), can accommodate additional IT services.

The main bulk of the IT infrastructure is housed in the Blenheim Walk server room.

If the Blenheim Walk site were lost, we would still have some functionality at VS from which to set up a base infrastructure.

Blenheim Court (BC) is linked to BW via fibre; if BW were lost, BC would also be non-functional and therefore should not be considered as a DR base.

The Brodrick Building is a rented building connected via a fibre-optic laser link; it should not be considered as a DR base.

Areas of cover

• Network Infrastructure
• Staff Applications/Hardware
• Student Applications/Hardware
• Server & Network Topology
• Backup Procedure

Supplier info


Insight – Technology Building, Insight Campus, Terry Street, Sheffield, S9 2BU. Tel: 0870 706 7265, Fax: 0870 706 8265, Email: shaun.brooks@uk.insight.com

Misco – Darby Close, Park Farm Industrial Estate, Wellingborough, Northants, NN8 6GS. Tel: 0870 720 8720, Fax: 0870 720 8686, Email: salesdesk@misco.co.uk, Mhill@misco.co.uk

Intrinsic Technology – Cisco sales – 01942 528100

Network Infrastructure

The base/backbone of the network is controlled by a centralised firewall/router provided by ECSC, currently running on 2 Dell PowerEdge servers. ECSC Ltd – 01274 513266

ECSC holds backups of the configuration, along with copies of the Leeds-art.ac.uk and Academic.lcad DNS zones.

ECSC will need to arrange on-site consultancy to install and configure the new firewalls. A master password list is currently kept in the server room safe at Blenheim Walk.

Switch requirements

The college uses Cisco switches running 7 VLANs: 2 staff-side, 2 student-side, 2 wireless and 1 management.

Switches needed – Cisco Catalyst 3750 series.

Internet Connectivity

The Internet/WAN connection is provided by Janet UK – www.ja.net - 0870 850 2212

In the event of WAN loss Janet will need to install a new router and fibre link from the college to the Janet network.

If necessary, the new location's WAN (internet) link can be used to route email etc. temporarily.

Communications

The telephone system is an IP system using Alcatel switches. In the event of a disaster, the main college line can be diverted to a temporary BT line or equivalent.

Freedom Communications can provide support, along with the hardware and configuration needed to rebuild the phone system.

Freedom Communications (UK) Ltd


Telephone: 01923 654321, Facsimile: 01924 233070, Service: 0870 055 6900 www.freedomcomms.com

Server Racking & Air Conditioning

There will need to be enough rack space and cooling to accommodate at least 35 servers. If only critical systems are in use then facilities for 10 servers will be sufficient.

Backup system

A server and tape device will be required in order to restore the data.

The tape drive requirement is an HP StorageWorks Ultrium 920 (400/800 GB) LTO SAS drive. VERITAS Backup Exec software will also be required to perform the restores. Current server name – Oscar – HP DL380 G5 with 2 SATA drive enclosures.

Domain Infrastructure

Leeds-art.ac.uk – staff domain

Academic.lcad – student domain

Staff Applications

Critical:

Unit 4 Agresso system.

This runs on 7 servers, 2 physical & 5 virtual.

Resource 3200 – finance system, use the IT Wiki for restoration instructions. Corero, 01923 695137

Server name – BEAKER Windows 2008 HP DL380 G6

Exchange 2010 SP2 – email system; restore the server, system state and Exchange mail store.

Server name – Chef, Windows 2008 HP DL380 G8

Chris21 – Human Resources system, stores all employee data; use the same procedure as above.

Frontier Software – 01925 852284, 01925 852242

Server name – Ernie Windows 2003 HP DL360 G4


Server name – GONZO Windows 2003 HP DL380 G5

Non-Critical:

Liberty – Library system, restore from tape.

Softlink Europe Ltd – Tel: +44 (0)1993 883401, Fax: +44 (0)1993 883799

Server name – Bunsen HP DL360 G5

Cellcat – timetabling software – restore SQL database. Cellcat.com

Server name – ERNIE Windows 2003 HP DL360 G4

Agent – EBS reporting software; contact Tribal for installation. Tribal Technology Ltd – Tel: 0114 281 6100, Fax: 0114 281 6021

Server name – AGENT Windows 2003 HP DL360 G5

Alcatel 4760 – telephone system software; restore from tape.

Server name – ROWLF Windows 2003 HP DL360 G5

Efficient 5 – door entry system; database held on tape. Kronos Systems Ltd – Tel: 01302 381505

Server name – Clifford (virtual) & TMS Camilla Windows 2003 HP DL360 G5

Staff Intranet – staff internal documents, department info. Restore from tape.

Server name – KERMIT Windows 2008 HP DL380 G7

IT application – Asset tracker and print system etc. Server name – ANIMAL (virtual)

IT Network monitoring – In house tools and WSUS Server name – STRANGEPORK

Student Applications

File Server(s) – holds students' work; currently backed up once per week (to disk), but too large to house on tape or off-site.

Server name – MENDEZ HP DL585 & GAUDI Windows 2008 HP X1800 G2

PaperCut – student printing system; holds records of students' print credit balances.


Server name – RAPHAEL HP DL360 G6P & KAHLO Windows 2003 HP DL360 G4

Student Web server – allows students to create their own websites.

Server name – STUD_WWW UNIX HP DL360 G5

Student Licensing Servers – not backed up.

Server name – BLAKE Apple Mac Server

Server name – KIRBY Apple Mac Server

Server name – HART Windows 2008 server – Autocad/ghosting etc

Backup Process

The current full backup is 6.0 TB of data. Restore times for this volume of data vary depending on the server and network infrastructure.

Restores of this size are usually run overnight.

Because restore time depends on data volume, some systems will take longer to restore from tape than others.

Tape Retention

Month End – one full Friday backup at each month end (12 tapes), making it possible to restore data from up to 1 year previous.

Weekday – 16 incremental weekday tapes, held for 4 weeks.

Friday – 4 full Friday tapes, covering 4 weeks of data.

Current Backup Procedure.

Backup Server – Oscar, DL380 G5 with HP 800/1600 SAS Autoloader.

Backup regime: currently covers a 4-week rotation (disk and tape).

Backup to disk is performed first; backup to tape is performed afterwards, once the backup window has passed.

Monday – Thursday – Differential (files changed since the last full backup), 1 tape (total backup around 300 GB).

Friday – Full backup – 3 tapes

Month End Tape – 3 tapes, used instead of the Friday tapes on the last Friday of every month.
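The weekly regime above can be expressed as a small scheduling rule. The sketch below is a minimal illustration, not the college's actual Backup Exec job configuration: the function names are hypothetical, and it assumes no jobs run at weekends, which the procedure does not state.

```python
import calendar
from datetime import date


def is_last_friday(day: date) -> bool:
    """True if `day` is the last Friday of its month."""
    if day.weekday() != calendar.FRIDAY:
        return False
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    # No further Friday fits in this month if one week later overflows it.
    return day.day + 7 > days_in_month


def backup_job(day: date) -> tuple[str, int]:
    """Return (job type, tapes used) for `day` under the 4-week regime."""
    weekday = day.weekday()
    if weekday <= calendar.THURSDAY:
        # Monday to Thursday: differential since the last full backup, 1 tape.
        return ("differential", 1)
    if weekday == calendar.FRIDAY:
        # The Month End set replaces the normal Friday set on the last Friday.
        if is_last_friday(day):
            return ("month-end full", 3)
        return ("full", 3)
    # Assumption (hypothetical): no backup job is scheduled at weekends.
    return ("none", 0)


if __name__ == "__main__":
    for d in (date(2024, 3, 25), date(2024, 3, 22), date(2024, 3, 29)):
        print(d, backup_job(d))
```

Keeping the "last Friday" test separate makes it easy to check against a calendar when the Month End tapes are swapped in.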


Tape Storage.

The daily tapes and the following week's tapes are housed in the fireproof section of the safe in the IT server room.

All sets of Friday tapes (except the current week's) are to be housed at Blenheim Court (off-site).

Tape Procedure.

Every Friday, the weekday tapes for the following week are changed and an inventory is set to run.

The following day, the backup logs should be checked for errors or any problems that may have been encountered, e.g. faulty tapes.

The tapes should then be removed and placed back in the safe.

Tape Labelling.

Tapes are labelled according to the 4-week rotation:

Monday 1 – 4

Tuesday 1 – 4

Wednesday 1 – 4

Thursday 1 – 4

The number is the week of the rotation: week 1 tapes are labelled 1, week 2 tapes 2, week 3 tapes 3 and week 4 tapes 4. Friday tapes are also labelled over 4 weeks in the same way, with a second number for each tape in the set: 1.1, 1.2, 1.3, 2.1, 2.2 etc. Month End tapes are labelled according to the month, e.g. March 1 (2) etc.

Data is held on Tape for a maximum duration of 12 months.
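The labelling scheme can be sketched as a function that maps a date to the tape labels it uses. This is a hypothetical illustration: which calendar week counts as "week 1" is not stated above, so `ROTATION_START` is an assumed anchor Monday, and Month End labels are left out because their format ("March 1 (2)") is not fully specified.

```python
from datetime import date

# Assumption (hypothetical): a Monday taken as the first day of week 1.
ROTATION_START = date(2024, 1, 1)
WEEKDAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday")
FRIDAY_TAPES_PER_SET = 3  # the Friday full backup uses 3 tapes


def rotation_week(day: date, start: date = ROTATION_START) -> int:
    """Week of the 4-week rotation (1-4) that `day` falls in."""
    return (day - start).days // 7 % 4 + 1


def tape_labels(day: date) -> list[str]:
    """Labels of the tapes used on `day` (weekday and Friday sets only)."""
    week = rotation_week(day)
    weekday = day.weekday()
    if weekday < 4:
        # Monday to Thursday: one tape, e.g. "Tuesday 2".
        return [f"{WEEKDAY_NAMES[weekday]} {week}"]
    if weekday == 4:
        # Friday: a numbered set, e.g. "1.1", "1.2", "1.3".
        return [f"{week}.{n}" for n in range(1, FRIDAY_TAPES_PER_SET + 1)]
    return []  # assumption: no tapes are used at weekends


if __name__ == "__main__":
    print(tape_labels(date(2024, 1, 26)))
```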

VEEAM

Veeam is used to back up our virtual servers. The virtual infrastructure consists of one ESX4i server and a cluster of 3 ESX5i servers in a VSA. See the IT wiki for the detailed setup.

Backup Exec V2i Imaging Software.

This software is used to image an entire server and its drives. If total failure occurs, a new server can be imaged from the V2i file and be up and running within hours. We currently have only 9 licences, which are used to back up critical servers.

Virtual – VMware (Veeam)

With the growing use of VMware and the introduction of the VSA, we use Veeam to back up the virtual machines; it takes a snapshot of each server.


Restores

Restores are carried out in Backup Exec: the required files are selected in the software, which then restores from disk or asks for the corresponding tape.

Restores are sent to an alternative location and checked before being placed back in their original location.

Tape Drive cleaning

When the cleaning light flashes on the drive (around every 28 days), the loader should clean the heads.

A cleaning tape only lasts for a set number of cleans.

The drive should always hold a current cleaning tape, not an expired one; this helps keep the backup drive functioning normally.
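The cleaning guidance can be turned into a simple check. This is a hypothetical sketch: the drive's cleaning light, not a calendar, is what actually triggers a clean, so the 28-day interval is only the approximate figure quoted above, and `rated_cleans` is an assumed parameter that must come from the cleaning cartridge's own specification, because the number is not given here.

```python
from datetime import date, timedelta

# Approximate interval between cleans quoted in the procedure.
CLEAN_INTERVAL = timedelta(days=28)


def next_clean_expected(last_clean: date) -> date:
    """Approximate date the cleaning light is next expected to flash."""
    return last_clean + CLEAN_INTERVAL


def cleaning_tape_expired(cleans_used: int, rated_cleans: int) -> bool:
    """True once the cleaning tape has used up its rated number of cleans."""
    return cleans_used >= rated_cleans


if __name__ == "__main__":
    print(next_clean_expected(date(2024, 1, 1)))
```

Logging each clean against the cartridge makes it obvious when a replacement needs ordering, before the tape expires in the drive.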

Issues

Any issue that could stop a backup job from running should be reported to the Infrastructure team immediately.

Restore Order

• Network Infrastructure
• External Links – Internet & Telecommunications
• Servers (priority):

1. Backup Exec Server/Tape drive
2. Finance System
3. Agresso System
4. HR System
5. File Server
6. Email Server
7. Other systems

However, depending on the time of year, the HR and Agresso systems could take precedence over the Finance system.
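The server restore order, including the seasonal exception, can be captured as an ordered list. In this sketch the `hr_agresso_first` flag is a hypothetical stand-in for the time-of-year judgement, which the policy leaves to the team; when set, Finance is moved to follow the HR system while the rest of the order is unchanged.

```python
# Server restore priority from the Restore Order list (highest first).
BASE_ORDER = (
    "Backup Exec Server/Tape drive",
    "Finance System",
    "Agresso System",
    "HR System",
    "File Server",
    "Email Server",
    "Other systems",
)


def server_restore_order(hr_agresso_first: bool = False) -> list[str]:
    """Return the restore order, optionally putting Agresso and HR before Finance."""
    order = list(BASE_ORDER)
    if hr_agresso_first:
        # Move Finance to just after HR so Agresso and HR take precedence.
        order.remove("Finance System")
        order.insert(order.index("HR System") + 1, "Finance System")
    return order


if __name__ == "__main__":
    for position, system in enumerate(server_restore_order(True), start=1):
        print(position, system)
```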

Build Times

Source and purchase new hardware – 1 week

Rapid response from ECSC to install new firewall/router – 1 week

Liaise with BT for new telecommunications lines – BT dependent


Arrange for Freedom consultancy to rebuild the phone system at the new location – 1-2 weeks

Install Internet connection – Janet/BT dependent

Install and configure new Backup Exec server – 2 days

Install and configure new switching (basic) – 1 week

Liaise with 3rd party software vendors.

Install Servers

Agresso – 1 to 2 weeks

Email – 1 week

Finance System – 1 week

HR System – 1 week

Initial Restore Times:

1 day – Tape Recovery and Catalogue.

1 week – Data Recovery from Tape.

1 week – catalogue software stores.

Roles:

Chris Dodds – Responsible for the configuration and maintenance of backup jobs, technical support for the backup process and fault resolution. In charge of tape rotation, log analysis & ad-hoc data restores.

Ian Atkinson – Responsible for scripting Linux/Ubuntu servers backup jobs. Technical support for application and server restores.

Dan Hill – DR policy and supplier & 3rd party relations. Technical support for application and server restores.

Restore Roles:

Chris Dodds – Retrieve appropriate tapes from the safe. Help to rebuild PCs and servers if required.

All – setup of the Backup Exec server, retrieval of data from tapes, restoration of servers in the order set by the business impact policy.


All – setup of network infrastructure and switches, restoration of non-Windows servers and workstations. Aid in setting up external links, e.g. phones and internet.

Dan Hill – Procurement of necessary IT equipment and services. Liaise with the business on recovery timescales and procedures. Carry out processes of restore according to the DR policy and the business impact analysis.

Disaster Recovery Testing Schedule

All systems are to be restored from tape, using the DR tape drive and server, into a VMware virtual environment.

The testing phase then leads to updating the system documentation, adding or amending server configurations on the IT wiki.

Any problems, new servers or additional information will also be entered into the wiki.

Once updated, the pages from the wiki are to be printed out and placed in the off-site safe.

DR Testing Scenarios 2011

These were completed individually as separate DR scenarios.

Time taken = total time to restore the server to a working state, measured continuously from start to finish with no gaps (real world).

Build Virtual Environment

Installation of ESX Server from DVD: 20 mins

Installation of Backup Exec: 40 mins

Cataloguing of media: 10 mins, BESR: 20 mins

Install & Configure Windows Server (200X)

Installation of 2008 R2 with default settings: 20 mins

Installation of 2003 R2 with default settings: 30 mins

Configuration and installation of services, roles and features (e.g. IIS, .NET, File, Print): 10 mins

Install & Restore Exchange

Installation of Exchange with disaster recovery option: 20 mins

Restoration of Exchange database from tape: 31 mins

Bridget requested calendar items from Outlook via the helpdesk, using the DR server and data. Login succeeded without errors, the data was present and no permissions errors were recorded.
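The individual timings above can be added up to give a rough end-to-end figure for recovering email onto new hardware. This sketch assumes the steps run strictly one after another and that Exchange goes onto Windows Server 2008 R2; both are assumptions, and the BESR step is left out because it is not needed on this path.

```python
# Minutes recorded in the 2011 DR tests, for an Exchange recovery path.
EXCHANGE_SCENARIO = {
    "Install ESX Server from DVD": 20,
    "Install Backup Exec": 40,
    "Catalogue media": 10,
    "Install Windows Server 2008 R2 (default settings)": 20,
    "Configure services, roles and features": 10,
    "Install Exchange with disaster recovery option": 20,
    "Restore Exchange database from tape": 31,
}


def total_minutes(steps: dict[str, int]) -> int:
    """Total time if every step runs sequentially with no gaps."""
    return sum(steps.values())


def as_hours_minutes(minutes: int) -> str:
    hours, mins = divmod(minutes, 60)
    return f"{hours} h {mins:02d} min"


total = total_minutes(EXCHANGE_SCENARIO)
print(total, as_hours_minutes(total))  # 151 2 h 31 min
```

This covers only the measured build and restore steps; sourcing hardware and external links, listed under Build Times, would come on top.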

Install & Restore Resource

Restoration of files from tape: 40 mins

Restoration of SQL database from tape: 20 mins

JB tested: all data was present, but reports could not be run due to problems with IQ Objects.

Staff File Server – restore a selection of files


Estimated time for full restore of all data (1400 GB): 35 hrs

Permissions were tested and had all copied over correctly and were working. Logging in on a test PC confirmed that the folder mapping (S:) was created.
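The file server estimate implies a restore rate that can be used to gauge larger restores. As a rough sketch, assuming the rate scales linearly and taking 1 TB as 1000 GB, 1400 GB in 35 hours is 40 GB per hour, which would put a restore of the full 6.0 TB backup at about 150 hours. The test ran on a 100 Mbps network, which the testing notes say may have inflated restore times, so the real figure could be lower.

```python
def restore_rate_gb_per_hour(size_gb: float, hours: float) -> float:
    """Observed restore throughput in GB per hour."""
    return size_gb / hours


def estimated_restore_hours(size_gb: float, rate_gb_per_hour: float) -> float:
    """Hours to restore `size_gb`, assuming the observed rate holds (linear scaling)."""
    return size_gb / rate_gb_per_hour


# Staff file server test: 1400 GB estimated at 35 hours.
rate = restore_rate_gb_per_hour(1400, 35)                       # 40.0 GB/h
# Full backup of 6.0 TB, assuming 1 TB = 1000 GB.
full_backup_hours = estimated_restore_hours(6.0 * 1000, rate)   # 150.0 h
print(rate, full_backup_hours)
```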

HR – Chris21 Install & Restore

Installation of C21 software: 10 mins

Restoration of data files: 5 mins

Amanda N tested. Data was correct and could be amended, saved and reloaded.

EBS – restore Databases & files for Tribal to use

Installation of Oracle: 20 mins

Restoration of Database:


There was no Oracle licence, and Tribal will not restore without one.

Problems Encountered

Backup Exec System Recovery

Some of the images would not import directly into the ESX server. However, they could be deployed onto a base image and the data extracted directly.

Network Traffic

Running the network at 100 Mbps slowed the transfer rate and may have inflated restore times. This will not be a problem in a real disaster, as all new switches run at 1000 Mbps.
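A quick line-rate calculation shows why the network speed matters. This is a theoretical lower bound that ignores protocol overhead, tape speed and disk speed, and assumes decimal units (1 GB = 8000 megabits): moving the 1400 GB of staff file server data takes at least about 31 hours at 100 Mbps, but about 3 hours at 1000 Mbps. The 100 Mbps figure is close to the 35-hour estimate recorded in testing, which suggests the network was the main bottleneck.

```python
def min_transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Lower bound on transfer time at full line rate, ignoring all overheads."""
    megabits = size_gb * 1000 * 8  # decimal units: 1 GB = 1000 MB = 8000 Mb
    seconds = megabits / link_mbps
    return seconds / 3600


for mbps in (100, 1000):
    print(f"{mbps} Mbps: {min_transfer_hours(1400, mbps):.1f} h")
# 100 Mbps: 31.1 h
# 1000 Mbps: 3.1 h
```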

Server 2008 R2

Mouse performance was very choppy within the ESX console, but fine when using Remote Desktop. VMware expects a fix in the next driver update package.

More tracking is needed to ensure that the correct installation is performed the first time.

Oracle

The database could not be restored from Backup Exec (BE); the cause is unknown and Tribal is to investigate.

However, the database dumps were restored and imported.

Resource


DR Testing Scenarios 2012

This year we focussed on scenario testing. The following were performed:

• Migration and restore of Resource 32000 to a new OS and server. Fully document the transfer and add to the IT wiki. (mirrors total failure)
• Create and move Portal (intranet) from Linux to Windows 2008. (mirrors total failure)
• IT Helpdesk – restore and migrate into a virtual environment. (mirrors total failure)
• Test restoration of new Agresso servers into another virtual environment.

All scenarios were successful and new documentation has been added to the IT Wiki.

DR Testing Scenarios 2013

Restoration and migration of the Exchange email server:

• Restore from backup
• Create working replica of Exchange 2003
• Create new Exchange 2010 server and upgrade domain
• Migrate users to new server
