
(1)

CERN IT Department CH-1211 Geneva 23 Switzerland

www.cern.ch/it

Backup and restore of Oracle databases: introducing a disk layer

by

Ruben Gaspar

IT-DB-DBB

(2)

Agenda

• CERN Oracle databases & Oracle backup basics
• Backup to disk implementation details
• Recovery platform
• Some bits of backup to disk backend
• Summary

(4)

Target Oracle databases for backup to disk

• ~70 Oracle databases, most of them running Oracle Clusterware (RAC)
  – 49 are backed up to disk and then to tape
  – 21 are backed up with snapshots only. Test and development instances.
• 15 Data Guard RAC clusters in production
  – Active Data Guard since the upgrade to 11g
  – They are backed up to tape only
• 10 Oracle single-instance databases in DBaaS, also backed up using snapshots

[Diagram label: Redo Transport]

(5)

Oracle backup basics

• The Oracle clock: System Change Number (SCN)
  – It will take 544 years to run out of SCNs at 16K/s
  – smon_scn_time tracks time versus SCN
• Types of backups
  – Consistent: taken after the database has been cleanly shut down. All redo has been applied to the data files. Archive logs are not produced.
  – Inconsistent: taken while the database is running. The database must be in archivelog mode, which means archive logs are produced. Point-in-Time Recoveries (PITR) are possible.
  – Drawback: clean-up of archive logs is critical to prevent the database from blocking → TSM was playing a critical role here
• Backup sets: Oracle proprietary format for backups. Binary files.
  – Backup sets are containers for one or several backup pieces
  – Backup pieces contain blocks of one or several data files (multiplexing)
• RMAN channels: disk, tape or proxy. They read data files and write to the backup media. We use SBT (System Backup to Tape API) with IBM Tivoli Data Protection 6.3 (provided by TSM support)
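The 544-year figure on this slide can be checked with a little arithmetic. The sketch below assumes a 48-bit SCN counter (the slide does not state the width, so this is an assumption), reads "16K/s" as 16384 SCNs per second, and uses a 365.25-day year.

```python
# Assumption: the SCN is a 48-bit counter (not stated on the slide).
SCN_BITS = 48
# "16K/s" from the slide, read here as 16 * 1024 SCNs per second.
SCN_RATE_PER_SEC = 16 * 1024
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_scn_exhaustion(bits=SCN_BITS, rate=SCN_RATE_PER_SEC):
    """Years needed to consume every SCN value at a constant rate."""
    return (2 ** bits) / rate / SECONDS_PER_YEAR


print(round(years_until_scn_exhaustion()))  # prints 544
```

Under these assumptions the result agrees with the figure quoted above.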

(6)

Oracle backup basics (II)

Backup jobs based on templates. Recovery Manager API

--Full
backup incremental level 0 database;
--Cumulative
backup incremental level 2 cumulative database;
--Incremental
backup incremental level 1 database;
--Archivelogs
backup tag 'BR_TAG' archivelog all delete all input;

Retention policy from 60 to 90 days, depending on the DB.

CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 90 DAYS;
e.g. LEMONRAC → [1xfull + 6xdifferential + archivelogs] * 13 weeks

Controlfile backup, automatically taken by each backup

CONFIGURE CONTROLFILE AUTOBACKUP ON;

e.g. LHCBSTG → [2xfull + 5xdifferential + 24x4 archivelogs] * 13 weeks = 934GB

BR evolution: Backup to disk- 6

DB        Fulls (GB)  Inc (GB)  Archived logs (GB)  Total (GB)
LEMONRAC  87902.42    857.52    13319.39            102079.32

[Diagram labels: PITR, Full, Cum., Inc]

(7)
(8)

What is there to be backed up?

Backup jobs using the RMAN API take care of:
• Database files: user and system files
• Control files: contain the structure and status of data files. They also hold the full backup history
• Archived logs: backups of redo logs. Needed for inconsistent backup strategies. They must be backed up and removed from the active file system; otherwise, if it runs out of space, the database freezes/stops.

5.1TB of redo logs produced per day

(11)

Backup architecture

• Custom solution: about 15k lines of code, Perl + Bash
• Flexible: easy to adapt to new Oracle releases and backup media
• Based on Oracle Recovery Manager (RMAN) templates
• Central logging
• Easy to extend via Perl plug-ins: snapshot, exports, RO tablespaces,…

We send compressed:
• 1 out of 4 full backups
• All archivelogs

(12)

Impact on TSM

Savings depend on database workload, e.g. backup sets on disk for three databases:

DB        Full (GB)  Inc (GB)  Archived logs (GB)  Savings
EDHP      29197.76   1216.697  2169.766            70%
CASTORNS  4944.839   213.256   336.2889            71%
ATLASSTG  1484.146   724.9567  3063.658            45%

[Figure annotation: Full x 1/4 + Archived logs → sent to tape]

Source: TSM support

• + backup sets are compressed (see later)

(13)

Impact on TSM (II)

Source: TSM support

~47% savings ~70% savings

15 accounts: alicestg,atlasstg,cmsstg,castorns,..

(14)

Workflow for disk/tape backups

• Same workflow as for tape backups → to ease maintenance

• Disk and tape templates are almost identical; only the channel allocation differs

• Disk channel allocation is calculated on the fly, considering the available space in the aggregate and file system, using the NetApp management API called ZAPI

• About 75 templates to adapt to all types of backup strategies

• Tape and disk backup strategies co-exist

• Reversible: changing from one to the other is a matter of changing templates.
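To illustrate how disk and tape templates can share everything except channel allocation, here is a minimal sketch. It is not the framework's Perl code: the template text, the function names and the channel names are hypothetical, and a real tape channel would also carry the media manager parameters required by TSM.

```python
# Hypothetical job template: the body is shared, only {channels} varies.
TEMPLATE = """RUN {{
{channels}
  BACKUP INCREMENTAL LEVEL 0 DATABASE;
}}"""


def tape_channels(count):
    """ALLOCATE CHANNEL lines for tape (SBT); media manager PARMS omitted."""
    return [f"  ALLOCATE CHANNEL t{i} DEVICE TYPE SBT;"
            for i in range(1, count + 1)]


def disk_channels(db_name, mount_points):
    """One disk channel per backup file system, e.g. /backup/dbs01/<DBNAME>."""
    return [f"  ALLOCATE CHANNEL d{i} DEVICE TYPE DISK "
            f"FORMAT '{mount}/{db_name}/%U';"
            for i, mount in enumerate(mount_points, start=1)]


def render(channel_lines):
    """Fill the shared template with a device-specific channel section."""
    return TEMPLATE.format(channels="\n".join(channel_lines))


disk_job = render(disk_channels("CASTORNS", ["/backup/dbs01", "/backup/dbs02"]))
tape_job = render(tape_channels(2))
print(disk_job)
print(tape_job)
```

Switching a database between disk and tape then amounts to rendering the same template with a different channel section, which is the reversibility described above.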

(16)

Typical DB architecture

At least 2 file systems for backup to disk: /backup/dbsXX/DBNAME

[Diagram: RAC nodes 01 to 04; public interface; interconnect LAN; 10GbE and 6Gb/s links; media manager servers backup01 and backup02; IBM TSM server (10GbE); 7-mode storage (datafiles, archivelogs); C-mode cluster (public interface 10GbE, cluster interconnect 10GbE, mgmt network 1 GbE, private network)]

(17)

New C-mode features

• Transparent file system movements:
  cluster01::> volume move start -destination-aggregate aggr1_c01n02 -vserver vs1 -volume castorns03 -cutover-window 10
• DNS load balancing inside the cluster
• Automatic virtual IP rebalancing (based on failover groups)
• Access security via “export-policy”, which combines firewall rules with different authentication mechanisms: sys, krb5, ntlm
• Global namespace
• Compression and deduplication
  – We rely heavily on compression to fit 2.3PB of backup set storage needs into 1.1PB of disk

(18)

Backup to disk configuration on database servers

• RMAN configuration parameters: minimal change

CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/backup/dbs01/<DBNAME>/<DBNAME>_%F';

• Global namespace in use: /backup/dbsXX

• Eases management: the mount point stays unchanged as data moves. It’s a NetApp C-mode feature (see later)

7-mode: mount -o … priv-controllerIP:/vol/castorns03 /ORA/dbs03/CASTOR

C-mode: mount -o … public-ip-cluster:/backup/dbs01/CASTORNS /backup/dbs01/CASTORNS

/backup/dbs01/<DBNAME> → autobackup controlfile + backupsets
/backup/dbsXX/<DBNAME> → backupsets

(19)

Particular cases

• Solution also operational in a Data Guard configuration: full and incremental backups are taken on the standby (more when talking about restores)
• Multiple channels: rman_channels_connect, in order to distribute the backup load
  – e.g. username/password@rac-node1 username/password@rac-node2
• Plug-in for RO tablespaces backup (ACCLOG: size about 170TB, growth 70TB/year)
  – Automatic clean-up in case of tablespace state change
  – One backup set per tablespace
• Extension to allow special mount points (ACCLOG): rman_mounts_readonly

[Diagram: Primary Database; Redo Transport; Active Data Guard for users’ access and for disaster recovery; backups labelled "full + incremental + controlfile" and "archivelogs + controlfile"]

(20)

Backup to disk performance

ACCLOG full backup (5TB):

        Duration   Throughput
Tape    34 hours   ~35 MB/s
Disk    14 hours   ~100 MB/s

• Backups run ~50% faster than on tape
• Sending backup sets from disk to tape needs optimisation
  – Work in progress with TSM support

(21)

Backup to Disk space consumption

• Channel order is important → storage management
• Space distribution should follow the planning to avoid imbalance. File systems should grow at the same pace.
• The emptiest volume is always selected first
• Automatic size extension
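The "emptiest volume first" rule can be sketched as a simple sort. This is only an illustration: it assumes "emptiest" means the most free space, and a plain dictionary stands in for the free-space figures that the real framework obtains from the storage controllers; the function name and the sample numbers are hypothetical.

```python
def order_by_free_space(free_gb):
    """Return mount points with the most free space first.

    free_gb maps a mount point to its free space in GB. Ties are broken
    by mount point name so that the order is deterministic.
    """
    return sorted(free_gb, key=lambda mount: (-free_gb[mount], mount))


# Hypothetical free-space figures for three backup file systems.
free_space = {
    "/backup/dbs01": 120.0,
    "/backup/dbs02": 340.5,
    "/backup/dbs03": 80.0,
}
print(order_by_free_space(free_space))
# prints ['/backup/dbs02', '/backup/dbs01', '/backup/dbs03']
```

Allocating channels in this order sends the first backup pieces to the emptiest file system, which helps the file systems grow at the same pace.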

(23)

Recovery platform

• Only reliable proof of truth: run a recovery
  – Any change introduced in the backup platform or backup strategy is always validated via test recoveries
• Isolation
  – Runs independently of the production database
  – Can't access any other system (database network links)
  – No user jobs must run
• Flexible and easy to customize
• Maximize use of the recovery server: several recoveries at the same time
• Exports taken after a successful recovery → help in support cases, mainly logical errors
• Open source: http://sourceforge.net/projects/recoveryplat/

(24)

Recovery platform (II)

• Introducing a disk buffer greatly improves our recovery testing
• Also tested with Data Guard configurations:
  – Data Guard: Oracle support ID 1070039.1
    RMAN> set backup files for device type disk to accessible
• Restores from disk are usually 50% faster
• More recoveries can be run, nowadays about 40 recoveries per week
• No blocking of tape resources that could be used by backups

(26)

Backup to disk cluster

• 2xFAS6240 NetApp controllers
• 24xdiskshelf DS4243
• 24x3TB SATA disks each (576 disks)
• raid_dp (raid6) → 1.1 PB usable space split into 8 aggregates ~ 135TB each
• 2xquad core 64bit Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
• 10 Gbps connectivity
• Multipath SAS loops 3 Gbps
• Flash cache 512GB per node

(27)

How fast, How compressed

Compression (datafiles)

Online compression of datafiles ~55% (saved by compression)

Backup set compression of a 501 GB tablespace of random alphanumeric strings, dbms_random:

Backup set type                   Size    Time     Saved
no-compressed                     501GB
basic                             83GB    6h21’    83%
low                               116GB   49’      76.8%
medium                            88GB    07h23’   82.4%
high                              82GB    11h02’   83.6%
No-compressed-fs                  459GB   41’      8.3%
Cron-compression Netapp 8.1.1     188GB            62%
Inline-compression Netapp 8.1.1   188GB   46’      62%

[Chart: RMAN backup to disk*, throughput (MB/s, 0 to 450) versus number of channels (1 to 3); series: knfs, dnfs, dnfs + Ontap compression]

(28)

Compression: real values

DB          Used (GB)*  Saved (GB)  % saved by compression
AISDB_PROD  24719       25941       52
CASTORNS    3629        3448        49
CMSSTG      6510        6395        50
CSR         20636       32008       61
ITCORE      16387       23552       60
EDHP        9631        24913       66
LEMONRAC    47104       49152       51

*Space used on controller side

Logical space used: Used + Saved
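The relation "Logical space used: Used + Saved" suggests how the percentage column can be recomputed. The sketch below assumes the percentage is Saved divided by logical space; the function name is hypothetical, and not every row of the table reproduces exactly under this formula, so treat it as an approximation of what the controller reports.

```python
def percent_saved(used_gb, saved_gb):
    """Share of the logical space (used + saved) removed by compression."""
    logical_gb = used_gb + saved_gb
    return 100.0 * saved_gb / logical_gb


# (Used GB, Saved GB) for three rows of the table above.
rows = {
    "CASTORNS": (3629, 3448),
    "CMSSTG": (6510, 6395),
    "LEMONRAC": (47104, 49152),
}
for db, (used, saved) in rows.items():
    print(db, round(percent_saved(used, saved)))
# prints: CASTORNS 49, CMSSTG 50, LEMONRAC 51 (one per line)
```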

(29)

NAS controllers throughput

[Chart: net_data_recv, disk_data_written and compression ratio]

(31)

Deduplication

• When combined with compression, it doesn’t provide good results
  – Due to the way compression works: the compression group is 32k, our Oracle block is 8k, the WAFL block is 4k
  – [Diagram labels: Checksum 4k, 4k]
• Control files are a different story. Block size of 16k

DB    Type         Location       Size (GB)
PAYP  archives     /backup/dbs01  0.91
PAYP  archives     /backup/dbs02  22.90
PAYP  controlfile  /backup/dbs01  456.92
PAYP  fullinc      /backup/dbs01  68.00
PAYP  fullinc      /backup/dbs02  81.10
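The interaction between 32k compression groups and 4k checksums can be made concrete with a small experiment. This is a sketch under stated assumptions: zlib stands in for the controller's compression algorithm, SHA-256 stands in for the 4k block checksum, and the "Oracle blocks" are synthetic compressible data. Two backups that differ in a single 8k block share most 4k chunks before compression, but once each 32k group is compressed as a unit the change affects the whole compressed group.

```python
import hashlib
import random
import zlib

ORACLE_BLOCK = 8 * 1024        # Oracle block size from the slide
WAFL_BLOCK = 4 * 1024          # unit fingerprinted by deduplication
COMPRESSION_GROUP = 32 * 1024  # unit compressed together


def oracle_block(seed):
    """Deterministic, compressible 8k block (synthetic test data)."""
    rng = random.Random(seed)
    return bytes(rng.choice(b"abcd") for _ in range(ORACLE_BLOCK))


def fingerprints(data):
    """Set of SHA-256 digests, one per 4k chunk."""
    return {hashlib.sha256(data[i:i + WAFL_BLOCK]).digest()
            for i in range(0, len(data), WAFL_BLOCK)}


def compress_groups(data):
    """Compress every 32k group independently (zlib as a stand-in)."""
    return b"".join(zlib.compress(data[i:i + COMPRESSION_GROUP])
                    for i in range(0, len(data), COMPRESSION_GROUP))


# Two 32k backups that differ only in their first 8k Oracle block.
backup_a = b"".join(oracle_block(seed) for seed in (0, 1, 2, 3))
backup_b = b"".join(oracle_block(seed) for seed in (99, 1, 2, 3))

shared_raw = len(fingerprints(backup_a) & fingerprints(backup_b))
shared_compressed = len(fingerprints(compress_groups(backup_a))
                        & fingerprints(compress_groups(backup_b)))
print("shared 4k chunks, uncompressed:", shared_raw)
print("shared 4k chunks, compressed:  ", shared_compressed)
```

Uncompressed, the two backups share six of their eight 4k chunks; after compression far fewer chunks can match, which is consistent with the poor deduplication results reported above.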

(33)

Summary

• Backup and recovery testing is critical
• Tape copies are essential, but TSM became a critical point of failure for DB services
• Adding a disk buffer
  – Removes TSM criticality
  – Reduces DB volume in TSM
  – Speeds up backups and restores
    • Better response time
    • Better resource utilization
• Disk buffer plug-ins were easily integrated in our backup framework
• First system to exploit Ontap C-mode features
  – Valuable experience for the future

(34)

Questions
