CERN IT Department CH-1211 Geneva 23 Switzerland
www.cern.ch/it
Backup and restore of Oracle databases: introducing a disk layer
by Ruben Gaspar, IT-DB-DBB
Agenda
• CERN Oracle databases & Oracle backup basics
• Backup to disk implementation details
• Recovery platform
• Some bits of backup to disk backend
• Summary
Target Oracle databases for backup to disk
• ~70 Oracle databases, most of them running Oracle Clusterware (RAC)
– 49 are backed up to disk and then to tape
– 21 are backed up with snapshots only: test and development instances
• 15 Data Guard RAC clusters in production
– Active Data Guard since the upgrade to 11g
– They are backed up to tape only
• 10 Oracle single-instance databases in DBaaS, also backed up using snapshots
[Figure: Data Guard redo transport]
Oracle backup basics
• The Oracle clock: System Change Number (SCN)
– It would take 544 years to run out of SCNs at 16K/s
– smon_scn_time tracks time versus SCN
• Types of backup
– Consistent: taken after the database has been cleanly shut down. All redo has been applied to the data files. Archive logs are not produced.
– Inconsistent: taken while the database is running. The database must be in archivelog mode, which means archive logs are produced. Point-in-Time Recoveries (PITR) are possible. Drawback: clean-up of archive logs is critical to prevent the database from blocking → TSM was playing a critical role here
• Backup sets: Oracle proprietary format for backups. Binary files.
– Backup sets are containers for one or several backup pieces
– Backup pieces contain blocks of one or several data files (multiplexing)
• RMAN channels: disk, tape or proxy; they read data files and write to the backup media. We use SBT, the System Backup to Tape API, through IBM Tivoli Data Protection 6.3 (provided by TSM support)
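The 544-year figure quoted for the SCN can be checked with a little arithmetic. The sketch below is only an illustration: it assumes a 48-bit SCN counter (the slide does not state the width) and reads "16K/s" as 16 × 1024 SCNs per second.

```python
# Assumptions (not stated on the slide): the SCN is a 48-bit counter and
# "16K/s" means 16 * 1024 SCNs consumed per second.
SCN_BITS = 48
SCN_RATE_PER_SECOND = 16 * 1024
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_scn_exhaustion(bits=SCN_BITS, rate=SCN_RATE_PER_SECOND):
    """Years needed to consume every SCN value at a constant rate."""
    return (2 ** bits) / rate / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{years_until_scn_exhaustion():.0f} years")
```

Under these assumptions the result lands on the 544 years given above.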
Oracle backup basics (II)
• Backup jobs are based on templates, using the Recovery Manager (RMAN) API
-- Full
backup incremental level 0 database;
-- Cumulative
backup incremental level 2 cumulative database;
-- Incremental
backup incremental level 1 database;
-- Archivelogs
backup tag 'BR_TAG' archivelog all delete all input;
• Retention policy from 60 to 90 days, depending on the database
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 90 DAYS;
e.g. LEMONRAC → [1 x full + 6 x differential + archivelogs] * 13 weeks
• Controlfile backup, taken automatically with each backup
CONFIGURE CONTROLFILE AUTOBACKUP ON;
e.g. LHCBSTG → [2 x full + 5 x differential + 24x4 archivelogs] * 13 weeks = 934GB
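To see how a 90-day recovery window turns into the "13 weeks" multiplier, the following sketch counts the backups that must be kept for a weekly cycle. It is a rough illustration only: the function name is hypothetical, and reading "24x4" as one archivelog backup every 15 minutes (96 per day) is an assumption, not something the slide states.

```python
import math


def backups_in_window(window_days, fulls_per_week, diffs_per_week,
                      archivelog_runs_per_day):
    """Count the backups that fall inside a recovery window (weekly cycle)."""
    # A 90-day window needs 13 complete weekly cycles (13 * 7 = 91 days).
    weeks = math.ceil(window_days / 7)
    return {
        "weeks": weeks,
        "full": fulls_per_week * weeks,
        "differential": diffs_per_week * weeks,
        "archivelog_runs": archivelog_runs_per_day * 7 * weeks,
    }


if __name__ == "__main__":
    # LHCBSTG-like strategy; 24 * 4 archivelog runs per day is an assumption.
    print(backups_in_window(90, fulls_per_week=2, diffs_per_week=5,
                            archivelog_runs_per_day=24 * 4))
```

The sketch counts backups only; the 934GB total depends on the size of each backup, which is not given here.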
BR evolution: Backup to disk- 6
[Figure: PITR from full, cumulative and incremental backups]
DB         Fulls (GB)   Inc (GB)   Archived logs (GB)   Total (GB)
LEMONRAC   87902.42     857.52     13319.39             102079.32
What is there to be backed up?
Backup jobs using the RMAN API take care of:
• Database files: user and system files
• Control files: contain the structure and status of the data files. They also hold the whole backup history
• Archived logs: backups of redo logs. Needed for inconsistent backup strategies. They must be backed up and removed from the active file system; otherwise, if it runs out of space, the database freezes/stops.
5.1TB of redo logs produced per day
Backup architecture
• Custom solution: about 15k lines of code, Perl + Bash
• Flexible: easy to adapt to a new Oracle release or backup media
• Based on Oracle Recovery Manager (RMAN) templates
• Central logging
• Easy to extend via Perl plug-ins: snapshots, exports, RO tablespaces, …
We send compressed:
• 1 out of 4 full backups
• All archivelogs
Impact on TSM
• Savings depend on the database workload, e.g. backup sets on disk for three databases:
DB         Full (GB)   Inc (GB)   Archived logs (GB)   Savings
EDHP       29197.76    1216.697   2169.766             70%
CASTORNS   4944.839    213.256    336.2889             71%
ATLASSTG   1484.146    724.9567   3063.658             45%
Sent to tape: Full x 1/4 + Archived logs
Source: TSM support
• In addition, backup sets are compressed (see later)
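One way to read the tape annotation on this table is that only a quarter of the fulls plus all the archived logs reach tape, while incrementals stay on disk. The sketch below applies that reading to the three databases. It is an assumption about how the savings could be derived, not the method used by TSM support: it gives roughly 71% for EDHP and CASTORNS, close to the quoted values, but about 35% for ATLASSTG rather than the quoted 45%, so the published figures evidently account for something more (compression, for instance).

```python
# Sizes in GB, copied from the table: (full, incremental, archived logs).
DATABASES = {
    "EDHP": (29197.76, 1216.697, 2169.766),
    "CASTORNS": (4944.839, 213.256, 336.2889),
    "ATLASSTG": (1484.146, 724.9567, 3063.658),
}


def tape_savings(full_gb, inc_gb, arch_gb, full_fraction=0.25):
    """Percentage of on-disk backup volume that is NOT sent to tape.

    Assumed model: a fraction of the fulls plus all archived logs go to tape.
    """
    on_disk = full_gb + inc_gb + arch_gb
    sent_to_tape = full_gb * full_fraction + arch_gb
    return 100 * (1 - sent_to_tape / on_disk)


if __name__ == "__main__":
    for name, sizes in DATABASES.items():
        print(f"{name}: {tape_savings(*sizes):.1f}% saved")
```

The pattern matches the slide's point: the more of a database's backup volume is archived logs, the smaller the saving.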
Impact on TSM (II)
[Figure: data sent to TSM for 15 accounts (alicestg, atlasstg, cmsstg, castorns, ..), showing ~47% and ~70% savings]
Source: TSM support
Workflow for disk/tape backups
• Same workflow as for tape backups → eases maintenance
• Disk and tape templates are almost identical; only the channel allocation differs
• Disk channel allocation is calculated on the fly from the available space in the aggregate and file system, using the NetApp management API (ZAPI)
• About 75 templates to cover all types of backup strategy
• Tape and disk backup strategies co-exist
• Reversible: changing from one to the other is a matter of changing templates
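The on-the-fly disk channel allocation can be pictured with a minimal sketch. It is not the production Perl code: the function name and the free-space dictionary are hypothetical stand-ins for what a ZAPI query would return, and putting the file system with the most free space first follows the channel-order policy described elsewhere in this deck.

```python
def allocate_disk_channels(free_space_gb, db_name, max_channels=None):
    """Build RMAN ALLOCATE CHANNEL lines, emptiest file system first.

    free_space_gb: hypothetical mapping of mount point -> free space in GB
    (in the real platform this information comes from the NetApp ZAPI).
    """
    ordered = sorted(free_space_gb.items(), key=lambda item: item[1],
                     reverse=True)
    if max_channels is not None:
        ordered = ordered[:max_channels]
    return [
        f"allocate channel d{index} device type disk "
        f"format '{mount_point}/{db_name}/%U';"
        for index, (mount_point, _free) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    # Illustrative numbers only.
    free = {"/backup/dbs01": 1200.0, "/backup/dbs02": 3400.0}
    for line in allocate_disk_channels(free, "CASTORNS"):
        print(line)
```

Because only these generated lines differ between a disk and a tape template, switching media stays a template change.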
Typical DB architecture
[Figure: typical DB architecture. Labels: RAC nodes 01 to 04; public interface; cluster interconnect; mgmt network; private network; LAN; 10GbE, 1 GbE and 6Gb/s links; backup01, backup02; Media Manager Server IBM TSM; 7-mode and C-mode storage; datafiles, archivelogs]
At least 2 file systems for backup to disk:
• /backup/dbsXX/DBNAME
New C-mode features
• Transparent file system movements:
cluster01::> volume move start -destination-aggregate aggr1_c01n02 -vserver vs1 -volume castorns03 -cutover-window 10
• DNS load balancing inside the cluster
• Automatic virtual IP rebalancing (based on failover groups)
• Access security via “export-policy”: combines firewall rules and different authentication mechanisms: sys, krb5, ntlm
• Global namespace
• Compression and deduplication
– We rely heavily on compression to satisfy 2.3PB of backup set storage needs using 1.1PB of disk
Backup to disk configuration on database servers
• RMAN configuration parameters: minimal change
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/backup/dbs01/<DBNAME>/<DBNAME>_%F';
• Global namespace in use: /backup/dbsXX
• Eases management: the mount point stays unchanged as data moves. It’s a NetApp C-mode feature (see later)
7-mode: mount -o … priv-controllerIP:/vol/castorns03 /ORA/dbs03/CASTOR
C-mode: mount -o … public-ip-cluster:/backup/dbs01/CASTORNS /backup/dbs01/CASTORNS
/backup/dbs01/<DBNAME> → autobackup controlfile + backup sets
/backup/dbsXX/<DBNAME> → backup sets
Particular cases
• Solution also operational in a Data Guard configuration: full and incremental backups are taken on the standby (more on this when discussing restores)
• Multiple channels: rman_channels_connect, in order to distribute the backup load
e.g. username/password@rac-node1, username/password@rac-node2
• Plug-in for RO tablespaces backup (ACCLOG: size about 170TB, growth 70TB/year)
– Automatic clean-up in case of tablespace state change
– One backup set per tablespace
• Extension to allow special mount points (ACCLOG)
– rman_mounts_readonly
[Figure: primary database and Active Data Guard (for users’ access and for disaster recovery) linked by redo transport. Labels: full + incremental + controlfile; archivelogs + controlfile]
Backup to disk performance
ACCLOG full backup, 5TB:
       Duration   Throughput
Tape   34 hours   ~35 MB/s
Disk   14 hours   ~100 MB/s
• Backups run about 50% faster than on tape
• Sending backup sets from disk to tape needs optimisation
• Work in progress with TSM support
Backup to Disk space consumption
• Channel order is important → storage management
– Space distribution should follow planning to avoid imbalance. File systems should grow at the same pace.
– The emptiest volume is always placed first
Automatic size extension
Recovery platform
• The only reliable proof: run a recovery
• Any change introduced in the backup platform or backup strategy is always validated via test recoveries
• Isolation
– Runs independently of the production database
– Can't access any other system (database network links)
– No user jobs must run
• Flexible and easy to customize
• Maximize use of the recovery server: several recoveries at the same time
– Exports taken after a successful recovery → help in support cases, mainly logical errors
• Open source: http://sourceforge.net/projects/recoveryplat/
Recovery platform (II)
• Introducing a disk buffer greatly improves our recovery testing
• Also tested with Data Guard configurations:
– Data Guard: Oracle support ID 1070039.1
RMAN> set backup files for device type disk to accessible
• Restores from disk are usually 50% faster
• More recoveries can be run, nowadays about 40 recoveries per week
• No blocking of tape resources that could be used by backups
Backup to disk cluster
• 2 x FAS6240 NetApp controllers
• 24 x DS4243 disk shelves
• 24 x 3TB SATA disks each (576 disks)
• raid_dp (RAID 6) → 1.1 PB usable space split into 8 aggregates of ~135TB each
• 2 x quad-core 64-bit Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
• 10 Gbps connectivity
• Multipath SAS loops, 3 Gbps
• Flash Cache, 512GB per node
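The capacity figures above can be cross-checked with simple arithmetic. The sketch below uses decimal terabytes and the approximate aggregate size quoted on the slide, so the results are rough; the "usable fraction" it derives is only an indication of RAID, spare and file system overhead and is not a number from the presentation.

```python
SHELVES = 24
DISKS_PER_SHELF = 24
DISK_SIZE_TB = 3
AGGREGATES = 8
AGGREGATE_SIZE_TB = 135  # approximate, as quoted on the slide


def cluster_capacity():
    """Return disk count, raw capacity and usable capacity in TB."""
    disks = SHELVES * DISKS_PER_SHELF
    raw_tb = disks * DISK_SIZE_TB
    usable_tb = AGGREGATES * AGGREGATE_SIZE_TB
    return disks, raw_tb, usable_tb


if __name__ == "__main__":
    disks, raw_tb, usable_tb = cluster_capacity()
    print(f"{disks} disks, {raw_tb} TB raw, {usable_tb} TB usable "
          f"({100 * usable_tb / raw_tb:.1f}% of raw)")
```

The 8 aggregates add up to about 1080 TB, consistent with the ~1.1 PB of usable space.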
How fast, How compressed
• Compression (datafiles)
– Online compression of datafiles: ~55% saved by compression
• Backup set compression of a 501 GB tablespace of random alphanumeric strings (dbms_random):
Compression                        Size    Time     Saved
no-compressed                      501GB
basic                              83GB    6h21’    83%
low                                116GB   49’      76.8%
medium                             88GB    7h23’    82.4%
high                               82GB    11h02’   83.6%
No-compressed-fs                   459GB   41’      8.3%
Cron-compression Netapp 8.1.1      188GB            62%
Inline-compression Netapp 8.1.1    188GB   46’      62%
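The "percentage saved" values follow from the sizes relative to the 501 GB uncompressed baseline. A minimal sketch of that calculation, using only the sizes quoted above (the function and variable names are illustrative):

```python
UNCOMPRESSED_GB = 501

# Resulting sizes in GB, as quoted for each compression setting.
RESULT_SIZES_GB = {
    "basic": 83,
    "low": 116,
    "medium": 88,
    "high": 82,
    "No-compressed-fs": 459,
    "Cron-compression": 188,
    "Inline-compression": 188,
}


def percent_saved(size_gb, baseline_gb=UNCOMPRESSED_GB):
    """Space saved relative to the uncompressed baseline, in percent."""
    return 100 * (1 - size_gb / baseline_gb)


if __name__ == "__main__":
    for name, size_gb in RESULT_SIZES_GB.items():
        print(f"{name}: {percent_saved(size_gb):.1f}% saved")
```

The computed values agree with the quoted percentages to within rounding. Set against the run times, "low" gives up a few points of compression for a much shorter backup, and storage-side compression saves about 62%.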
[Figure: RMAN backup to disk throughput (MB/s, 0 to 450) versus number of channels (1 to 3), for knfs, dnfs, and dnfs + Ontap compression]
Compression: real values
DB           Used (GB)*   Saved (GB)   % saved by compression
AISDB_PROD   24719        25941        52
CASTORNS     3629         3448         49
CMSSTG       6510         6395         50
CSR          20636        32008        61
ITCORE       16387        23552        60
EDHP         9631         24913        66
LEMONRAC     47104        49152        51
* Space used on controller side
Logical space used: Used + Saved
NAS controllers throughput
[Figure: NAS controllers throughput: net_data_recv, disk_data_written, compression ratio]
Deduplication
• When combined with compression, it doesn’t give good results
– Due to the way compression works: the compression group is 32k, our Oracle block is 8k, the WAFL block is 4k, and checksums are computed per 4k block
• Control files are a different story: block size of 16k
DB     Type          Location        Size (GB)
PAYP   archives      /backup/dbs01   0.91
PAYP   archives      /backup/dbs02   22.90
PAYP   controlfile   /backup/dbs01   456.92
PAYP   fullinc       /backup/dbs01   68.00
PAYP   fullinc       /backup/dbs02   81.10
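The alignment argument for why deduplication and compression combine poorly (32k compression groups, 8k Oracle blocks, 4k checksummed blocks) can be illustrated with a toy model. This is a sketch under loud assumptions and not a description of the NetApp implementation: zlib stands in for the storage compression, SHA-256 for the block checksum, and each compressed group is padded to a 4k boundary. Two "backups" hold the same Oracle blocks, the second shifted by one extra 8k block.

```python
import hashlib
import random
import zlib

WAFL_BLOCK = 4 * 1024          # deduplication (checksum) granularity
ORACLE_BLOCK = 8 * 1024        # database block size
COMPRESSION_GROUP = 32 * 1024  # data compressed together as one unit


def oracle_blocks(count, seed=0):
    """Generate compressible pseudo-random 8k blocks (toy data)."""
    rng = random.Random(seed)
    return [bytes(rng.choices(b"abcdefgh", k=ORACLE_BLOCK))
            for _ in range(count)]


def fingerprints(data):
    """Checksum every 4k block of the data."""
    return [hashlib.sha256(data[i:i + WAFL_BLOCK]).digest()
            for i in range(0, len(data), WAFL_BLOCK)]


def shared_fraction(first, second):
    """Fraction of 4k blocks in `second` already present in `first`."""
    known = set(fingerprints(first))
    candidates = fingerprints(second)
    return sum(fp in known for fp in candidates) / len(candidates)


def compress_in_groups(data):
    """Compress each 32k group separately and pad it to a 4k boundary."""
    out = bytearray()
    for i in range(0, len(data), COMPRESSION_GROUP):
        packed = zlib.compress(data[i:i + COMPRESSION_GROUP])
        out += packed + bytes(-len(packed) % WAFL_BLOCK)
    return bytes(out)


blocks = oracle_blocks(64)
backup_a = b"".join(blocks)
# Same blocks, shifted by one extra 8k block at the front.
backup_b = b"".join(oracle_blocks(1, seed=1) + blocks)

raw_shared = shared_fraction(backup_a, backup_b)
compressed_shared = shared_fraction(compress_in_groups(backup_a),
                                    compress_in_groups(backup_b))

if __name__ == "__main__":
    print(f"duplicate 4k blocks, uncompressed: {raw_shared:.0%}")
    print(f"duplicate 4k blocks, compressed:   {compressed_shared:.0%}")
```

In this model nearly every uncompressed 4k block of the second backup is a duplicate, whereas after compression the 8k shift changes the content of every 32k group, so almost no checksums match.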