SEIZE THE DATA. 2015

BIG DATA CONFERENCE 2015

Boston August 10-13

Module Overview

• Backup and Restore

• Copy Vertica Database

• Online Recovery

Backup - Overview

Backup is the process of copying the actual data files to a specified location

• Vertica data and backup files are written once

− Once a file is written, Vertica will not update it

• Number of files increases with each backup

• Tuple Mover keeps the number of files under control

− The TM ‘mergeout’ process consolidates smaller ROS containers into larger ones

• To back up, copy Vertica files to stable storage

− Can be direct-attached storage, NFS mounts or SAN

Backup – When?

• Part of Regular Disaster Recovery Strategy

− Nightly, weekly, depending on business continuity requirements and resources

• After loading or altering a large volume of data

• Before Maintenance Tasks

− Upgrading to another version of Vertica

− Dropping a Partition

Backup and Restore – Options

There are several ways to take a Vertica Backup

• Backup and Restore by Database

− Most common backup process

− Backs up the entire database which includes all the schemas and objects within them

• Backup and Restore by Schema

− Multi-tenant database with different backup frequency

− Multi-application cluster with different backup requirements / policies

• Backup and Restore by Table

− Can be used to back up some critical tables

− Restore certain tables for QA / Testing

Vertica Backup Restore – VBR

vbr.py is a Python script located under /opt/vertica/bin

• Use vbr.py with various options to take backups and restore data

• Create a configuration file

− vbr.py --setupconfig

− Goes into interactive mode, gathers all parameters and creates the configuration file

• VBR parameters

− Database name, schema name, snapshot name, object names

Vertica Backup Restore – VBR

A few parameters explained

• Snapshot Name – stores all the files under that named directory

• Restore Points – number of incremental backups stored in addition to the full backup

• Node

− Names of nodes in the cluster

− Data is backed up from each node of the cluster

• Backup Directory

− Location where the backup files are stored
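
The parameters above end up in an INI-style configuration file. The sketch below writes such a file with Python's configparser, purely as an illustration. The section and key names (Misc, snapshotName, restorePointLimit, Database, dbName, Mapping), the database name exampledb, the host names and the paths are all assumptions and may not match your Vertica version. The file produced by vbr.py --setupconfig is the authoritative one.

```python
import configparser
import io


def build_vbr_config(snapshot_name, restore_points, db_name, node_to_backup_dir):
    """Return INI text for a hypothetical vbr.py configuration file.

    Every section and key name here is an illustrative assumption,
    not a confirmed vbr.py setting.
    """
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key case; configparser lowercases by default
    config["Misc"] = {
        "snapshotName": snapshot_name,
        "restorePointLimit": str(restore_points),
    }
    config["Database"] = {"dbName": db_name}
    # One entry per node: where that node's files are backed up to.
    config["Mapping"] = dict(node_to_backup_dir)
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


example_config = build_vbr_config(
    snapshot_name="nightly",
    restore_points=3,
    db_name="exampledb",
    node_to_backup_dir={
        "v_exampledb_node0001": "backuphost01:/backup/vertica",
        "v_exampledb_node0002": "backuphost02:/backup/vertica",
        "v_exampledb_node0003": "backuphost03:/backup/vertica",
    },
)
print(example_config)
```

Generating the file from code is mainly useful for keeping several configurations (database, schema, table level) consistent; for a single backup the interactive mode is simpler.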

VBR preparation

Steps and some prerequisites

• Backup location must be configured on all nodes

• Verify database is running

• Ensure backup hosts are running if data is backed up to those hosts

− Backup can be done to the same cluster nodes

− Backup can also be done to a dedicated host which has the SAN storage

• Backup Directory Permissions / Contents
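
A minimal pre-flight sketch for the directory permission item above, assuming the backup directory is local to the host where the check runs and that the check is executed as the account that will own the backups. The function name check_backup_directory is hypothetical, and the sketch does not cover what vbr.py expects of the directory contents.

```python
import os
import tempfile


def check_backup_directory(path):
    """Return a list of problems found with a local backup directory."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    # Writing files into a directory needs both write and execute permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    return problems


# Demonstration against a throwaway directory rather than a real backup location.
with tempfile.TemporaryDirectory() as backup_dir:
    problems_existing = check_backup_directory(backup_dir)
    problems_missing = check_backup_directory(os.path.join(backup_dir, "missing"))

print(problems_existing)
print(problems_missing)
```

Running the same check on every node (or on the dedicated backup host) before the first backup catches path and permission mistakes early.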

Performing a Backup

How to run the vbr.py script

• vbr.py --task backup --config-file <myconfigfile>

− Same command is used for full and incremental backups

• First run does a full backup

− All data files are copied to the sub-directory with the snapshot name

• Subsequent runs are incremental

− Copies files which have changed since the last backup

− Files are only added or deleted, never modified

− Each incremental backup goes into a separate sub-directory with a timestamp

− Each incremental backup also adds those files to the full backup
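
Because files are only added or deleted and never modified, deciding what an incremental run has to transfer reduces to comparing two sets of file names. The sketch below is a simplified model of that idea, not how vbr.py is implemented; the function name and the file names (ros_001 and so on) are hypothetical.

```python
def plan_incremental_backup(files_in_last_backup, files_on_node):
    """Compare two sets of file names and decide what an incremental run must do.

    In this simplified model a name comparison is enough, since a file
    that exists in both places cannot have changed in between.
    """
    previous = set(files_in_last_backup)
    current = set(files_on_node)
    to_copy = sorted(current - previous)    # created since the last backup
    to_drop = sorted(previous - current)    # no longer on the node, e.g. after mergeout
    unchanged = sorted(current & previous)  # already backed up, nothing to transfer
    return to_copy, to_drop, unchanged


to_copy, to_drop, unchanged = plan_incremental_backup(
    files_in_last_backup={"ros_001", "ros_002", "ros_003"},
    files_on_node={"ros_003", "ros_004"},
)
print(to_copy)    # ['ros_004']
print(to_drop)    # ['ros_001', 'ros_002']
print(unchanged)  # ['ros_003']
```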

Performing a Restore

The same vbr.py script is used for restore

• vbr.py --task restore --config-file <myconfigfile>

− The configuration file is the same one used for the backup

• Restore can be specific

− Entire database, specific schema or table depending on the configuration file used

− Vertica copies the files from backup location to the data directory location

• Some key features

− Vertica does not have the concept of transaction logging

− There is no roll forward or roll back of transactions
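
Backup and restore differ only in the value passed to --task, so a scheduled job can build both command lines from one helper. The sketch below only constructs and prints the commands; it does not run them. The script path comes from the VBR slide, while the helper name vbr_command and the configuration file path are hypothetical.

```python
import shlex

VBR_PATH = "/opt/vertica/bin/vbr.py"  # location given on the VBR slide


def vbr_command(task, config_file):
    """Build the argument list for a vbr.py run without executing it."""
    if task not in ("backup", "restore"):
        raise ValueError(f"unsupported task: {task}")
    return [VBR_PATH, "--task", task, "--config-file", config_file]


# Hypothetical configuration file path, shared by both tasks.
backup_cmd = vbr_command("backup", "/home/dbadmin/nightly.ini")
restore_cmd = vbr_command("restore", "/home/dbadmin/nightly.ini")
print(shlex.join(backup_cmd))
print(shlex.join(restore_cmd))
```

To execute a command, the list can be passed to subprocess.run(cmd, check=True) from the database administrator account. Keeping construction separate from execution makes the helper easy to test without a cluster.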

Copy Vertica Database

This option of VBR copies the entire Database (cluster) to a target cluster

• When do we need copycluster?

− Maintain a warm-standby cluster for Disaster Recovery

− Provide an alternative cluster to a different set of users / applications

• Prerequisites

− Source and Target cluster must have same number of nodes

− Database, node names and dbadmin user have to be the same on both sides

− Password-less ssh has to be established between all the nodes on both sides

− Target database has to be shut down before starting the process

• vbr.py --task copycluster --config-file <cfgfile>
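
The copycluster prerequisites lend themselves to a checklist that runs before the copy is started. The sketch below compares two hand-written cluster descriptions; the ClusterInfo class, its fields and the example values are hypothetical, and password-less ssh is not checked because that needs access to the hosts themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterInfo:
    """Minimal, hypothetical description of a cluster for comparison."""
    database: str
    dbadmin_user: str
    node_names: tuple
    database_running: bool


def copycluster_problems(source, target):
    """Return the prerequisites from the slide that the pair of clusters violates."""
    problems = []
    if len(source.node_names) != len(target.node_names):
        problems.append("source and target have a different number of nodes")
    if sorted(source.node_names) != sorted(target.node_names):
        problems.append("node names differ")
    if source.database != target.database:
        problems.append("database names differ")
    if source.dbadmin_user != target.dbadmin_user:
        problems.append("dbadmin users differ")
    if target.database_running:
        problems.append("target database must be shut down first")
    return problems


nodes = ("v_exampledb_node0001", "v_exampledb_node0002", "v_exampledb_node0003")
source = ClusterInfo("exampledb", "dbadmin", nodes, database_running=True)
target = ClusterInfo("exampledb", "dbadmin", nodes, database_running=True)
problems = copycluster_problems(source, target)
print(problems)  # ['target database must be shut down first']
```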

Node Recovery

Vertica is a highly available MPP architecture, but nodes may go down…

• Node can recover from failure

− A node can rebuild its data set from other nodes in the cluster if the cluster is K-safe

− In a full recovery the node rebuilds from scratch

• Incremental Recovery

− Node rebuilds from the current persisted state

− To speed up a full recovery, use a prior backup for the given node and perform incremental recovery

• RAID 10 is best practice

Monitor Recovery

• Monitor disk space

• Monitor Recovery
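
A recovering node rebuilds its data set, so free space on its data volume is worth watching while recovery runs. The sketch below is a generic operating-system level check using Python's shutil.disk_usage; it does not query Vertica. The function names and the 0.4 threshold are placeholders chosen for illustration, not Vertica recommendations.

```python
import shutil
import tempfile


def free_fraction(path):
    """Return the fraction of the volume holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def has_headroom(path, min_free_fraction=0.4):
    """True if at least `min_free_fraction` of the volume is free.

    The default threshold is an arbitrary placeholder.
    """
    return free_fraction(path) >= min_free_fraction


# Demonstration on the temp directory; point this at the node's data directory instead.
checked_path = tempfile.gettempdir()
fraction = free_fraction(checked_path)
print(f"{checked_path}: {fraction:.0%} free, headroom ok: {has_headroom(checked_path)}")
```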

QUESTIONS?

Please attend our Q&A with HP Big Data experts today

Marina Ballroom, Lobby level

10:15 am • 10:30 am

12:00 pm • 1:00 pm

2:30 pm • 3:00 pm

4:30 pm • 5:00 pm