Quorum Node IBM GPFS setup - Mixed eX5/X6 DR Clusters

9.2 Mixed eX5/X6 DR Clusters

10.1.6 Quorum Node IBM GPFS setup

Quorum Node IBM GPFS installation on page 108.

General information on how to install and set up GPFS can be found online in the Information Center, section Installing GPFS on Linux nodes.

8.4.7 Verify Installation

8.4.7.1 GPFS Cluster configuration

• Verify that all nodes are up and running

# mmgetstate -a

• Verify distribution of the configuration servers

The primary and secondary GPFS configuration servers must be located on different sites, one on each. Otherwise, fail-over to the standby site will not work.

This is checked with

# mmlscluster

• Verify distribution of quorum nodes

The current active quorum setup can be checked with

# mmgetstate -aLs

The cluster configuration is listed with

# mmlscluster

When using the tiebreaker node, check that the tiebreaker node is a quorum node and that the remaining quorum nodes are distributed evenly among the other file system failure groups. You can list the failure groups with

# mmlsdisk sapmntdata

Information about the failure group setting can be found in section 8.4.2: GPFS Server configuration on page 76. If you are not using the tiebreaker node, make sure that the active site has at least one more quorum node than the passive site. In general, try to keep an odd number of quorum nodes.
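The quorum rules for a setup without a tiebreaker node reduce to simple arithmetic. The following is a minimal sketch only: the function name check_quorum_split is hypothetical (it is not a GPFS command), and the two arguments are the quorum node counts per site, read manually from the mmlscluster output.

```shell
# check_quorum_split ACTIVE PASSIVE
# Hypothetical helper (not part of GPFS): validates the quorum node counts
# of a DR cluster that does NOT use a tiebreaker node.
check_quorum_split() {
    active=$1
    passive=$2
    total=$((active + passive))
    rc=0
    # Rule from the text: the active site needs at least one more quorum node.
    if [ "$active" -le "$passive" ]; then
        echo "active site needs at least one more quorum node than passive site"
        rc=1
    fi
    # General recommendation from the text: keep the total odd.
    if [ $((total % 2)) -eq 0 ]; then
        echo "total number of quorum nodes ($total) is even"
        rc=1
    fi
    return $rc
}
```

For example, `check_quorum_split 3 2 && echo "quorum distribution OK"` passes, while a 2/2 split is reported as invalid.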

• Verify cluster manager location

Verify the location of the cluster manager, which depends on whether a tiebreaker node is used:

# mmlsmgr

If the solution uses a tiebreaker node, the cluster manager must be on the passive/backup site. In a solution without a tiebreaker node, the cluster manager must be on the active site. To change the cluster manager, issue

# mmchmgr -c <node>

• Verify replication factor 3 (= three copies, two local and one remote copy)

# mmlsfs sapmntdata

Verify that the following values are all set to 3:

-m Default number of metadata replicas
-M Maximum number of metadata replicas
-r Default number of data replicas
-R Maximum number of data replicas
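The check of the four replica settings can be scripted. The sketch below is an illustration, not an official tool: the function name replicas_all_three is hypothetical, and it assumes that each relevant line of the mmlsfs output starts with the flag followed by its value.

```shell
# replicas_all_three: reads mmlsfs output on stdin.
# Assumption: each attribute line looks like "<flag> <value> <description>".
replicas_all_three() {
    awk '
        $1 == "-m" || $1 == "-M" || $1 == "-r" || $1 == "-R" {
            seen++
            if ($2 != 3) { print "unexpected value for " $1 ": " $2; bad = 1 }
        }
        END {
            # All four settings must have been present in the input.
            if (seen != 4) { print "expected 4 replica settings, found " (seen + 0); bad = 1 }
            exit bad
        }
    '
}
```

A possible invocation is `mmlsfs sapmntdata -m -M -r -R | replicas_all_three && echo "replication factor 3 confirmed"`.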

• Test replication factor 3

Write a new file to the shared filesystem and verify replication level applied to this file:

# mmlsattr <path to file>

All values must be 3, and no flags (such as illbalanced or metaupdatemiss) may be shown.

Please check the GPFS documentation or ask IBM GPFS support if there are flags shown after restripe.

• Check failure groups

You should have four failure groups: 1,0,x; 2,0,x; 1,1,x; and 2,1,x. If you are using the tiebreaker node, a fifth failure group, 3,1,1, should be in the file system. Get the list of failure groups from the disk list:

# mmlsdisk sapmntdata

Make sure that the server nodes are distributed evenly among the failure groups.

• Disk availability

All GPFS disks must be online.

# mmlsdisk sapmntdata -e
All disks up and ready

If there are disks down or suspended, check the reason (e.g. hardware failure, system reboot) and restart them once the problem has been resolved.

The following command will try to start all disks in the file system. This has no effect on already started disks.

# mmchdisk sapmntdata start -a

If disks are suspended you can resume them all with the following command:

# mmchdisk sapmntdata resume -a

Note

Follow the instructions in Section 7: After Installation on page 66.

8.5 Extending a DR-Cluster

This section describes how to grow a DR cluster. Growing a DR-enabled cluster requires that both sites grow by the same number of nodes. In general, the two servers of an active/backup pair need not be installed at the same time, but doing so is highly recommended. A cautious technician may also decide to install the backup node before the active node.

The following sections will only explain the differences from the basic DR installation in the sections before.

8.6 Mixing eX5/X6 Server in a DR Cluster

Please read chapter 9.2: Mixed eX5/X6 DR Clusters on page 97. Information given there takes precedence over the instructions below.

8.6.1 Hardware Setup

Please refer to 8.3: Hardware Setup on page 71 and follow the instructions there. Ping the new machine on the GPFS network from all machines to test whether the network configuration is correct. Then ping the new machine on the HANA network from all servers; it is supposed to be reachable only from nodes on the same site.

8.6.2 GPFS Part 1

1. The first step is to add /etc/hosts entries on every machine. Let us assume that the new nodes are the 9th and 10th nodes, with node09 going to the active site and node10 going to the backup site. Distribute any new nodes evenly among the existing failure groups (topology), so that a failure group has at most one more node than any other. Put the backup server into the corresponding failure group on the backup site.

In the example above, the 9th node will go into failure group 1 (1,0,x) getting the topology vector 1,0,3 and the 10th node will go into failure group 3 (1,1,x) with topology vector 1,1,3.

On all existing nodes, add host entries for the GPFS network, e.g.:

192.168.1.109 gpfsnode09
192.168.1.110 gpfsnode10

On the new nodes add entries for all other nodes. Copying the entries from one of the existing nodes is the easiest way.
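Adding the host entries by hand on many machines is error-prone, so an idempotent helper may be useful. The following is a sketch under assumptions: the function name add_host_entry is hypothetical, and the example works on a scratch file rather than on /etc/hosts. The addresses and host names are the ones from the example above.

```shell
# add_host_entry FILE IP NAME
# Hypothetical helper: appends "IP NAME" to FILE unless NAME is already listed.
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # -w matches NAME as a whole word, so gpfsnode1 does not match gpfsnode10.
    if grep -qw "$name" "$file" 2>/dev/null; then
        echo "$name already present in $file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Try it against a scratch file first; use /etc/hosts only after review.
HOSTS=$(mktemp)
add_host_entry "$HOSTS" 192.168.1.109 gpfsnode09
add_host_entry "$HOSTS" 192.168.1.110 gpfsnode10
cat "$HOSTS"
```

Running the helper a second time with the same host name leaves the file unchanged, which makes it safe to repeat on every node.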

First, add host keys for the new nodes to the existing machines. Run on any existing node:

# for srcnode in gpfsnode0{1..8} ; do echo node $srcnode ; ssh $srcnode 'for target in gpfsnode{09,10} ; do echo -n $target ; ssh-keygen -R $target ; ssh-keyscan -t rsa $target >> /root/.ssh/known_hosts ; done' ; done

The expression gpfsnode0{1..8} expands to the list gpfsnode01 to gpfsnode08. If the host names differ or are not consecutive, replace it with a space-separated list of host names. The same applies to gpfsnode{09,10}, which denotes the new nodes in this example.
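Whether a zero-padded brace range expands as intended depends on the shell and its version, so it can help to preview the host list before running the loops. The sketch below builds the same lists without brace expansion; the function name node_list is hypothetical, and the default prefix gpfsnode is taken from the example.

```shell
# node_list FIRST LAST [PREFIX]
# Hypothetical helper: prints zero-padded host names, one per line.
# Pass plain decimal numbers (9, not 09).
node_list() {
    i=$1
    last=$2
    prefix=${3:-gpfsnode}
    while [ "$i" -le "$last" ]; do
        printf '%s%02d\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Preview the lists before feeding them into the ssh loops.
echo "existing:" $(node_list 1 8)
echo "new:     " $(node_list 9 10)
```

The output of `$(node_list 1 8)` can be used in place of the brace expression in the for loops.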

Then copy the root SSH key to the new nodes. Issue these commands on one of the existing cluster nodes:

# scp /root/.ssh/authorized_keys /root/.ssh/id_rsa /root/.ssh/id_rsa.pub root@gpfsnode09:/root/.ssh/
# scp /root/.ssh/authorized_keys /root/.ssh/id_rsa /root/.ssh/id_rsa.pub root@gpfsnode10:/root/.ssh/

On all new cluster nodes, run this command:

# for node in gpfsnode{01..10} ; do echo -n $node ; ssh-keygen -R $node ; ssh-keyscan -t rsa $node >> /root/.ssh/known_hosts ; done

Test the SSH key exchange by running this command on any node:

# for srcnode in gpfsnode{01..10} ; do echo from node $srcnode ; ssh $srcnode 'for target in gpfsnode{01..10} ; do echo To node $target ; ssh $target hostname ; done' ; done

The command should run without interaction and errors.
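To make a failed key exchange show up as an error instead of a password prompt, the test can be run with the OpenSSH option BatchMode=yes. The following is a sketch, not the procedure of this guide: the function name check_ssh_mesh and the variable SSH_CMD (which allows substituting the ssh binary for a dry run) are hypothetical.

```shell
# check_ssh_mesh "NODE1 NODE2 ..."
# Hypothetical helper: tries every source/target pair non-interactively and
# reports the pairs that fail. SSH_CMD defaults to ssh.
check_ssh_mesh() {
    nodes=$1
    failed=0
    for src in $nodes; do
        for dst in $nodes; do
            # BatchMode=yes makes ssh fail instead of asking for a password.
            if ! ${SSH_CMD:-ssh} -o BatchMode=yes "$src" \
                    ssh -o BatchMode=yes "$dst" hostname >/dev/null 2>&1; then
                echo "FAILED: $src -> $dst"
                failed=1
            fi
        done
    done
    return $failed
}
```

A call such as `check_ssh_mesh "gpfsnode01 gpfsnode09 gpfsnode10"` returns a non-zero exit status if any hop still requires interaction.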

2. Install GPFS (base package):

# cd /var/tmp/install/gpfs-<GPFS-RELEASE>
# rpm -ivh gpfs.base-<GPFS-RELEASE>-0.x86_64.rpm

3. Update to the latest GPFS Maintenance Release

Warning

It is highly recommended to upgrade to GPFS 3.5.0-17 or higher.

Install the following three packages for the latest (X) maintenance release:

# rpm -ivh gpfs.docs-<GPFS-RELEASE>-X.noarch.rpm
# rpm -ivh gpfs.gpl-<GPFS-RELEASE>-X.noarch.rpm
# rpm -ivh gpfs.msg.en_US-<GPFS-RELEASE>-X.noarch.rpm

4. Verify your GPFS installation:

# rpm -qa | grep gpfs

The packages installed above should be listed here.

5. Build the GPFS Portability Layer

Follow the instructions in /usr/lpp/mmfs/src/README:

# cd /usr/lpp/mmfs/src
# make Autoconfig
# make World
# make InstallImages

6. To add the new nodes to the cluster, run on any running node:

# mmaddnode -N gpfsnode09,gpfsnode10

7. Mark the servers as licensed:

# mmchlicense fpo --accept -N gpfsnode09,gpfsnode10

Please use the correct license for the nodes. Server and FPO are just examples.

8. Start the new nodes

# mmstartup -N gpfsnode09,gpfsnode10

9. Create the disk descriptor files. Before adding the disks to the shared file system, you must create the disk descriptor or stanza files. You can create them on any node in the cluster, but preferably on the node where the files for the initial cluster creation are located. Please see chapter 8.4.3: GPFS Disk configuration on page 77 for a description of the stanza files. You only need to create entries for the drives on the new nodes, and you can omit the pool configuration entries. Let us assume the new file is /var/mmfs/config/disk.list.data.gpfsnode0910.

10. Create NSDs

# mmcrnsd -F /var/mmfs/config/disk.list.data.gpfsnode0910

8.6.3 HANA Backup Node Installation

Skip this section for a node on the active site. The HANA installation on the backup site needs a temporary filesystem that must satisfy some requirements. RAM-based filesystems are not sufficient, so we use the freshly created NSDs for a temporary filesystem, install the backup instance, and destroy the temporary filesystem afterwards before continuing with the installation.

1. Create a temporary filesystem

/usr/lpp/mmfs/bin/mmcrfs sapmnttmp -F /var/mmfs/config/disk.list.data.gpfsnode0910 -A no -B 1M -N 3000000 -v no -m 1 -M 3 -r 1 -R 3 -j hcluster --write-affinity-depth 1 -s failureGroupRoundRobin --block-group-factor 1 -Q yes

Before continuing with the installation, make sure that the GPFS file system sapmntdata is not mounted at /sapmnt on the new nodes.

Mount this filesystem on all new backup nodes:

mmmount sapmnttmp /sapmnt -N <new backup nodes>

2. Install HANA on backup site

In order to prepare the backup site, it is necessary to do a standard HANA installation and then delete the installed content on the shared filesystem. A tool to automate this procedure is currently in development by SAP.

Install SAP HANA on the backup site as described in the official SAP documentation available here: http://help.sap.com/hana_appliance. The location of the SAP HANA installation files is /var/tmp/saphana. Do a single-node installation on each node. Make sure to use exactly the same SAP SID, SAP instance number, user names, user IDs, group names, group IDs, and paths as in the original DR-HANA installation. You can use the command id to query user and group information.

3. Stop HANA and SAP Host agent on backup site

$ HDB stop

Then log in as root and stop the SAP Host Agent and other services:

# /etc/init.d/sapinit stop

Afterwards, disable the autostart of the sapinit service:

# chkconfig sapinit off

Do the last two steps on all backup nodes.

4. Delete SAP HANA shared content

5. Disable mmfsup script on backup site nodes

An installation with the Recovery Image installs an mmfsup script which automatically starts SAP HANA after the file system comes up. This must be deactivated, as it may start SAP HANA on both sites (using the same hostnames). The script resides in /var/mmfs/etc. Remove it on all cluster nodes.

# rm /var/mmfs/etc/mmfsup

6. Delete temporary filesystem

After installing all new backup nodes, unmount the temporary filesystem on all nodes

mmumount sapmnttmp -a

and delete it

mmdelfs sapmnttmp

This will delete all shared HANA content and will leave the node specific HANA parts installed.

8.6.4 GPFS Part 2

1. Add disks to sapmntdata filesystem

# mmadddisk sapmntdata -F /var/mmfs/config/disk.list.data.gpfsnode0910

2. Verify NSD status

Verify that all NSDs are up and running

# mmlsdisk sapmntdata

3. Mount GPFS on active

On the new active nodes and only on these, mount the GPFS file system

# mmmount sapmntdata -N gpfsnode09,gpfsnode10

GPFS setup is now complete.

8.6.5 HANA

8.6.5.1 Install HANA on active site

1. Please make sure that you have mounted the shared file system on the new nodes.

# mmlsmount sapmntdata -L

2. If not already installed, install the SAP host agent

# cd /var/tmp/install/saphana/DATA_UNITS/SAP_HOST_AGENT_LINUX_X64
# rpm -ihv saphostagent.rpm

As recommended by the RPM installation, a password for sapadm may be set.

3. Deactivate automatic startup through sapinit at startup.

Running SAP’s startup script during system boot must be deactivated, as it will be executed by a GPFS startup script after cluster start. Execute:

# chkconfig sapinit off

4. Install SAP HANA worker and standby nodes as described in the guide "SAP HANA Administration Guide".

Warning

SAP HANA in this DR solution must be installed using the hostname of the HANA-internal network (usually on bond1, hostname hananodeXX). The host based routing used in the HA solution is not applicable for the DR solution.

8.7 Using Non Productive Instances on Inactive DR Site

IBM supports the installation of storage expansions in a DR scenario to allow clients to run a non-productive SAP HANA instance on idling DR-site nodes. During normal operation in a DR scenario, all nodes at one of the two sites only receive data from the active site and store it on their local disks.

SAP tolerates running a non-productive SAP HANA instance on those nodes. The local disks of the nodes are used for production data. A storage expansion is used to provide enough local storage for those non-productive instances.

In the event of a disaster, when the backup site becomes the active site, all non-productive SAP HANA instances have to be shut down to allow production to continue to run.

8.7.1 Architecture

This section briefly explains how IBM enables the use of idling DR-site nodes to run non-productive SAP HANA instances.

8.7.1.1 Prerequisites

The use of a storage expansion is only supported in a DR scenario. No expansions can be used when running in an HA environment unless they are part of the certified server models.

All nodes on the DR-site must have a storage expansion connected. Having only a subset of the DR-site nodes equipped with storage expansions is not a supported environment. Furthermore, all expansions must have identical disk drives installed.

If the customer considers both participating data centers to be equal (which means that after a fail-over of his production instances to the DR-site he will not manually fail production back to his site A data center), then you must have storage expansion connected also to all primary site nodes. This storage expansion will remain unused until you actually need to move data away from DR-site nodes which are now being used to host SAP HANA production instances.

8.7.1.2 Architectural overview

The following illustration shows what IBM’s solution for SAP HANA DR with storage expansions looks like:

The expansion storage is visible as local storage only and connected via the SAS interface. The storage is not shared by multiple nodes.

[Illustration: nodes with HDD and fio storage attached via RAID controllers; labels include "Site A", "meta data", "sda1", and "Second file system spanning only expansion box drives (metadata and data)".]

Figure 30: SAP HANA DR using storage expansion - architectural overview

Attention

The external storage can only be used to host data of non-productive SAP HANA instances.

The storage must not be used to expand space of the production file system or to store backups.

8.7.1.3 Architectural comments

IBM only supports running GPFS with a replication factor of 2 for the non-productive instance. This means that outages of a single node can be handled without data loss. We do not support a replication factor of 3 because the scope of non-productive SAP HANA environments does not include disaster recovery.

There will be exactly one new file system spanning all DR-site expansion box drives. While we do not support a multi SID configuration it is a valid scenario to run, e.g., on some DR-site nodes a QA environment and on other DR-site nodes development. This, however, has to be done on the same file system.

IBM does not enable quotas on the new expansion box file system. Make sure to have either a valid backup procedure in place or to regularly delete old backups.

8.7.2 Setup

This section assumes that the nodes have been successfully installed with an operating system already (as required for a backup DR site).

8.7.2.1 Hardware setup

Connect the EXP2524 SAS port labeled ’In’ to one of the M5120 or M5225 ports. For details, see the EXP2524 Installation Guide. Configure the drives as described in section 6: Guided Install of the Lenovo Solution on page 41. Either reboot or rescan the SCSI bus and verify that Linux recognizes the new drives.

8.7.2.2 GPFS configuration

You reuse the existing GPFS cluster and create a second file system spanning only the expansion drives of the DR-site nodes.

Even if your setup includes expansions on the primary site, execute the procedure only on the DR-site expansions. The primary site expansion drives will not be used in the beginning.

1. On each DR-site node, collect the device names of all expansion drives. When using the M5225 controller, you can get the drive names with this command:

# lsscsi |grep "M5225" |grep -o -E "/dev/sd[a-z]+"

If the M5120 controller is used, execute the following command instead:

# lsscsi |grep "M5120" |grep -o -E "/dev/sd[a-z]+"

You will end up with something like:

/dev/sde
/dev/sdf
/dev/sdg
/dev/sdh

for each DR-site node. Note: After sdz, Linux wraps around and continues with sdaa, sdab, ...
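The two controller-specific commands can be combined into one filter. This is a minimal sketch: the function name expansion_devices is hypothetical, and it assumes, as the commands above do, that the controller model string and the device path appear on the same lsscsi output line.

```shell
# expansion_devices: reads lsscsi output on stdin and prints the block
# devices attached to either controller model named in the text.
expansion_devices() {
    grep -E 'M5225|M5120' | grep -o -E '/dev/sd[a-z]+'
}

# The pattern sd[a-z]+ also covers the names used after the wrap-around:
printf '%s\n' /dev/sdz /dev/sdaa | grep -o -E '/dev/sd[a-z]+'
```

On a DR-site node, `lsscsi | expansion_devices > /tmp/expdrives.txt` would collect the list for the next step; the file name is only an example.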

2. Create additional NSDs

For all new expansion drives, create NSDs according to the following rules:

(a) all NSDs will be dataAndMetadata

(b) all NSDs go into the system pool

(d) One failure group for all drives within one expansion box
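Rules (a), (b) and (d) can be turned into stanza entries with a small generator. The following sketch rests on several assumptions: the function name nsd_stanza and the NSD naming scheme exp_<node>_<device> are hypothetical, the failure group is passed in by the caller (one value per expansion box), and the stanza keys should be checked against the description in chapter 8.4.3 before use.

```shell
# nsd_stanza NODE FAILURE_GROUP DEVICE...
# Hypothetical helper: prints one %nsd stanza per device.
# Assumption: NSD names follow exp_<node>_<device>; adapt to local conventions.
nsd_stanza() {
    node=$1
    fg=$2
    shift 2
    for dev in "$@"; do
        printf '%%nsd: device=%s\n' "$dev"
        printf '  nsd=exp_%s_%s\n' "$node" "${dev##*/}"
        printf '  servers=%s\n' "$node"
        printf '  usage=dataAndMetadata\n'    # rule (a)
        printf '  failureGroup=%s\n' "$fg"    # rule (d): same value per box
        printf '  pool=system\n'              # rule (b)
    done
}
```

Calling `nsd_stanza gpfsnode04 <failure group> /dev/sde /dev/sdf` once per DR-site node and appending the output to one file yields a candidate stanza file; review it carefully before passing it to mmcrnsd.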

Example: three M-size nodes with 32-drive expansion (gpfsnode01-03 are primary site nodes, 04-06 are secondary site/DR-site nodes). Store the list as /tmp/nsdlistexp.txt. Then create NSDs using those disks:

# mmcrnsd -F /tmp/nsdlistexp.txt

3. Create file system

# mmcrfs /dev/sapmntext -F /tmp/nsdlistexp.txt -A no -B 512k -N 3000000 -v no -m 2 -M 2 -r 2 -R 2 -j hcluster --write-affinity-depth 1 -s failureGroupRoundRobin --block-group-factor=1 -T /sapmntext

Warning

Be sure to use nsdlistexp.txt and not your list with internal drives! Using the wrong drives can destroy your production data!

4. Mount file system on DR-site nodes only.

# mmmount sapmntext -N [list of DR-site nodes]

5. Install SAP HANA worker and standby nodes as described in the guide "SAP HANA Administration Guide". Take care to install HANA on /sapmntext and not on /sapmnt.

Also take care that you don’t use the UID (user id) and GID (group id) of the DR HANA instance especially when installing non-productive HANA instances before installing the DR instance.

If you also have expansion boxes connected to your primary site nodes, they are activated only when you need to migrate the data of non-productive SAP HANA instances away from DR-site nodes. See the Lenovo SAP HANA Appliance Operations Guide¹⁶ for details.

When configuring a clustered configuration by hand, install SAP HANA worker and standby nodes as described in the guide "SAP HANA Administration Guide".

16 SAP Note 1650046 (SAP Service Marketplace ID required)

9 Mixed eX5/X6 Environments

9.1 Mixed eX5/X6 HA Clusters

Attention

This chapter only applies to hybrid clusters consisting of servers with Intel Westmere and Intel Ivy Bridge CPUs.

Hybrid clusters with a mix of Intel Westmere and Intel Haswell CPUs must not be installed!

9.1.1 Definition & Overview

A mixed eX5/X6 cluster is a System x Solution for SAP HANA cluster consisting of eX5-based servers (Intel Westmere, MT 7143 and 7147) and X6-based servers (Intel Ivy Bridge, MT 3837 and 6241). Another term used is "hybrid cluster". Due to the new storage layout for X6-only installations, an X6 configuration must be slightly modified before an X6 node can be added to an eX5 cluster. Such an X6 node is considered to be configured in legacy or compatibility mode.

Besides the different storage layout, there are some minor configuration changes between the older

In document Implementation Guide X6 1.9.96 13 (Page 96-123)