The main purpose of an HA cluster is to manage user services. Typical examples of user services are an Apache Web server or a database. From the user's point of view, the services do something specific when ordered to do so. To the cluster, however, they are only resources which may be started or stopped—the nature of the service is irrelevant to the cluster.
In this chapter, we will introduce some basic concepts you need to know when configuring resources and administering your cluster. The following chapters show you how to execute the main configuration and administration tasks with each of the management tools the High Availability Extension provides.
5.1 Global Cluster Options
Global cluster options control how the cluster behaves when confronted with certain situations.
They are grouped into sets and can be viewed and modified with cluster management tools such as Hawk and the crm shell.
5.1.1 Overview
For an overview of all global cluster options and their default values, see Pacemaker Explained, available from http://www.clusterlabs.org/doc/. Refer to section Available Cluster Options.
The predefined values can usually be kept. However, to make key functions of your cluster work correctly, you need to adjust the following parameters after basic cluster setup:
Option no-quorum-policy
Option stonith-enabled
Learn how to adjust those parameters with the cluster management tools of your choice:
Hawk: Procedure 6.3, “Modifying Global Cluster Options”
crmsh: Section 8.3, “Configuring Global Cluster Options”
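As a minimal sketch with the crm shell, the two options could be set from the command line of a cluster node as shown here. The values are illustrations only, not recommendations; choose them according to the description of each option.

```shell
# Set the policy for partitions without quorum (value chosen for illustration only).
crm configure property no-quorum-policy=stop

# Keep fencing enabled (this is the default).
crm configure property stonith-enabled=true

# Display the resulting cluster configuration, including the property section.
crm configure show
```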
61 Option no-quorum-policy SLE HA 12 SP1
5.1.2 Option no-quorum-policy
This global option defines what to do when a cluster partition does not have quorum (that is, when the partition does not contain a majority of the cluster nodes).
Allowed values are:
ignore
The quorum state does not influence the cluster behavior; resource management is continued.
This setting is useful for the following scenarios:
Two-node clusters: Since a single node failure would always result in a loss of majority, usually you want the cluster to carry on regardless. Resource integrity is ensured using fencing, which also prevents split brain scenarios.
Resource-driven clusters: For local clusters with redundant communication channels, a split brain scenario is relatively unlikely. Thus, a loss of communication with a node most likely indicates that the node has crashed, and that the surviving nodes should recover and start serving the resources again.
If no-quorum-policy is set to ignore, a 4-node cluster can sustain concurrent failure of three nodes before service is lost. With the other settings, it would lose quorum after concurrent failure of two nodes.
freeze
If quorum is lost, the cluster partition freezes. Resource management is continued: running resources are not stopped (although they may be restarted in response to monitor events), but no further resources are started within the affected partition.
This setting is recommended for clusters where certain resources depend on communication with other nodes (for example, OCFS2 mounts). In this case, the default setting no-quorum-policy=stop is not useful, as it would lead to the following scenario:
Stopping those resources would not be possible while the peer nodes are unreachable.
Instead, an attempt to stop them would eventually time out and cause a stop failure, triggering escalated recovery and fencing.
stop (default value)
If quorum is lost, all resources in the affected cluster partition are stopped in an orderly fashion.
suicide
If quorum is lost, all nodes in the affected cluster partition are fenced.
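The node counts given for the ignore setting follow from simple majority arithmetic: a partition has quorum only if it contains more than half of the cluster nodes. The following shell sketch illustrates this for a 4-node cluster. The function name has_quorum is hypothetical and not part of any cluster tool, and the calculation assumes one vote per node.

```shell
# Hypothetical helper: succeeds if a partition with $2 surviving nodes
# out of $1 total nodes holds a strict majority (one vote per node assumed).
has_quorum() {
    [ $(( $2 * 2 )) -gt "$1" ]
}

# 4-node cluster: check each number of concurrently failed nodes.
for failed in 1 2 3; do
    alive=$((4 - failed))
    if has_quorum 4 "$alive"; then
        echo "$failed failed, $alive left: quorum"
    else
        echo "$failed failed, $alive left: no quorum"
    fi
done
```

With two failed nodes, the remaining two nodes are exactly half of the cluster and therefore not a majority. With no-quorum-policy set to ignore, the remaining nodes would continue to manage resources regardless.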
5.1.3 Option stonith-enabled
This global option defines whether to apply fencing, allowing STONITH devices to shoot failed nodes and nodes with resources that cannot be stopped. By default, this global option is set to true, because STONITH devices are necessary for normal cluster operation. With the default value, the cluster refuses to start any resources if no STONITH resources have been defined.
If you need to disable fencing for any reason, set stonith-enabled to false, but be aware that this affects the support status of your product. Furthermore, with stonith-enabled="false", resources like the Distributed Lock Manager (DLM) and all services depending on DLM (such as cLVM2, GFS2, and OCFS2) will fail to start.
Important: No Support Without STONITH
A cluster without STONITH is not supported.
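To see how a running cluster is currently configured with regard to fencing, one possible check is sketched here. It assumes that the Pacemaker command line tools are installed on the node and that the cluster is running.

```shell
# Query the current value of the global option. If it was never set
# explicitly, the query may report that the attribute does not exist.
crm_attribute --type crm_config --name stonith-enabled --query

# Validate the live cluster configuration verbosely. Pacemaker typically
# reports an error here if fencing is enabled but no STONITH resources
# are defined.
crm_verify --live-check --verbose
```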
5.2 Cluster Resources
As a cluster administrator, you need to create cluster resources for every resource or application you run on servers in your cluster. Cluster resources can include Web sites, e-mail servers, databases, file systems, virtual machines, and any other server-based applications or services you want to make available to users at all times.
5.2.1 Resource Management
Before you can use a resource in the cluster, it must be set up. For example, if you want to use an Apache server as a cluster resource, set up the Apache server first and complete the Apache configuration before starting the respective resource in your cluster.
If a resource has specific environment requirements, make sure they are present and identical on all cluster nodes. This kind of configuration is not managed by the High Availability Extension; you must do this yourself.
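One way to check that a configuration file is identical on all nodes is to compare checksums. The sketch below assumes passwordless SSH access to the nodes. The node names alice and bob and the Apache configuration path are placeholders, and the helper same_checksums is hypothetical.

```shell
# Hypothetical helper: reads "checksum  filename" lines on standard input and
# succeeds only if there is at least one line and all checksums are equal.
same_checksums() {
    awk 'NR == 1 { first = $1 }
         $1 != first { bad = 1 }
         END { exit ((NR == 0 || bad) ? 1 : 0) }'
}

# Example use (node names and path are placeholders):
#   for node in alice bob; do
#       ssh "$node" sha256sum /etc/apache2/httpd.conf
#   done | same_checksums && echo "identical" || echo "differs"
```

The helper only compares the first field of each line, so it works with any checksum tool that prints the checksum first.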