A sample control script and a Caché resource are provided as part of a development Caché installation.
Note that a development installation is not required in the cluster; copy the sample control script (cache.sh) and the appropriate patch file (cluster.rng.patch or cluster.rng.patchRHEL6) from a development installation to the cluster nodes as needed. Alternatively, contact the InterSystems Worldwide Response Center (WRC) for information about receiving the cache.sh and appropriate cluster.rng.patch* files.
A.3.1 Installing the cache.sh Script
To install the Caché control script:
1. Locate the files in dev/Cache/HAcluster/RedHat/ under the Caché installation directory.
2. Copy or place the cache.sh file in /usr/share/cluster and make sure the permissions, owner and group are identical to the other files in that directory.
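The copy step above can be sketched as a small shell function. This is only an illustration, not part of the shipped files: the function name install_agent is hypothetical, and the reference file whose permissions are mirrored is whatever existing script you choose in /usr/share/cluster. It relies on the GNU coreutils --reference option of chmod and chown.

```shell
# Sketch: copy a control script into the cluster script directory and give it
# the same mode, owner and group as a file that is already there.
# Usage: install_agent SRC DEST_DIR REFERENCE_FILE  (all caller-supplied)
install_agent() {
    src=$1
    dest_dir=$2
    ref=$3
    target="$dest_dir/$(basename "$src")"

    cp "$src" "$target" || return 1
    # --reference copies the attributes of an existing file (GNU coreutils).
    chmod --reference="$ref" "$target" || return 1
    chown --reference="$ref" "$target" || return 1
    # Show both files so the result can be compared by eye.
    ls -l "$ref" "$target"
}
```

For example, assuming ip.sh is one of the scripts already present in /usr/share/cluster, the call would be install_agent followed by the path to cache.sh, /usr/share/cluster, and /usr/share/cluster/ip.sh.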
A.3.2 Patching the cluster.rng Script
Patches are provided for the cluster.rng file as supplied with RHEL 5.5, 5.6, 5.7, 5.8 or RHEL 6.2 or 6.3. The patch simply adds the definition of the <cache /> resource to the list of cluster resources.
When using RHEL 6, patching is required since a cluster.conf file that includes the <cache /> resource cannot validate without the patched cluster.rng in place. If your system is running a version newer than RHEL 6.3, contact the InterSystems Worldwide Response Center (WRC) for information about options for patching your cluster.rng.
The cluster.rng file is typically found in either the /usr/share/cluster or the /var/lib/cluster directory. Be sure to patch the actual source file (and not simply a linked file).
1. Place the appropriate patch file (either cluster.rng.patch or cluster.rng.patchRHEL6) on the system in /tmp/.
2. Change to the appropriate directory (/var/lib/cluster or /usr/share/cluster) and use the following command to patch the existing cluster.rng file:
patch -b cluster.rng /tmp/cluster.rng.patchRHEL6
Note: cluster.rng.patch patches unaltered RHEL 5.5, 5.6, 5.7 or 5.8 cluster.rng files. cluster.rng.patchRHEL6 patches an unaltered RHEL 6.2 or 6.3 cluster.rng file.
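Because the patch must be applied to the actual source file rather than to a link, it can help to resolve the real location of cluster.rng first. The following is a minimal sketch under that assumption; the function name real_rng_path is hypothetical, and the directories to search are passed in by the caller.

```shell
# Sketch: print the fully resolved path of the first cluster.rng found in the
# given directories, following any symbolic links.
# Usage: real_rng_path DIR [DIR...]
real_rng_path() {
    for dir in "$@"; do
        if [ -e "$dir/cluster.rng" ]; then
            # readlink -f follows every symbolic link to the real file.
            readlink -f "$dir/cluster.rng"
            return 0
        fi
    done
    echo "cluster.rng not found in: $*" >&2
    return 1
}
```

Running real_rng_path /usr/share/cluster /var/lib/cluster prints the file to patch; change to its directory before running the patch command. GNU patch also accepts --dry-run, which reports whether the patch would apply without modifying the file.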
<cache __enforce_timeouts="1" cleanstop="1" name="cacheprod">
    <action name="status" interval="30s"/>
    <action name="start" timeout="10m"/>
    <action name="stop" timeout="5m"/>
</cache>
In this sample, a service starts the cacheprod instance of Caché and waits up to 10 minutes for the resource start to return. The service is marked as failed if the start does not complete in 10 minutes. After successfully starting all the resources, status checks occur every 30 seconds.
If a clean service stop is not configured (cleanstop="0"), the cacheprod instance is stopped with a ccontrol force cacheprod quietly. This results in a fast stop and protects against clean stop hangs due to unavailable disk or other Linux resources, ensuring a fast transition to the other node during faults that do not crash the system but result in a node transition. If cleanstop="0" is used, system managers should manually stop Caché (ccontrol stop) before any manual service stop or service relocations. This ensures a clean stop during routine maintenance work.
In this example, a clean service stop (cleanstop="1") is configured, so the cacheprod instance is stopped with a ccontrol stop cacheprod quietly. If that fails or generates an error, a ccontrol force cacheprod quietly is attempted. Note, though, that a hung ccontrol stop hangs forever.
In this example, however, with __enforce_timeouts="1", the stop must complete in five minutes. A stop that does not return within that timeout period causes the whole service to be marked as failed.
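The stop behavior described above can be sketched as follows. This is an illustration of the decision logic only, not the contents of the shipped cache.sh; the function name stop_instance and the CCONTROL override variable are hypothetical, while the ccontrol stop and ccontrol force invocations are the ones named in the text.

```shell
# CCONTROL may be overridden (for example, in a test); it defaults to ccontrol.
CCONTROL=${CCONTROL:-ccontrol}

# Sketch: stop an instance according to the cleanstop setting.
# Usage: stop_instance INSTANCE CLEANSTOP   (CLEANSTOP is "1" or "0")
stop_instance() {
    instance=$1
    cleanstop=$2

    if [ "$cleanstop" = "1" ]; then
        # Attempt a controlled shutdown first.
        if "$CCONTROL" stop "$instance" quietly; then
            return 0
        fi
        echo "clean stop of $instance failed; forcing it down" >&2
    fi
    # Either cleanstop="0" or the clean stop returned an error.
    "$CCONTROL" force "$instance" quietly
}
```

The sketch does nothing about a clean stop that never returns; bounding that case is left to rgmanager through __enforce_timeouts="1" and the stop timeout.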
Note the following when using the <cache /> resource:
1. The <cache /> resource takes the following parameters:
• name: The name of the Caché instance being controlled by the service.
• __enforce_timeouts: A generic rgmanager setting; as a general rule, it is recommended that you set this to "1" to ensure that a start or stop of the service does not hang indefinitely.
• cleanstop: If false ("0"), a cluster service stop or relocate immediately forces Caché down rather than attempting a controlled stop of Caché; while the stop is faster, forcing Caché down may lengthen startup time as transactions roll back, and so on.
2. Action timeouts are honored only if __enforce_timeouts="1". Generic rgmanager start and stop timeouts should be configured based on application needs, as follows:
• <action name="start" timeout="10m"/>: the start timeout should be long enough to allow the replay of the CACHE.WIJ and journal files, but not so long that severe cluster problem notifications are delayed while waiting for a start that will never complete.
• <action name="stop" timeout="5m"/>: the stop timeout comes into play only if the service is stopped (as opposed to a fencing event that cuts power). This can happen at boot time or as part of cluster reorganization or during a manual service stop.
Note: If the start or stop timeout is reached, the service is marked as FAILED and manual intervention is required to reenable it.
• <action name="status" interval="30s"/>: As a general rule, set the status check interval to 30 seconds; more frequent status checks use CPU and may affect cluster responsiveness.
3. The status check uses ccontrol qlist [instance] to see if the instance is down or running. If it is in either of those states, the status check passes and no action is taken; this lets you stop Caché manually, if necessary, without stopping the service and without accidentally triggering a failover event.
4. The <cache /> resource is always started after the <lvm>, <fs> and <ip> resources, and stopped before those resources; this is without regard to placement within the <service> stanza of the cluster.conf file. For more complex cluster configurations, parent and child relationships may need to be configured, but a typical two-node cluster does not require the added complexity.
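The status check described in item 3 can be sketched as below. This is not the shipped implementation. It assumes that ccontrol qlist prints caret-delimited fields and that the fourth field begins with the instance state (for example "running, since ..." or "down, ..."); verify that against the output on your own system. The function name instance_status and the CCONTROL override variable are hypothetical.

```shell
# CCONTROL may be overridden (for example, in a test); it defaults to ccontrol.
CCONTROL=${CCONTROL:-ccontrol}

# Sketch: return 0 if the instance reports "running" or "down", 1 otherwise.
# Usage: instance_status INSTANCE
instance_status() {
    # Assumption: the state is the fourth caret-delimited field of qlist.
    state=$("$CCONTROL" qlist "$1" | cut -d'^' -f4)
    case $state in
        running*|down*)
            # Both states pass, so a manual stop does not trigger failover.
            return 0
            ;;
        *)
            echo "unexpected state for $1: ${state:-unknown}" >&2
            return 1
            ;;
    esac
}
```

Treating "down" as a pass is what allows Caché to be stopped manually without the service being relocated.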