Consider the following for your applications:
• Ensure that all network ports required for interfaces, user connectivity and monitoring are open on all nodes in the cluster.
• Connect all interfaces, web servers, ECP clients and users to the database using the VIP over the public network as configured in the main.cf file.
• Ensure that application daemons, Ensemble productions, and so on are set to autostart so the application is fully available to users after unscheduled failovers.
• Consider carefully any code that is part of %ZSTART or otherwise occurs as part of Caché startup. To minimize recovery time, do not place heavy cleanup or query code in the startup that will cause the resource start action to time out before the custom code completes.
• Other applications, web servers, and so on can also be configured in the cluster, but these examples assume only Caché is installed under cluster control. Contact the InterSystems Worldwide Response Center (WRC) to consult about customizing your cluster.
• If any Caché instance that is part of a failover cluster is to be added to a Caché mirror (see the “Mirroring” chapter of this guide), contact the InterSystems Worldwide Response Center (WRC) to consult on the additional steps required to ensure that the system’s ISCAgent is properly configured. Essentially, the ISCAgent daemon needs to be installed on the other node as well, as follows:
cd /etc/init.d/
rsync -av -e ssh ISCAgent root@node2:/etc/init.d/
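The first consideration in the list above (required ports open on every node) can be spot-checked from a client machine. The following is a minimal sketch, not part of the Caché agent: the function names are hypothetical, it relies on bash's /dev/tcp pseudo-device and the coreutils timeout command, and the node names and port numbers you pass in are site-specific assumptions.

```shell
#!/bin/bash
# Sketch: report whether each required TCP port accepts connections on each node.
# Host names and port numbers are supplied by the caller; none are built in.

# check_port HOST PORT: succeeds if a TCP connection opens within 2 seconds.
check_port() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_ports "HOST..." "PORT...": prints one line per host/port pair and
# returns non-zero if any port could not be reached.
check_ports() {
    rc=0
    for host in $1; do
        for port in $2; do
            if check_port "$host" "$port"; then
                echo "OK     $host:$port"
            else
                echo "CLOSED $host:$port"
                rc=1
            fi
        done
    done
    return $rc
}
```

For example, check_ports "node1 node2" "1972 57772" would probe two hypothetical nodes on two example ports; substitute the ports your interfaces, users, and monitoring actually use. A successful connection only shows that something is listening and no firewall blocks it, so run the check against whichever node is active as well as through the VIP.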
B.7 Testing and Maintenance
Upon first setting up the cluster, be sure to test that failover works as planned. This also applies any time changes are made to the operating system, its installed packages, the disk, the network, Caché, or your application.
In addition to the topics described in this section, you should contact the InterSystems Worldwide Response Center (WRC) for assistance when planning and configuring a RHEL HA cluster to control Caché. The WRC can check for any updates to the Caché agent and discuss failover and HA strategies with you.
B.7.1 Failure Testing
Full-scale testing must go beyond a controlled service relocation. While service relocation testing is necessary to validate that the package configuration and the service scripts are all functioning properly, you should also test responses to simulated failures. Be sure to test failures such as:
• Loss of public and private network connectivity to the active node
• Loss of disk connectivity
• Hard crash of active node
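As an illustration of how the failures listed above might be simulated on a Linux test node, the sketch below only prints candidate commands and never executes them. The function name and the IFACE and DISK defaults are hypothetical placeholders. The disk command assumes a SCSI-attached device that exposes a sysfs state file, and the crash command assumes the magic SysRq facility is enabled. All three would need to be run as root on the active node, and only in a test environment.

```shell
#!/bin/bash
# Sketch: print (never execute) a command that could simulate each failure
# on the active node. Device names are hypothetical placeholders.

IFACE="${IFACE:-eth0}"   # assumed public or private cluster interface
DISK="${DISK:-sdb}"      # assumed SCSI device backing the shared storage

# failure_command KIND: KIND is one of network, disk, crash.
failure_command() {
    case "$1" in
        network) echo "ip link set dev $IFACE down" ;;
        disk)    echo "echo offline > /sys/block/$DISK/device/state" ;;
        crash)   echo "echo c > /proc/sysrq-trigger" ;;
        *)       echo "unknown failure type: $1" >&2; return 1 ;;
    esac
}

# List the candidate command for each failure type.
for kind in network disk crash; do
    printf '%-8s %s\n' "$kind" "$(failure_command "$kind")"
done
```

Printing rather than running keeps the sketch safe to paste into a terminal. Pulling cables or power remains the most realistic test where physical access is available.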
Testing should include a simulated or real application load. Testing with an application load builds confidence that the application will recover in the event of actual failure.
If possible, test with a heavy disk write load; during heavy disk writes the database is at its most vulnerable. Caché handles all recovery automatically using its CACHE.WIJ and journal files, but testing a crash during an active disk write ensures
B.7.2 Software and Firmware Updates
Keep software patches and firmware revisions up to date. Avoid known problems by adhering to a patch and update schedule.
B.7.3 Monitor Logs
Monitor the /var/log/pacemaker.log and /var/log/messages files as well as the Caché cconsole.log files. The Caché agent resource script writes time-stamped information to the logs during cluster events.
Use the Caché console log, the Caché Monitor and the Caché System Monitor to be alerted to problems with the database that may not be caught by the cluster software. (See the chapters “Monitoring Caché Using the Management Portal”, “Using the Caché Monitor” and “Using the Caché System Monitor” in the Caché Monitoring Guide for information about these tools.)
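For a quick manual review of a console log, a small filter can pull out the more serious entries. This is a minimal sketch under a stated assumption: each cconsole.log entry has the form "MM/DD/YY-HH:MM:SS:mmm (pid) severity text", with a severity of 0 to 3 in the third field. Confirm this against your own log before relying on it. The function name severe_entries is hypothetical, and the sketch does not replace the monitoring tools named above.

```shell
#!/bin/bash
# Sketch: print console log entries at or above a minimum severity.
# Assumes each entry looks like "MM/DD/YY-HH:MM:SS:mmm (pid) severity text",
# with the severity (0-3) in the third whitespace-separated field.

# severe_entries LOGFILE [MIN_SEVERITY]; MIN_SEVERITY defaults to 2.
severe_entries() {
    awk -v min="${2:-2}" '$3 ~ /^[0-3]$/ && $3 + 0 >= min + 0' "$1"
}
```

Call it with the path to the instance's cconsole.log, for example severe_entries /path/to/mgr/cconsole.log 2, where the path shown is a placeholder. Lines that do not match the assumed layout are skipped silently.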
C Using IBM PowerHA SystemMirror with Caché
Caché can be configured as a resource controlled by IBM PowerHA SystemMirror. This appendix highlights the key portions of the configuration of PowerHA including how to incorporate the custom Caché application controller and monitor script.
Refer to your IBM documentation and consult with IBM on all cluster configurations.
When using Caché in a high availability environment controlled by IBM PowerHA SystemMirror:
1. Install the hardware and operating system according to your vendor's recommendations for high availability, scalability and performance; for more information, see Hardware Configuration.
2. Configure IBM PowerHA SystemMirror with shared disk and virtual IP. Verify that common failures are detected and the cluster continues operating; for more information, see IBM PowerHA SystemMirror Configuration.
3. Install Caché according to the guidelines in this appendix and verify connectivity to your application via the virtual IP; for more information, see Install Caché in the Cluster.
4. Test disk failures, network failures, and system crashes. Test and understand your application’s response to such failures; for more information, see Test and Maintenance.
C.1 Hardware Configuration
Configure the hardware according to best practices for the application. In addition to adhering to the recommendations of IBM and your hardware vendor, consider the following:
• Disk and Storage — Create LUNs/partitions, as required, for performance, scalability, availability and reliability. This includes using appropriate RAID levels, battery-backed and mirrored disk controller cache, multiple paths to the disk from each node of the cluster, and a partition on fast shared storage for the PowerHA cluster repository disk.
• Networks/IP Addresses — Use bonded multi-NIC connections through redundant switches/routers where possible to reduce single points of failure.