Special Considerations - Test and Maintenance

D.3.3 Special Considerations

Consider the following for your applications:

• Ensure that ALL disks required for the proper operation of your application are part of the cachepkg.conf file.

• Ensure that all network ports required for interfaces, user connectivity, and monitoring are open on all nodes in the cluster.

• Connect all interfaces, web servers, ECP clients and users to the database using the virtual IP address over the public network as configured in the cachepkg.conf file.

• Ensure that the application daemons, Ensemble productions, etc. are set to auto-start so the application is fully available to users after unscheduled failovers.

• Consider carefully any code that is part of %ZSTART or otherwise runs as part of Caché startup. To minimize time-to-recovery, do not place heavy cleanup or query code in the startup.

• By default, the cache.sh script is configured to use ccontrol force whenever the script is called with the stop parameter. This results in a very fast stop, with no hangs or waits for processes to quit; Caché rolls back open transactions during the next Caché start. This is equivalent to a hard crash and recovery on a single node. If the application running in Caché is prone to long transactions, the default behavior can be changed.

To configure cache.sh to attempt a ccontrol stop and wait for it to complete before resorting to a force, edit cache.sh and set cleanstop=1. Note: cleanstop=1 can result in a longer time to recover after unplanned failovers.
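The difference between the two stop behaviors can be sketched as follows. This is an illustration only, not the logic of the shipped cache.sh script, whose internals are not shown in this guide. The function name stop_cache, the timeout_s parameter, and the injectable run argument are assumptions made for the example; only the ccontrol stop and ccontrol force commands come from the text above.

```python
import subprocess


def stop_cache(instance, cleanstop, timeout_s=300, run=subprocess.run):
    """Stop a Caché instance and return the ccontrol actions attempted, in order.

    Hypothetical sketch: with cleanstop set, try a controlled stop first and
    fall back to a force only if the stop fails or exceeds timeout_s.
    Without cleanstop, go straight to a force (the default described above).
    """
    attempted = []
    if cleanstop:
        attempted.append("stop")
        try:
            result = run(["ccontrol", "stop", instance, "quietly"], timeout=timeout_s)
            if result.returncode == 0:
                return attempted  # clean shutdown, no force needed
        except subprocess.TimeoutExpired:
            pass  # the controlled stop hung; fall through to the force
    attempted.append("force")
    run(["ccontrol", "force", instance, "quietly"])
    return attempted
```

The timeout is where the trade-off noted above appears: the longer the script waits for a controlled stop, the longer an unplanned failover can take. The run parameter exists only so the sequence can be exercised with a stand-in command runner on a machine without Caché installed.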

In any case, administrators performing planned service relocations should begin with a controlled halt of Caché using ccontrol stop. Then, after a clean and successful stop, they can continue with the service relocation.
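The port requirement in the list above can be spot-checked with a small script. The following is a minimal sketch, assuming it is run from a client machine and that each listed port is expected to have a listener on each node at the time of the check. The function name closed_ports and any host names or port numbers passed to it are hypothetical.

```python
import socket


def closed_ports(nodes, ports, timeout_s=2.0):
    """Return (node, port) pairs that did not accept a TCP connection."""
    failures = []
    for node in nodes:
        for port in ports:
            try:
                # A successful connect shows the port is reachable and listening.
                with socket.create_connection((node, port), timeout=timeout_s):
                    pass
            except OSError:
                # Covers refused connections, timeouts, and name resolution errors.
                failures.append((node, port))
    return failures
```

A failed connection does not by itself distinguish a blocked port from a service that is simply not running on that node, so services that follow the virtual IP address are better checked through the virtual IP address than on the standby node.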

D.4 Test and Maintenance

Upon first setting up the cluster, be sure to test that failover works as planned. Any time changes are made to the operating system, its installed packages, the disk, the network, Caché, or your application, be sure to test that failover continues to work as expected.

In addition to the topics described in this section, you should contact the InterSystems Worldwide Response Center (WRC) for assistance when planning and configuring your HP Serviceguard service to control Caché. The WRC can check for any updates to the cache.sh script, as well as discuss failover and HA strategies.

Typical Full Scale Testing Must Go Beyond a Controlled Service Relocation

While service relocation testing is necessary to validate that the package configuration and service scripts are all functioning properly, be sure to also test response to simulated failures.

Be sure to test failures such as:

• Loss of public and private network connectivity from the active node.

• Loss of disk connectivity.

• Hard crash of active node.

Testing should include a simulated or real application load, as follows:

• Testing with a load builds confidence that the application will recover.

• Try to test with a heavy disk write load. During heavy disk writes the database is at its most vulnerable. Caché handles all recovery automatically using its CACHE.WIJ and journal files, but testing a crash during an active disk write verifies that all file systems and disk devices fail over properly.
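During these tests it helps to record how long the application was unreachable through the virtual IP address. The following is a minimal sketch of one way to do that, assuming a simple TCP probe is an adequate stand-in for real application traffic. The function names probe and outage_windows are hypothetical, as are the host and port a caller would supply.

```python
import socket
import time


def probe(host, port, timeout_s=1.0):
    """Return (timestamp, reachable) for one TCP connection attempt."""
    ts = time.time()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return ts, True
    except OSError:
        return ts, False


def outage_windows(samples):
    """Collapse time-ordered (timestamp, reachable) samples into outages.

    Each outage is (start, end): start is the first failed probe and end is
    the first successful probe after it, or None if service never returned.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows
```

Calling probe in a loop at a fixed interval while a failure is induced, then passing the collected samples to outage_windows, gives an approximate time-to-recovery for each test. The measurement is only as fine as the probe interval, and a port that accepts connections does not prove the application itself has finished recovering.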

Keep Patches and Firmware Up to Date

Avoid known problems by adhering to a patch and update schedule.

Use Caché Monitoring Tools

Use the Caché console log, the Caché Monitor, and the Caché System Monitor to be alerted to problems with the database that may not be caught by the cluster software. (See the chapters “Monitoring Caché Using the Management Portal”, “Using the Caché Monitor”, and “Using the Caché System Monitor” in the Caché Monitoring Guide for information about these tools.)

E Using Veritas Cluster Server for Linux with Caché

Caché can be configured as an application controlled by Veritas Cluster Server (VCS) on Linux. This appendix highlights the key portions of the VCS configuration, including how to incorporate the Caché high availability agent into the controlled service. Refer to your Veritas documentation and consult with your hardware and operating system vendor(s) on all cluster configurations.

When using Caché in a high availability environment controlled by Veritas Cluster Server:

1. Install the hardware and operating system according to your vendor recommendations for high availability, scalability and performance; see Hardware Configuration.

2. Configure VCS with shared disks and a virtual IP (VIP). Verify that common failures are detected and the cluster continues operating; see Linux and Veritas Cluster Server.

3. Install the VCS control scripts (^online, ^offline, ^clean, ^monitor) and the Caché agent type definition; see Installing the VCS Caché Agent.

4. Install Caché and your application according to the guidelines in this appendix and verify connectivity to your application through the VIP; see Installing Caché in the Cluster.

5. Test disk failures, network failures, and system crashes, and test and understand your application’s response to such failures; see Application Considerations and Testing and Maintenance.

E.1 Hardware Configuration

Configure the hardware according to best practices for your application. In addition to adhering to the recommendations of your hardware vendor, consider the following:

Disk and Storage

Create LUNs/partitions, as required, for performance, scalability, availability and reliability. This includes using appropriate RAID levels, battery-backed and mirrored disk controller cache, multiple paths to the disk from each node of the cluster, and a partition on fast shared storage for the cluster quorum disk.

Networks/IP Addresses

Where possible, use bonded multi-NIC connections through redundant switches/routers to reduce single points of failure.

In document Caché High Availability Guide (Page 137-140)