Create a System Monitor Program

In document Remote Software Facility (Page 56-66)

Chapter 5: High Availability and Replication Overview

12. Create a System Monitor Program

With replication up and running, you should check periodically to be sure things are operating smoothly. Better yet, have RSF monitor your system for you!

With a System Monitor Program, RSF checks critical system functions and sends an email or text message if anything needs attention. Among other things, you can have RSF check for:

● Replication tasks that are currently in error.

● Excessive journal replication lag.

● New job logs since you last checked, filtered per your specifications.

● New system messages since you last checked, filtered per your specifications.

● CPU or disk utilization above some critical threshold.

You can also define your own conditions and have RSF monitor those as well.

Source for a model System Monitor Program is provided in source member CHKCDN in RSFTOOLS/QCLSRC. Create your own program by copying the model to QCLSRC in library RSFUSER and modifying it there.

When your program is ready, use the system job scheduler to run your monitor program two or more times per day.
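The model program itself is CL source (member CHKCDN), and the Python below is not a translation of it. It is only a minimal sketch of the kind of threshold logic such a monitor applies to the conditions listed above. The `Snapshot` fields, the function name and the default thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical readings a monitor program might collect in one run."""
    tasks_in_error: int
    journal_lag_minutes: float
    cpu_pct: float
    disk_pct: float


def find_problems(snap, max_lag_minutes=15.0, max_cpu_pct=90.0, max_disk_pct=85.0):
    """Return one message per condition that needs attention (thresholds are assumed)."""
    problems = []
    if snap.tasks_in_error > 0:
        problems.append(f"{snap.tasks_in_error} replication task(s) in error")
    if snap.journal_lag_minutes > max_lag_minutes:
        problems.append(f"journal replication lag is {snap.journal_lag_minutes} minutes")
    if snap.cpu_pct > max_cpu_pct:
        problems.append(f"CPU utilization is {snap.cpu_pct}%")
    if snap.disk_pct > max_disk_pct:
        problems.append(f"disk utilization is {snap.disk_pct}%")
    return problems
```

An empty list means nothing needs attention; a non-empty list is what would be sent by email or text message.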

Things to Consider

Initial Library Copy

When you first synchronize a library or IFS directory, RSF will copy the entire library or directory to the target to ensure the versions are identical. All source and target objects must begin with identical ownership and attributes, as well as identical data.

If you know the libraries are identical or you would like to copy the library manually, use the FULLSAVE(*MANUAL) option the first time you synchronize the library with the SYNCLIBRSF command. This tells RSF to skip the initial copy. Only changes made after this point are synchronized. You will need to press F10 to see the FULLSAVE parameter.

Please note: If journaling is used for replication and you manually create the initial save image that will be used to restore an identical copy to the target machine, you must run the SYNCLIBRSF command with the FULLSAVE(*MANUAL) option twice: once before saving the library with SAVACT(*LIB), and again after the save-while-active checkpoint has been reached (local journaling) or after the library has been restored onto the target (remote journaling).

The first manual sync point ensures that journaling is properly initialized for all mirrored objects. You should specify FULLSAVE (*MANUAL *BEGIN) for the first manual sync point.

The second manual sync point tells RSF to begin synchronizing changes at the appropriate point. You should specify FULLSAVE (*MANUAL *END) for the second manual sync point.

In all cases, do not begin the actual synchronization, using the default FULLSAVE(*AUTO) option, until the initial copy of the library has been restored to the target.

See the online help text for the FULLSAVE parameter on the Synchronize Libraries (SYNCLIBRSF) command for more information.

Logicals and Physicals in Different Libraries

If you store logical files in a different library from the physical files they're based on, you have a cross-library dependency. This is not the recommended way to organize your data because it is more complicated to manage.

Tip: Use the Display Cross-Lib Dependencies (DSPDBRRSF) command in library RSFTOOLS to check for cross-library dependencies. Special care must be taken when starting synchronization for libraries with cross-library dependencies. You must be sure to synchronize all libraries in the interdependent set and you must synchronize the library containing the physical file before the library containing cross-dependent logicals.

For example: Say LIB_P contains physical and logical files and LIB_L contains (among other things) a logical file that is based on a physical file in LIB_P. You must initially synchronize LIB_P before synchronizing LIB_L.

Note that when LIB_P is synchronized for the first time, RSF will send the entire library to establish a clean synchronization boundary. In the process, RSF will want to clear LIB_P on the target machine before restoring the copy saved from production. The clear will fail because of the logical in LIB_L that is based on the physical file in LIB_P, and inquiry message RSF3136 will be sent to the system operator's message queue on the target machine. The text for message RSF3136 is:

Message . . . . : Unable to clear library &1. (C R I)

Cause . . . : An RSF synchronization job needs to clear library &1 on this machine before it can continue but one or more objects in the library can not be deleted.

Recovery . . . : Do one of the following:

-- Type C to cancel the synchronization job.

-- Type R to retry the clear operation and let the synchronization job continue.

-- Type I to ignore the clear operation and let the synchronization job try to continue without clearing the library.

To resolve the problem, manually delete the logical files in LIB_L that are based on physicals in LIB_P and then answer the message with an 'R', allowing the replication of LIB_P to continue.

The logicals deleted from LIB_L will be replaced during the first synchronization cycle for LIB_L. (This is why the first sync cycle for LIB_L must be run after the first sync cycle for LIB_P completes. Once the first sync cycle completes for both libraries, the libraries may be synchronized in any order.)
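The ordering rule above (the library holding the physical files first, then the libraries holding logicals based on them) is a dependency ordering. The sketch below works out a safe first-sync order with Python's standard `graphlib` module. It is not an RSF facility: the function name and the input mapping are hypothetical, and the mapping would have to be filled in by hand, for example from the output of DSPDBRRSF.

```python
from graphlib import TopologicalSorter


def first_sync_order(physical_libs_by_library):
    """Order libraries so each one follows the libraries its logical files depend on.

    physical_libs_by_library maps a library name to the set of libraries that
    hold the physical files its logical files are based on.
    """
    graph = {
        # A library that depends only on itself has no cross-library dependency.
        lib: {dep for dep in deps if dep != lib}
        for lib, deps in physical_libs_by_library.items()
    }
    return list(TopologicalSorter(graph).static_order())


# LIB_L holds a logical file based on a physical file in LIB_P.
print(first_sync_order({"LIB_L": {"LIB_P"}, "LIB_P": {"LIB_P"}}))
```

For the LIB_P and LIB_L example this lists LIB_P before LIB_L. The order only matters for the first synchronization cycle of each library.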

Initial Journaled Object Copy

When a value other than *NONE is specified for Journal on the Change Library Sync Attributes (CHGRSFSA) command, physical files, data areas and data queues will be journaled. You can start journaling yourself but we recommend that you let RSF start journaling for the appropriate objects automatically.

If RSF starts journaling for an object, it will automatically send the entire object to the target to ensure that both copies are in sync. If you know the objects are identical and you would like to skip this step, use the FULLSAVE(*MANUAL) option as described above. If you choose to start journaling yourself before beginning library synchronization, be sure to use the correct journal and to capture before and after images if updating by key is desired.

When to Journal

When replicating a library that contains large database files or one which will be in use on the target machine, we recommend synchronizing at the record level by specifying a value other than *NONE for Journal on the Change Library Sync Attributes (CHGRSFSA) command. If two-way synchronization is planned, we recommend specifying *KEY for the "Apply journal entries by" (APPLY) parameter as well.

When replicating an IFS directory that contains very large files or files which may be locked and unable to be saved on a regular basis, we recommend synchronizing at the byte level by specifying a value other than *NONE for Journal on the Change IFS Sync Attributes (CHGRSFISA) command.

When not to Journal

When replicating a library containing source files, journaling is not desirable because the source members are updated wholesale by source editors. For these libraries, we recommend specifying *NONE for Journal on the Change Library Sync Attributes (CHGRSFSA) command.

You can specify a synchronization journal for any library and the library will be synchronized properly. However, it is more efficient if you specify *NONE for Journal on the Change Library Sync Attributes (CHGRSFSA) command when

● The library contains no database files, data areas or data queues, or

● The library contains only source files and other non-file objects
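The recommendations in "When to Journal" and "When not to Journal" can be read as a small decision rule. The sketch below is one possible reading, not RSF logic: the object-kind labels, the function name and the `<journal name>` placeholder are invented for illustration, and only the values *NONE and *KEY come from the text.

```python
# Kinds of objects that the text says are journaled (labels are hypothetical).
JOURNALED_KINDS = {"data_file", "data_area", "data_queue"}


def recommend_sync_settings(object_kinds, two_way=False):
    """Suggest Journal and APPLY values for a library from the kinds of objects in it."""
    if not set(object_kinds) & JOURNALED_KINDS:
        # Only source files and non-file objects: journaling adds no benefit.
        return {"journal": "*NONE"}
    settings = {"journal": "<journal name>"}
    if two_way:
        # Two-way synchronization: apply journal entries by key.
        settings["apply"] = "*KEY"
    return settings
```

A library of source files and programs yields `{"journal": "*NONE"}`, while a library with data files that is synchronized in both directions also gets `"apply": "*KEY"`.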

Remote Journaling

In general, you can use remote journaling any time journaling is used to replicate a library or IFS directory. Advantages of remote journaling:

● The journal changes are sent more quickly and efficiently.

● The journal changes are sent continuously and applied continuously. At synchronization intervals, RSF checks the progress of the journal apply process but does not need to send the journal entries.

Disadvantages of remote journaling:

● All journal entries are sent, even those that are not needed for replication.

● Filtering is not supported. *NO is assumed for the "Filter Journal Entries" (FLTJRN) parameter.

Note: Remote journaling uses TCP ports 446 (DRDA), 447 (DDM) and 3777 (remote journaling proper). If you intend to use remote journaling and your source and target machines are separated by a firewall, be sure to open ports 446, 447 and 3777.

In addition, you should use the Change DDM TCP/IP Attributes (CHGDDMTCPA) command on the target machine to change the "Password Required" option to *NO, as the *YES option does not work in most environments.
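To check the firewall side of the note above from any workstation that has Python, a sketch like the following tries to open a TCP connection to each port. It is not part of RSF, and the function name is made up for illustration. A successful connection shows only that the port is reachable, not that remote journaling is configured correctly.

```python
import socket

# Ports named in the note above.
REMOTE_JOURNAL_PORTS = (446, 447, 3777)


def closed_ports(host, ports=REMOTE_JOURNAL_PORTS, timeout=3.0):
    """Return the ports on host that refused or timed out on a TCP connection."""
    blocked = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # Connection opened, so the port is reachable.
        except OSError:
            blocked.append(port)
    return blocked
```

Calling `closed_ports("target-host")`, where `target-host` stands for the name or address of your target machine, returns an empty list when all three ports can be reached.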

Triggers and Library List Considerations

When triggers are associated with your database files, these triggers may be invoked on the target as well as on the source system. You must ensure that the trigger programs exist on the target. In addition, the RSF job that applies journal changes on the target must be able to find any trigger programs. If necessary, add libraries containing trigger programs to the Initial Library List parameter of job descriptions RSF/RSFTCP2 and RSF/RSF. You can use the Print Trigger Programs (PRTTRGPGM) command on the source machine to help identify triggers.

Please note: If your trigger programs update files that are being replicated by RSF, there is no need to run these programs on the target machine. In this case, we recommend suspending your trigger programs on the target machine with the Change PF Trigger (CHGPFTRG) command, specifying *DISABLED for "Trigger state." The command Change PF Triggers (CHGTRGRSF) is included with RSF to allow you to easily change the trigger state of many physical files at once. In addition, you should specify *YES for the "Disable triggers on target" (DISTRG) parameter on the Change Library Sync Attributes (CHGRSFSA) command for any library that contains files with triggers that should be disabled on the target machine.

Multiple Journals Per Library

It's simpler if all database files, data areas and data queues in a given library are journaled to the same journal. However, if you need to use multiple journals for a single library, RSF can still replicate all of the changes. To accomplish this:

● Create multiple RSF server IDs that point to the same target machine.

● Use the Change Library Sync Attributes (CHGRSFSA) command to define replication with journaling for each "from library", "to library" and "to server ID" combination. With multiple journals per library, specify the same "from library" and "to library" but a different server ID for each journal.

For each Synchronization Attributes specification, exclude the files that are journaled by the other journal. For all but one of the Synchronization Attributes specifications, exclude all non-journaled object types (all objects other than physical files and data areas) to avoid replicating those objects more than once.

● Use multiple instances of the Synchronize Libraries (SYNCLIBRSF) command to replicate the "from" and "to" library to each logical server ID, thus replicating all of the changes from all of the journals used with the library.
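The exclusion rules above are easy to get wrong, so here is a small planning sketch. It is not an RSF interface: the function, the dictionary keys and the sample journal, file and server ID names are all hypothetical. It only works out, for each journal, which files to exclude and whether non-journaled object types should be excluded.

```python
def plan_sync_specs(files_by_journal, server_ids):
    """Build one planned Synchronization Attributes entry per journal.

    files_by_journal maps a journal name to the set of files it journals.
    server_ids holds one server ID per journal, all pointing at the same target.
    """
    journals = sorted(files_by_journal)
    if len(server_ids) != len(journals):
        raise ValueError("need exactly one server ID per journal")
    specs = []
    for index, (journal, server_id) in enumerate(zip(journals, server_ids)):
        # Exclude every file that is journaled by one of the other journals.
        omitted = set()
        for other, files in files_by_journal.items():
            if other != journal:
                omitted |= files
        specs.append({
            "journal": journal,
            "server_id": server_id,
            "omit_files": sorted(omitted),
            # Only the first entry replicates the non-journaled object types.
            "omit_non_journaled_objects": index > 0,
        })
    return specs
```

With two journals, the first entry excludes the files of the second journal and replicates the non-journaled objects; the second entry excludes the files of the first journal and all non-journaled object types.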

System 36 Environment

If you are running the System 36 environment on your systems:

● Be sure that the job used to submit replication tasks is not running in the S/36 environment. One way to do this is to specify *NONE for "Special Environment" on the user profile that will be used to submit replication jobs.

● Be sure that user profile RSFSRV has *NONE specified for "Special Environment" on both the production and backup machines.

● When replicating #LIBRARY, omit objects of type *S36. When #LIBRARY is replicated for the first time, the library on the backup machine may not be able to be cleared completely if it contains an *S36 object. If inquiry message RSF3136 is sent and the *S36 object is the only one that could not be cleared, take option "I" to ignore the message and allow the replication to continue.

Is Everything OK?

How can you tell if the synchronization process is doing its job, without errors?

The best way is to use a System Monitor Program to notify you by email or text message if there's a problem. See "Create a System Monitor Program" for more information. You can also do the following:

On the production (source) machine:

● Use the Work With Sync Attributes (WRKRSFSA) display. The last synchronization date and error date for each item can be seen at a glance. You can also display error logs for any item.

● To make doubly sure, you can use the Check Libraries (CHKLIBRSF) or Check IFS Directory (CHKDIRRSF) commands to compare the contents of libraries or directories. You can also access these functions from the Work With Sync Attributes display by keying option 16 beside any entry.

● Look for RSF job logs. Select option 45 from the RSFHA menu.

On the backup (target) machine:

● Look for history log messages. Select option 46 from the RSFHAT menu.

● Look for RSF job logs. Select option 45 from the RSFHAT menu.

● When an error occurs applying journal entries on the target machine, the save file containing the transmitted journal entries is renamed to Jdddddhhmm, where ddddd is the Julian date and hhmm is the time in hours and minutes. The library containing the file is determined by the RSF requester entry on the target machine (WRKRSFRDE), but the default library is QGPL. A save file renamed in this way is not deleted by RSF; these must be deleted manually.
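When cleaning up these renamed save files it helps to know when each one was created. The sketch below decodes a Jdddddhhmm name. It rests on two assumptions that the text does not confirm: that the five-digit Julian date is in YYDDD form (two-digit year, three-digit day of year) and that the year falls in the 2000s. The function name is made up for illustration.

```python
import re
from datetime import datetime, timedelta

# J + yy + ddd + hh + mm; the YYDDD reading of "ddddd" is an assumption.
_NAME = re.compile(r"J(\d{2})(\d{3})(\d{2})(\d{2})")


def parse_error_savf_name(name, century=2000):
    """Return the datetime encoded in a Jdddddhhmm name, or None if it does not fit."""
    match = _NAME.fullmatch(name.strip().upper())
    if match is None:
        return None
    yy, ddd, hh, mm = (int(part) for part in match.groups())
    if not (1 <= ddd <= 366 and hh < 24 and mm < 60):
        return None
    year = century + yy
    stamp = datetime(year, 1, 1) + timedelta(days=ddd - 1, hours=hh, minutes=mm)
    # Day 366 of a non-leap year would spill into the next year; reject it.
    return stamp if stamp.year == year else None
```

Under these assumptions, `parse_error_savf_name("J240601430")` gives 29 February 2024 at 14:30, and a name that does not match the pattern gives `None`.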

Remember: RSF repairs most synchronization errors automatically, within a cycle or two. If you see an error that appears to go away after a few cycles, it has most likely been fixed already by RSF.

Role Swap

Overview

At any given time in a typical two-machine High Availability environment, one machine acts as the source or production machine and the other acts as the target or backup machine. Users interact only with the production machine. Replication keeps the backup machine synchronized with the production machine; changes to the production machine are automatically mirrored on the backup machine. The machine that usually plays the production role is called the primary machine. The machine that usually plays the backup role is called the secondary machine.

A role swap switches the perspective and the flow of data. If before the role swap, A was the production machine and B was the backup machine, then after the role swap B acts as the production machine and A (when available) acts as the backup machine. Ideally, users experience minimal disruption while the swap is occurring. Once the role swap is complete, users should not be able to detect the difference.

A role swap can be planned or unplanned.

A planned or deliberate role swap might occur for many reasons, including:

● As a test of the backup system and the integrity of the replication environment.

● To allow software or hardware maintenance to occur on the primary machine.

An unplanned role swap occurs when a hardware or other error on the production machine forces that machine out of service. In such a case, the backup machine is swapped into production and the former production machine will act as the backup machine and target for replication once the error is corrected and it again becomes available for use.

Preparation

There are several steps you must take in order to be prepared for a role swap. The more thorough your planning and advance preparation, the smoother your role swaps will be.

1. Ensure that all needed system information, libraries and IFS directories are being replicated. After a role swap is the wrong time to discover that a needed library was not being replicated. This is the most important item in the list.

2. Synchronize configuration information from production to backup and from backup to production. When synchronizing from production to backup, we recommend synchronizing to target library name RSFCFGPRD. When synchronizing from backup to production, we recommend synchronizing to target library name RSFCFGBKP.

The Change System Sync Attributes (CHGRSFSSA) command is used to define the target library to use when synchronizing *CFG information. The Synchronize System Info (SYNCSYSRSF) command is used to perform the actual synchronization.

3. Ensure that library RSF is synchronized in both directions as follows:

❍ Do not journal the objects in RSF (specify *NONE for the Journal parameter.) Otherwise, it will be very difficult to upgrade to a future release of RSF.

❍ Synchronize to a different target library name. (Important!)

Synchronize library RSF from production to backup and from backup to production. When synchronizing from production to backup, we recommend synchronizing to target library name RSFPRD. When synchronizing from backup to production, we recommend synchronizing to target library name RSFBKP.

❍ We recommend synchronizing library RSF every hour or two, in both directions.

❍ You can use the Change Library Sync Attributes (CHGRSFSA) command to omit all RSF *PGM objects from synchronization. Only non-program RSF objects need to be synchronized.

4. If you created a separate library to hold synchronization journals on the source machine, make sure the library (or libraries) also exist on the target machine. You don't need to populate the library on the target with journals and journal receivers, just make sure that it exists. Do not replicate libraries containing only journaling objects.

5. Create Synchronization Start Programs on both the production and backup machines so that your replication environment can be started consistently. See above for more details. The Synchronization Start Program on the backup machine only needs to start synchronization for configuration information and for library RSF, as described in points 2 and 3 above.

6. Create a Production to Backup role swap program. See below for more information.

7. Create a Backup to Production role swap program. See below for more information.

8. Use the Change RSF Defaults (CHGRSFDFT) command (option 80 on menu RSFHA) to set the value for the Current Replication Role (ROLE) parameter.

Create a Production-to-Backup Role Swap Program

A Production to Backup Role Swap Program contains all the steps needed to convert the production machine to the backup role. Creating your own program allows you to customize the role swap process.

Two model programs are included in RSFTOOLS. See source members SWAPTOBKP and SWAPTOBKP2 in file RSFTOOLS/QCLSRC. To create your own program, copy one of the two programs to your own library and modify it there. Do not store your program in libraries RSF or RSFTOOLS or it may be lost the next time you upgrade RSF. For complete system integrity, you should replicate the library containing your program to the target machine.

We recommend creating a new library called RSFUSER in which to store source and objects for user-written role swap and synchronization start programs.

The difference between SWAPTOBKP and SWAPTOBKP2 is that the former assumes IP addresses and system configuration will be swapped. The latter does not.

Note that your Production to Backup Role Swap Program will be passed no parameters.

Once you've created your program, tell RSF where to find it by using the Change RSF Defaults (CHGRSFDFT) command. Prompt the command, page down almost to the end and set the "Role swap to backup" (SWAPTOBKP) parameter.

Create a Backup-to-Production Role Swap Program

A Backup to Production Role Swap Program contains all the steps needed to convert the backup machine to the production role. Creating your own program allows you to customize the role swap process.

Two model programs are included in RSFTOOLS. See source members SWAPTOPRD and SWAPTOPRD2 in file RSFTOOLS/
