The Data Integrity Imperative
If it isn’t accurate, it isn’t available.
Technical White Paper
Introduction
The fundamental requirement of high availability software is to ensure that critical data and applications are available wherever and whenever they are needed. That typically involves replicating business data, applications and system values to a backup server and providing the ability to switch over quickly, and possibly even automatically, to the backup server when necessary. However, one factor is often overlooked: the integrity of the backup data. The data on the backup server must be an exact replica of the production data. If errors in the replication process or problems on the backup server introduce data errors, the backup server may not be able to function when called upon or, possibly even worse, the company may suffer considerable loss if it unwittingly runs its business based on faulty data. Validating the integrity of the data on a backup server, and correcting it when necessary, should therefore be core functions of a high availability solution.
In a never-down production environment, the integrity validation and correction functions must exhibit three core characteristics:
(1) They must be capable of confirming that the primary and backup data are bit-for-bit identical. Simply checking file attributes or merely checking at the record level is not sufficient.
(2) Data integrity checking must be completed “while active,” meaning that it must be done while users are accessing and/or updating the data, without the need to take any systems or data off-line.
(3) The integrity of critical business data must be actively managed and optimized, complete with audit trails, throughout its entire life-cycle.
When questions of integrity are considered, companies often focus primarily on the integrity of business data. Clearly, this is crucial, but it is only one aspect of the integrity issue. Business data is useful only if the applications that access, update, analyze and manipulate it are functioning properly. That can happen only if, in addition to the business data, the program code and related system data, such as user IDs and passwords, are also uncorrupted. Thus, the integrity of all system objects, not just business data, must be protected in a high availability solution.
The Vision Integrity Solution
Vision established the benchmark for data integrity in the high availability industry when, more than five years ago, it incorporated the use of “while active” Cyclic Redundancy Check (CRC) technology into its data synchronization checking facility to ensure bit-for-bit integrity. In 2005, Vision Solutions again led the industry with a complete data life-cycle management solution known as Director Suite that enables businesses to optimize and manage their iSeries data environment, while active, from cradle to grave, with audit trails. Currently, this capability is a Vision exclusive.
CRC is one of the most widely used technologies for verifying the bit-for-bit integrity of data that is being copied from one location to another. CRC codes were first described in 1961 and have long been used in disk drive technology to ensure the reliability of data written from memory to DASD. CRC is still used today by networking technology companies to ensure the integrity of the most sensitive data transmitted across networks around the globe. A CRC algorithm generates a short, fixed-length binary number that represents a long string of data. This allows processes to compare large data strings on different systems without transmitting the entire string. It provides an efficient and highly reliable method for detecting differences in data strings, including differences in the order of the characters. Communication programs commonly use CRC checks to validate the accuracy of data sent and received, but CRC is equally applicable to validating the integrity of data on your production and backup systems.
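As a minimal illustration of the comparison described above, the following sketch uses the CRC-32 function from Python's standard zlib module. It is not Vision's implementation, and the sample record images are invented for illustration. Each system would compute the checksum locally and exchange only the 32-bit result.

```python
import zlib

def checksum(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF

# Hypothetical record images on the production and backup systems.
production = b"ACCT=1001;BAL=2500.00"
backup_ok = b"ACCT=1001;BAL=2500.00"
backup_swapped = b"ACCT=1010;BAL=2500.00"  # same characters, two transposed

# Only the 4-byte checksums need to be exchanged, not the data itself.
print(checksum(production) == checksum(backup_ok))       # identical data
print(checksum(production) == checksum(backup_swapped))  # transposed characters
```

The second comparison fails even though both strings contain exactly the same characters, which is the order sensitivity that a simple additive checksum would not provide.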
While basic CRC algorithms are relatively simple to implement in a static environment, implementing them in a high availability environment, where the CRC must often be performed on terabyte-sized databases while they are in use, is far from trivial. In fact, Vision is, to date, the only high availability vendor to have successfully implemented CRC technology to ensure bit-for-bit integrity in a software-based mirroring solution.
Vision continues to lead the iSeries industry in the area of data integrity with the most sophisticated, robust set of capabilities including:
1. Database Integrity – Vision’s ORION uses multiple techniques to automatically detect and repair database integrity issues at the record level, while users and applications are actively using the database.
2. Object Integrity - ORION ensures the validity of all system objects using object-level integrity checking techniques that provide the flexibility needed to re-synchronize at the individual object, library, or link level, including the ability to automatically detect and mirror to the backup system any new objects created on the primary system.
3. Backup System Integrity – If the backup system crashes, leaving it in a corrupted state, ORION can automatically recover and restore complete database and object integrity on the backup system.
4. Data Optimization and Life-Cycle Management – Protecting the integrity of your data starts with having an organized and optimized environment so you can accurately keep track of all business critical data and manage it appropriately through its entire life-cycle, with appropriate audit trails. In addition, ORION provides facilities to optimize data before even starting the high availability processes.
Database Integrity
Vision provides four important capabilities related to ensuring database integrity:
• CRC Synch Check and Repair While Active
• Sample Synch Check
• Support for Database Constraints
• CRC for IFS Stream Files
CRC Synch Check While Active – Not only does Vision’s CRC synch check utility provide the highest level of data integrity, ensuring bit-for-bit identical replication on source and target, but it also delivers the following three capabilities that are needed for industrial-strength enterprise-level applications.
• CRC While Active – CRC Sync Check and Object Repair is especially useful for extremely large files with millions of records because, using this facility, re-synchronizing does not require a SAVE/RESTORE operation or an electronic send of the entire object. The CRC validation and the object repair processes can occur while the file continues to be updated by user applications on the source system.
• Automatic Object Repair While Active – This facility automatically repairs data integrity issues discovered by CRC. And, to ensure optimal performance and allow while-active capabilities, it only re-synchronizes the file segment that is actually in error.
• Parallel Mode – ORION can run multiple CRC synch check jobs in parallel to increase performance in enterprise environments where large numbers of objects need to be verified. When enabled in OMS/400, the CRC Sync Check mode divides files into a configurable number of segments and then performs CRC checks against each segment. If any segment fails the CRC check, the object will remain in a *FIX status.
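The segment-level check and repair described in the bullets above can be sketched as follows. This is an illustrative model only, not OMS/400 code: the function names are hypothetical, the files are modelled as in-memory byte strings of equal length, and a thread pool stands in for the parallel CRC jobs.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def segment_bounds(length: int, count: int) -> list[tuple[int, int]]:
    """Split [0, length) into up to `count` contiguous (start, end) ranges."""
    size = max(1, -(-length // count))  # ceiling division
    return [(start, min(start + size, length)) for start in range(0, length, size)]

def segment_crcs(data, bounds) -> list[int]:
    """Compute the CRC-32 of every segment, checking segments in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda b: zlib.crc32(data[b[0]:b[1]]), bounds))

def failed_segments(source: bytes, target, count: int) -> list[int]:
    """Return the indexes of the segments whose CRCs differ."""
    bounds = segment_bounds(len(source), count)
    pairs = zip(segment_crcs(source, bounds), segment_crcs(target, bounds))
    return [i for i, (src, tgt) in enumerate(pairs) if src != tgt]

def repair(source: bytes, target: bytearray, count: int) -> list[int]:
    """Re-send only the failing segments; assumes equal-length files."""
    bounds = segment_bounds(len(source), count)
    bad = failed_segments(source, target, count)
    for i in bad:
        start, end = bounds[i]
        target[start:end] = source[start:end]
    return bad

source = bytes(range(256)) * 4   # 1,024 bytes of sample data
target = bytearray(source)
target[700] ^= 0xFF              # corrupt one byte on the "backup"
repaired = repair(source, target, count=8)
print(repaired, bytes(target) == source)
```

With eight segments of 128 bytes each, only the segment containing byte 700 is copied again, which is why this approach avoids a SAVE/RESTORE or a full electronic send of a large file.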
Sample Synch Check can operate in either block or random mode, as described below:
• Block Mode - When in this mode, OMS/400 simultaneously checks the source and target objects for synchronization as follows:
o All significant member-level file attributes of the two objects are checked. These attributes include the following:
Number of members
Number of active and deleted records
File member ID
o A configurable number of the last physical records in the source file (by default 10, but you can specify up to 99,999) are checked and compared with the target file. The two sets of records must be identical.
The file is considered to be synchronized if each item at every level is found to be identical in the source and target objects. Conversely, the failure of any single condition causes OMS/400 to report an out-of-sync condition and place the object on hold (status *HLD) unless the SYNCHKHLD data area on the target is configured to log an out-of-sync message for the object rather than place it on hold.
• Random Mode – The random sample read technique uses the same functionality as block mode, but it acts on randomly selected records instead of only a group of records at the end of a file. As with block mode, you can specify that up to 99,999 are to be checked.
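The difference between block mode and random mode can be sketched as follows. The Member class, the function names and the shared seed are assumptions made for illustration; they are not OMS/400 structures or parameters, and a single member stands in for the whole file.

```python
import random
from dataclasses import dataclass

@dataclass
class Member:
    """Hypothetical stand-in for a file member."""
    member_id: str
    records: list
    deleted_records: int = 0

def attributes(m: Member) -> tuple:
    """Member-level attributes compared before any record is read."""
    return (m.member_id, len(m.records), m.deleted_records)

def sample_indexes(total: int, sample: int, mode: str, seed: int = 0) -> list:
    """Pick the record positions to compare on both systems."""
    sample = min(sample, total)
    if mode == "block":    # the last `sample` physical records
        return list(range(total - sample, total))
    if mode == "random":   # the same random positions on both sides (assumed shared seed)
        return sorted(random.Random(seed).sample(range(total), sample))
    raise ValueError(f"unknown mode: {mode}")

def in_sync(source: Member, target: Member, sample: int = 10,
            mode: str = "block", seed: int = 0) -> bool:
    """Any single mismatch means the file is reported as out of sync."""
    if attributes(source) != attributes(target):
        return False
    indexes = sample_indexes(len(source.records), sample, mode, seed)
    return all(source.records[i] == target.records[i] for i in indexes)

source = Member("M1", [f"rec{i}".encode() for i in range(100)])
target = Member("M1", list(source.records))
target.records[3] = b"corrupt"   # damage an early record on the target

print(in_sync(source, target, sample=10, mode="block"))    # checks records 90-99 only
print(in_sync(source, target, sample=100, mode="random"))  # sample covers every record
```

The block-mode call reports the file as synchronized because the damaged record lies outside the last ten records. A sample check is therefore a fast screening test, not a substitute for the full CRC check.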
Additionally, a new technique called marking is used to determine whether specific file contents are currently being updated, thus removing the need to allocate the complete file on the source system. As a result, marking provides more flexibility in terms of when, and across which files, sync checks can be run. Based on the parameters set for the sync check, markers are written as journal entries to indicate that the member attributes are about to be retrieved; another entry containing the member attributes is then sent. The following provides more specific information about this technique:
o The marker technique is used to determine whether the member attributes can be validated. If no records have been added or deleted between the marker and the member attributes, then the number of records and deleted records can be validated (i.e., sync checked).
o The facility confirms that records deleted from the source are also deleted on the target. If not, and there are no pending failed relational integrity transactions for that file, the file is placed on hold.
o Files that are allocated exclusively will not be able to be validated by the sync check. These locks may, for example, occur during the following operations:
Save/Restore, CLRPFM, RGZPFM, ALCOBJ *EXCL.
o Files that have records added or deleted between the two markers will not have their record counts validated.
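The marking rules listed above can be modelled with a short sketch. It assumes a simplified journal in which a marker entry is followed by an attribute entry; the Entry class, the entry kinds and the result strings are hypothetical and do not reflect the actual journal entry format. The check for pending relational integrity transactions is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entry:
    """Hypothetical journal entry: MARKER, ATTRS, ADD, DELETE or UPDATE."""
    kind: str
    payload: Optional[dict] = None

def sync_check_counts(journal, target_active: int, target_deleted: int) -> str:
    """Validate record counts only if the file was quiet after the marker."""
    inside = False
    attrs = None
    for entry in journal:
        if entry.kind == "MARKER":
            inside = True
        elif inside and entry.kind in ("ADD", "DELETE"):
            return "skipped"   # counts changed while attributes were retrieved
        elif inside and entry.kind == "ATTRS":
            attrs = entry.payload
            break
    if attrs is None:
        return "skipped"       # no complete marker/attribute pair was seen
    same = (attrs["active"], attrs["deleted"]) == (target_active, target_deleted)
    return "in sync" if same else "out of sync"

quiet = [Entry("MARKER"), Entry("UPDATE"),
         Entry("ATTRS", {"active": 100, "deleted": 5})]
busy = [Entry("MARKER"), Entry("ADD"),
        Entry("ATTRS", {"active": 101, "deleted": 5})]

print(sync_check_counts(quiet, 100, 5))
print(sync_check_counts(quiet, 99, 5))
print(sync_check_counts(busy, 101, 5))
```

Because the decision is taken from the journal stream, the source file never has to be allocated for the check. An update in place does not change the record counts, so it does not prevent validation in this model.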
Constraints – Working with related physical files
ORION supports the sending of files with referential constraints from a source system to a target system. This feature provides the following:
• The user can resynchronize files with constraints using an Electronic Send operation.
• Because the ODS interface and ORION’s Production Library Monitor (PLM) use Electronic Send to automatically configure new files to OMS, this functionality is supported for files that have constraints. This is particularly beneficial when you have very dynamic applications with many delete and create actions occurring in files with constraints.
A sync check operation detects when source and target files with constraints are out-of-sync and places the files on hold (*HLD). These individual held (*HLD) files can be resynchronized by returning them to a *PND (pending) status and then sending them electronically to the target. With ORION, using an Electronic Send for files with constraints is no different than for data objects where referential integrity does not apply. The Electronic Send operation for files with constraints deletes the target file and replaces it to ensure that constraints on the target object remain identical to those on the source object.
CRC Sync Check Feature for IFS
This feature enables an efficient byte-for-byte comparison between a source and target system (using CRC) for all data in an IFS object, while the object remains active for update.
IFS Authority Repair Feature for *ATTR Sync Check
The IFS attribute sync check repair feature checks the attributes and authorities of *STMF objects and can automatically repair them on the target system after an IFS attribute (*ATTR) sync check completes. The feature is optional: it is activated by default, but it can be deactivated by setting the IFSAUTRPR control value in the MRCTLVP control file.
Object Integrity
Production Library Monitor (PLM) - The PLM analyzes the libraries containing the objects you are mirroring and validates the status of mirrored objects. The PLM, which runs daily, searches for objects that are not being mirrored. If it finds any, it reports the nature of the problem. The PLM can also simplify the ongoing maintenance of OMS/400 by detecting, defining, journaling, and synchronizing newly created objects. In addition, it can be configured to automatically re-send held objects. The PLM also provides an easy way to initially identify objects to OMS/400 for mirroring.
The PLM is configured on the source and a synchronized copy of your PLM definitions is automatically maintained on the target system. Thus, in the event of a role swap, your PLM configuration on the new source system will already be in place.
Attribute Synch Check - The ODS/400 synchronization check compares the source and target attributes of an object and can be configured to mirror the source object if the two versions differ or if the object does not exist on the target.
An object level sync check can be executed as a command or as a scheduled job to be run at a time you define.
Database Relations - Object synch check for database relations verifies a logical file’s relation to a physical file at the member level.
Reverse Synch Check – The object synch check capability can be run in “reverse mode,” which compares objects on the target system to the source (as opposed to comparing the source to the target). The benefit of this is that extraneous objects on the target can be identified and dealt with as appropriate.
Electronic Send with Referential Constraints – Depending on the size of the object and the extent of the repairs necessary, it may be more efficient to send the entire object in order to correct an error. In these circumstances, Vision’s Electronic Send utility maintains the object’s referential constraints on the target machine.
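The forward and reverse object sync checks described in this section differ only in the direction of the comparison, as the following sketch shows. The object names, the attribute dictionaries and the function names are invented for illustration and do not represent the ODS/400 interface.

```python
def forward_check(source: dict, target: dict) -> list:
    """Objects to mirror: missing on the target or with differing attributes."""
    return sorted(name for name, attrs in source.items()
                  if target.get(name) != attrs)

def reverse_check(source: dict, target: dict) -> list:
    """Extraneous objects: present on the target but not on the source."""
    return sorted(name for name in target if name not in source)

# Hypothetical object catalogs keyed by library/object name.
source = {
    "APPLIB/ORDERS": {"owner": "QPGMR", "size": 4096},
    "APPLIB/CUSTOMERS": {"owner": "QPGMR", "size": 8192},
}
target = {
    "APPLIB/ORDERS": {"owner": "QPGMR", "size": 2048},   # attributes differ
    "APPLIB/OLDTEMP": {"owner": "QPGMR", "size": 512},   # no longer on the source
}

print(forward_check(source, target))
print(reverse_check(source, target))
```

The forward check lists both the missing object and the one whose attributes differ, while only the reverse check finds the leftover object on the target.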
Target Recovery Mode
Enterprise-level high availability solutions must be able to maintain the integrity of the backup database. This can be a challenge when an unplanned outage on the backup system leaves the databases in a corrupted state. To handle these situations, ORION detects any type of abnormal end of the target system and automatically re-starts in recovery mode so that the software knows where to restart the apply process. It then looks for out-of-synch conditions that need to be repaired.
Optimized Data Life-Cycle Management
To protect data integrity, while ensuring the performance of your systems, your high availability solution must manage and optimize data throughout its entire life-cycle. Vision’s unique Director, a highly integrated set of tools for systems management and optimization, is designed to proactively and automatically manage an iSeries system with a minimum of human intervention.
Because human error is one of the greatest threats to data integrity, a primary goal of Director is to minimize human intervention while maximizing the detection and resolution of any issues that might arise. To this end, Director was built to an exacting model of autonomics, supporting all eight pillars of autonomic computing: aware, configuring, optimizing, healing, protecting, adapting, self-managing and self-anticipating. Director, combined with Vision’s Data Manager (described below), delivers a self-installing engine for iSeries optimization that configures, monitors, and manages itself to provide measurably improved iSeries performance. The result is significantly reduced disk I/O and improved DASD and CPU utilization, and therefore a lower cost of ownership and a higher ROI.
Using Director, system optimization precedes the initiation of the high availability processes and then continues on an ongoing basis. Director’s Re-Organize in Place feature allows you to free up unused disk space. In addition to reclaiming otherwise lost space, Director also allows you to improve application performance by reducing excessive disk I/O and memory utilization caused by the physical presence of logically deleted records.
Data Manager is a user-based tool within the highly integrated Vision Solutions OS Suite of tools that automates the modeling, testing, purging and archiving activities for IBM iSeries application environments.
Built on advanced technology that leverages a highly autonomic architecture, Data Manager delivers a self-installing engine that provides a simple query-like interface to manage the data purge and archive process, from the identification of redundant and unused data through to the deletion and/or archiving of data. The toolset enables periodic and incremental archiving and, as with all Vision solutions, performs its functions while users are active on the system.