Caché High Availability
Guide
Version 2008.2
10 October 2008
Caché Version 2008.2 10 October 2008 Copyright © 2008 InterSystems Corporation All rights reserved.
This book was assembled and formatted in Adobe Page Description Format (PDF) using tools and information from the following sources: Sun Microsystems, RenderX, Inc., Adobe Systems, and the World Wide Web Consortium at www.w3c.org. The primary document development tools were special-purpose XML-processing applications built by InterSystems using Caché and Java.
Caché WEBLINK, Distributed Cache Protocol, M/SQL, N/NET, and M/PACT are registered trademarks of InterSystems Corporation.
InterSystems TrakCare, InterSystems Jalapeño Technology, Enterprise Cache Protocol, ECP, and InterSystems Zen are trademarks of InterSystems Corporation.
All other brand or product names used herein are trademarks or registered trademarks of their respective companies or organizations.
This document contains trade secret and confidential information which is the property of InterSystems Corporation, One Memorial Drive, Cambridge, MA 02142, or its affiliates, and is furnished for the sole purpose of the operation and maintenance of the products of InterSystems Corporation. No part of this publication is to be used for any other purpose, and this publication is not to be reproduced, copied, disclosed, transmitted, stored in a retrieval system or translated into any human or computer language, in any form, by any means, in whole or in part, without the express prior written consent of InterSystems Corporation.
The copying, use and disposition of this document and the software programs described herein is prohibited except to the limited extent set forth in the standard software license agreement(s) of InterSystems Corporation covering such programs and related documentation. InterSystems Corporation makes no representations and warranties concerning such software programs other than those set forth in such standard software license agreement(s). In addition, the liability of InterSystems Corporation for any losses or damages relating to or arising out of the use of such software programs is limited in the manner set forth in such standard software license agreement(s). THE FOREGOING IS A GENERAL SUMMARY OF THE RESTRICTIONS AND LIMITATIONS IMPOSED BY INTERSYSTEMS CORPORATION ON THE USE OF, AND LIABILITY ARISING FROM, ITS COMPUTER SOFTWARE. FOR COMPLETE INFORMATION REFERENCE SHOULD BE MADE TO THE STANDARD SOFTWARE LICENSE AGREEMENT(S) OF INTERSYSTEMS CORPORATION, COPIES OF WHICH WILL BE MADE AVAILABLE UPON REQUEST.
InterSystems Corporation disclaims responsibility for errors which may appear in this document, and it reserves the right, in its sole discretion and without notice, to make substitutions and modifications in the products and practices described in this document.
For Support questions about any InterSystems products, contact:
InterSystems Worldwide Customer Support
Tel: +1 617 621-0700
Fax: +1 617 374-9391
Email: [email protected]
Table of Contents
Introduction
1 Write Image Journaling and Recovery
1.1 Write Image Journaling
1.1.1 Image Journal
1.1.2 Two-Phase Write Protocol
1.2 Recovery
1.2.1 Recovery Procedure
1.3 Error Conditions
1.3.1 If Recovery Cannot Complete (UNIX and OpenVMS)
1.3.2 Sample Recovery Errors
1.3.3 Write Daemon Panic Condition
1.3.4 Write Daemon Errors and System Crash
1.3.5 Freeze Writes on Error
1.3.6 Responding to a Freeze
1.4 Limitations of Write Image Journaling
2 Backup and Restore
2.1 Backup Integrity and Recoverability
2.2 Importance of Journals
2.3 Backup Methods
2.3.1 External Backup
2.3.2 Online Backup
2.4 Configuring Caché Backup Settings
2.4.1 Define Database Backup List
2.4.2 Configure Backup Tasks
2.4.3 Schedule Backup Tasks
2.5 Managing Caché Online Backups
2.5.1 Run Backup Tasks
2.5.2 View Backup Status
2.5.3 View Backup History
2.5.4 Handle Backup Errors
2.5.5 Back Up Selected Globals and Routines
2.6 Restoring from a Backup
2.6.1 Using the Backup History to Recreate the Database
2.6.2 Suspending Database Access During a Restore
2.6.3 Restoring Database Properties
2.7 Caché Backup Utilities
2.7.1 Perform Backup and Restore Tasks Using ^BACKUP
2.7.2 Back Up Databases Using ^DBACK
2.7.3 Restore Databases Using ^DBREST
2.7.4 Estimate Backup Size Using ^DBSIZE
2.8 Sample Backup Procedures
2.8.1 External UNIX Backup Script
2.8.2 UNIX Backup and Restore
2.8.3 OpenVMS Backup
3 Journaling
3.1 Journaling Overview
3.1.1 Differences Between Journaling and Write Image Journaling
3.1.2 Protecting Database Integrity
3.1.3 Automatic Journaling of Transactions
3.1.4 Rolling Back Incomplete Transactions
3.1.5 Using Temporary Globals and CACHETEMP
3.1.6 Journal Management Classes and Globals
3.2 Configuring Journaling
3.2.1 Configure Journal Settings
3.2.2 Journaling Best Practices
3.3 Journaling Operation Tasks
3.3.1 Start Journaling
3.3.2 Stop Journaling
3.3.3 Switch Journal Files
3.3.4 View Journal Files
3.3.5 Purge Journal Files
3.3.6 Restore Journal Files
3.4 Journaling Utilities
3.4.1 Perform Journaling Tasks Using ^JOURNAL
3.4.2 Start Journaling Using ^JRNSTART
3.4.3 Stop Journaling Using ^JRNSTOP
3.4.4 Switch Journal Files Using ^JRNSWTCH
3.4.5 Restore Globals From Journal Files Using ^JRNRESTO
3.4.6 Filter Journal Records Using ^ZJRNFILT
3.4.7 Display Journal Records Using ^JRNDUMP
3.4.8 Update Journal Settings Using ^JRNOPTS
3.4.9 Recover from Startup Errors Using ^STURECOV
3.4.10 Convert Journal Files Using ^JCONVERT and ^%JREAD
3.4.11 Set Journal Markers Using ^JRNMARK
3.4.13 Manage Journaling at the Process Level Using %NOJRN
3.5 Journal I/O Errors
3.5.1 Freeze System on Journal I/O Error Setting is No
3.5.2 Freeze System on Journal I/O Error Setting is Yes
3.6 Special Considerations for Journaling
3.6.1 Performance
3.6.2 UNIX File System Recommendations
4 Shadow Journaling
4.1 Shadowing Overview
4.2 Configuring Shadowing
4.2.1 Configuring the Source Database Server
4.2.2 Configuring the Destination Shadow
4.2.3 Journaling on the Destination Shadow
4.3 Managing and Monitoring Shadowing
4.3.1 Shadow Checkpoints
4.3.2 Shadow Administration Tasks
4.3.3 Shadow Operations Tasks
4.4 Using the Shadow Destination for Disaster Recovery
5 System Failover Strategies
5.1 No Failover
5.2 Cold Failover
5.3 Warm Failover
5.4 Hot Failover
6 Caché Cluster Management
6.1 Overview of Caché Clusters
6.1.1 Cluster Master
6.1.2 Cluster Master as Lock Server
6.2 Configuring a Caché Cluster
6.2.1 Multiple Network Device Configuration
6.3 Managing Cluster Databases
6.3.1 Creating Caché Database Files
6.3.2 Mounting Databases
6.4 Caché Startup
6.5 Write Image Journaling and Clusters
6.6 Cluster Backup
6.7 System Design Issues for Clusters
6.7.1 Determining Database File Availability
6.8 Cluster Application Development Strategies
6.9 Caché ObjectScript Language Features
6.9.1 Remote Caché ObjectScript Locks
6.10 DCP and UDP Networking
7 Cluster Journaling
7.1 Journaling on Clusters
7.1.1 Cluster Journal Log
7.1.2 Cluster Journal Sequence Numbers
7.2 Cluster Failover
7.2.1 Cluster Recovery
7.2.2 Cluster Restore
7.2.3 Failover Error Conditions
7.3 Cluster Shadowing
7.3.1 Configuring a Cluster Shadow
7.3.2 Cluster Shadowing Limitations
7.4 Tools and Utilities
7.5 Cluster Journal Restore
7.5.1 Perform a Cluster Journal Restore
7.5.2 Generate a Common Journal File
7.5.3 Perform a Cluster Journal Restore after a Backup Restore
7.5.4 Perform a Cluster Journal Restore Based on Caché Backups
7.6 Journal Dump Utility
7.7 Startup Recovery Routine
7.8 Setting Journal Markers on a Clustered System
7.9 Cluster Journal Information Global
7.10 Shadow Information Global and Utilities
8 Caché Clusters on Tru64 UNIX
8.1 Tru64 UNIX Caché Cluster Overview
8.2 TruCluster File System Architecture
8.2.1 Caché and CDSLs
8.2.2 Remastering AdvFS Domains
8.3 Planning a Tru64 Caché Cluster Installation
8.4 Tuning a Tru64 Caché Cluster Member
9 Caché and Windows Clusters
9.1 Single Failover Cluster
9.1.1 Setting Up a Failover Cluster
9.2 Example Procedures
9.2.1 Create a Cluster Group
9.2.2 Create an IP Address Resource
9.2.4 Install Caché
9.2.5 Create a Caché Cluster Resource
9.3 Multiple Failover Cluster
9.3.1 Setting Up a Multiple Failover Cluster
10 ECP Failover
10.1 ECP Recovery
10.2 ECP and Caché Clusters
10.2.1 Application Server Fails
10.2.2 Data Server Fails
10.2.3 Network Is Interrupted
10.2.4 Cluster as an ECP Database Server
10.3 ECP Clusters
Appendix A: Caché ECP Clusters on Red Hat Enterprise Linux
A.1 Pre-installation Planning
A.2 Configuring the Cluster Services for Caché
A.2.1 Define the Caché Cluster Services
A.2.2 Install Caché
A.3 Configuring the Second Node
A.4 Adding Caché to the Cluster Services
A.4.1 Caché Initialization File for Linux
List of Figures
Shadowing Overview
Relationships of Shadow States and Permissible Actions
Cold Failover Configuration
Warm Failover Configuration
Hot Failover Configuration
Cluster Shadowing Overview
Example of Tru64 Cluster Configuration
Single Failover Cluster
Failover Cluster with Node Failure
IP Address Advanced Properties
IP Address Parameter Properties
Physical Disk Dependency Properties
Cluster Resource General Properties
Cluster Resource Dependencies Properties
Cluster Resource Advanced Properties
Cluster Resource Parameters Properties
Multiple Failover Cluster
List of Tables
Conditions Affecting Write Daemon Errors
Write Daemon Error Conditions
Backup Task Descriptions
Values of backup_type
UNIX Backup Utilities and Commands
Journal Data Record Fields Displayed by ^JRNDUMP
Journal File Command Type Codes
Introduction
As organizations rely more and more on computer applications, it is vital to safeguard the contents of databases. This guide explains the many mechanisms that Caché provides to maintain a highly available and reliable system. It describes strategies for recovering quickly from system failures while maintaining the integrity of your data.
Caché write image journaling technology protects against internal integrity failures due to system crashes. Caché backup and journaling systems provide rapid recovery from physical integrity failures. Logical database integrity is ensured through transaction processing, locking, and automatic rollback. In addition, there are other mechanisms available to maintain high availability including shadow journaling and various recommended failover strategies involving Caché ECP (Enterprise Cache Protocol) and clustering. The networking capabilities of Caché can be customized to allow cluster failover. The following topics are addressed:
• Write Image Journaling and Recovery
• Backup and Restore
• Journaling
• Shadow Journaling
• System Failover Strategies
• Caché Cluster Management
• Cluster Journaling
• Caché Clusters on Tru64 UNIX
• Caché and Windows Clusters
• ECP Failover
This guide also contains the following appendix:
• Caché ECP Clusters on Red Hat Enterprise Linux
1 Write Image Journaling and Recovery
Caché uses write image journaling to maintain the internal integrity of your Caché database. It is the foundation of the database recovery process.
This chapter discusses the following topics:
• Write Image Journaling
• Recovery
• Error Conditions
• Limitations
1.1 Write Image Journaling
Caché safeguards database updates by using a two-phase technique, write image journaling, in which updates are first written from memory to a transitional journal, CACHE.WIJ, and then to the database. If the system crashes during the second phase, the updates can be reapplied upon recovery. The following topics are covered in greater detail:
• Image Journal
• Two-Phase Write Protocol
1.1.1 Image Journal
The Write daemon is activated at Caché startup and creates an image journal. The Write daemon records database updates here before writing them to the Caché database.
By default, the write image journal (WIJ) is named CACHE.WIJ and resides in the system manager directory, usually install-dir/Mgr, where install-dir is the installation directory. To specify a different location for this file, use the System Management Portal:
1. Navigate to the [Home] > [Configuration] > [Journal Settings] page.
2. Enter the new location of the image journal file in the Write image journal directory box and click Save. The name must identify an existing directory on the system and may be up to 63 characters long. If you edit this setting, restart Caché to apply the change.
Important: InterSystems recommends locating the write image journal (WIJ) file on a separate disk from the database disks (those that contain the CACHE.DAT files) to reduce risk and increase performance.
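The two constraints stated for the Write image journal directory setting (an existing directory, with a name of up to 63 characters) can be checked before the value is entered in the portal. The following is a minimal sketch in Python, not part of Caché; the function name check_wij_directory is hypothetical.

```python
import os

# Limit stated above for the "Write image journal directory" setting.
MAX_WIJ_DIR_LENGTH = 63


def check_wij_directory(path):
    """Return a list of problems with a proposed WIJ directory (empty if none).

    Hypothetical helper: it only mirrors the two rules given in the text
    and does not talk to Caché in any way.
    """
    problems = []
    if len(path) > MAX_WIJ_DIR_LENGTH:
        problems.append(
            f"name is {len(path)} characters; the limit is {MAX_WIJ_DIR_LENGTH}"
        )
    if not os.path.isdir(path):
        problems.append("not an existing directory")
    return problems
```

An empty result means the path passes both checks; it says nothing about whether the directory is on a separate disk from the databases, which still has to be verified by hand.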
On some Linux and UNIX platforms, using a raw partition may improve performance. A raw partition is a UNIX character mode special file type that allows raw access to a contiguous portion of a physical disk. To place the image journal in a raw partition:
1. Calculate the size of the partition by adding the amount of database cache, the amount of routine buffer space, plus 10 megabytes. The result is the number of bytes you need to assign to the raw partition.
2. Create a raw partition of that size. See your UNIX system documentation for details.
3. Follow the previous procedure for changing the location of the WIJ directory from the [Home] > [Configuration] > [Journal Settings] page of the System Management Portal to specify the raw partition name for the Write image journal directory setting.
CAUTION: The WIJ file should never be put on a networked disk.
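The sizing rule in step 1 of the raw partition procedure is simple arithmetic. The sketch below works it through in Python, assuming both inputs are given in megabytes and that one megabyte is 1024 * 1024 bytes; the function name is hypothetical.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def wij_raw_partition_bytes(database_cache_mb, routine_buffer_mb):
    """Bytes to assign to the raw partition: database cache plus routine
    buffer space plus 10 megabytes, as described in step 1."""
    return (database_cache_mb + routine_buffer_mb + 10) * MB


# Example: 512 MB of database cache and 64 MB of routine buffers
# call for a partition of 586 MB.
size = wij_raw_partition_bytes(512, 64)
print(size // MB, "MB")
```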
1.1.2 Two-Phase Write Protocol
Caché maintains application data in databases whose structure enables fast, efficient searches and updates. A database update occurs when a Set, Kill, ZSave, or ZRemove command is issued. Generally, when an application updates data, Caché must modify a number of blocks in the database structure to reflect the change.
Due to the sequential nature of disk access, any sudden, unexpected interruption of disk or computer operation can halt the update of multiple database blocks after the first block has been written but before the last block has been updated. This incomplete update leads to an inconsistent database structure. The consequences can be as severe as a database that is totally unusable, with all data irretrievable by normal means.
The Caché write image journaling technology uses a two-phase process of writing to the database to protect against such events as follows:
• In the first phase, Caché records the changes needed to complete the update in the write image journal. Once it enters all updates to the write image journal, it sets a flag in the file and the second phase begins.
• In the second phase, the Write daemon writes the changes recorded in the write image journal to the database on disk. When this second phase completes, the Write daemon sets a flag in the write image journal to indicate it is empty.
When Caché starts, it automatically checks the write image journal and runs a recovery procedure if it detects that an abnormal shutdown occurred. When the procedure completes successfully, the internal integrity of the database is restored. Caché also runs WIJ recovery following a successful shutdown as a safety precaution to ensure that the database can be safely backed up.
Caché write image journaling guarantees the order of updates. The Write daemon records all database modifications in the image journal. For example, assume that modifications A, B, and C normally occur in that order, but that only B is split over multiple blocks. All three modifications are in the image journal, and are written to the database, so all three are in the database following a failure, or none of them are.
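The two-phase protocol can be illustrated with a toy model. The Python sketch below is an in-memory simulation, not the Caché implementation: the database is a dictionary of block numbers, the journal is a second dictionary with a single "complete" flag, and the class and function names are hypothetical.

```python
class SimulatedCrash(Exception):
    """Raised to model a failure partway through a write cycle."""


class WriteImageJournal:
    def __init__(self):
        self.blocks = {}       # block number -> new contents
        self.complete = False  # True once every update has been recorded


def write_cycle(database, wij, updates, crash_in_phase=None):
    """Apply updates using the two-phase protocol.

    crash_in_phase (1 or 2) simulates a crash after the first block
    of that phase has been handled.
    """
    # Phase 1: record every changed block in the journal, then set the flag.
    wij.blocks, wij.complete = {}, False
    for i, (block, data) in enumerate(updates.items()):
        if crash_in_phase == 1 and i == 1:
            raise SimulatedCrash("during phase 1")
        wij.blocks[block] = data
    wij.complete = True

    # Phase 2: copy the journaled blocks to the database, then mark it empty.
    for i, (block, data) in enumerate(wij.blocks.items()):
        if crash_in_phase == 2 and i == 1:
            raise SimulatedCrash("during phase 2")
        database[block] = data
    wij.blocks, wij.complete = {}, False


def recover(database, wij):
    """Startup recovery: reapply a complete journal, discard an incomplete one.

    Returns True if blocks were restored from the journal.
    """
    restored = wij.complete
    if restored:
        database.update(wij.blocks)
    wij.blocks, wij.complete = {}, False
    return restored
```

In this model a crash in phase 2 leaves the database partly updated until recover reapplies the journal, while a crash in phase 1 leaves the database untouched and the incomplete journal is simply discarded.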
1.2 Recovery
When Caché starts, it automatically checks the write image journal and runs a recovery procedure if it detects that an abnormal shutdown occurred. Recovery is necessary if a system crash or other major system malfunction occurs at either of the following points in the two-phase write protocol process:
• Before the Write daemon has completed writing the update to the write image journal. In this case, recovery discards the incomplete entry and updates are lost. However, the databases are in a consistent and usable state and the transaction journal file can be applied, if it is being used, to restore any updates which may have been lost because they had not yet been written to the database. See the Journaling chapter for more information.
• After the update to the write image journal is complete but before the database is updated. In this case, the recovery procedure applies the updates from the write image journal file to the database to restore internal database integrity.
1.2.1 Recovery Procedure
If the write image journal is marked as “complete,” the Write daemon completed writing modified disk blocks to the image journal but had not completed writing the blocks back to their respective databases. This indicates that restoration is needed. The recovery program, cwdimj, does the following:
• Informs the system manager in the recovery log file.
• Performs dataset recovery.
• Continues and completes restoration.
1.2.1.1 Recovery Log File
The recovery procedure records its progress in the cconsole.log file in the Caché system manager directory. This file contains a record of output from all recoveries run in the %SYS namespace. To view the file, open it with a text viewer or editor. You can also view its contents from the [Home] > [System Logs] > [View Console Log] page of the System Management Portal.
1.2.1.2 Dataset Recovery
The recovery procedure allows you to confirm the recovery on a dataset-by-dataset basis. Normally, you specify all datasets. After each dataset prompt, type either:
• Y — to restore that dataset
• N — to reject restoration of that dataset
You can also specify a new location for the dataset if the path to it has been lost, but you can still access the dataset. Once a dataset has been recovered, it is removed from the list of datasets requiring recovery and is not recovered during subsequent runs of the cwdimj program, should any be necessary. Typically, all recovery is performed in a single run of the cwdimj program.
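The bookkeeping described here, in which a recovered dataset drops off the list and is skipped on later runs, can be sketched as follows. This is an illustrative Python model and not the cwdimj program; run_recovery, confirm, and restore are hypothetical names, and the dataset paths used with it are made up.

```python
def run_recovery(pending, confirm, restore):
    """Model one run of dataset recovery.

    pending -- datasets still marked as requiring recovery
    confirm -- callable(dataset) -> True for a "Y" answer, False for "N"
    restore -- callable(dataset) -> True if the blocks were written
    Returns the datasets that still require recovery after this run.
    """
    still_pending = []
    for dataset in pending:
        if confirm(dataset) and restore(dataset):
            continue  # recovered: removed from the list
        still_pending.append(dataset)
    return still_pending
```

Feeding the returned list into a second call models a subsequent run: only the datasets that were rejected or failed the first time are offered again.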
1.2.1.3 Completes Restoration
If no operator is present during the recovery procedure, Caché takes default actions in response to prompts: it restores all directories and automatically marks the write image journal as deleted. However, if a problem occurs during recovery, the cwdimj program aborts and the system is not started. Any datasets which were not successfully recovered are still marked as requiring recovery in the write image journal. See the Error Conditions section for more information.
When the recovery procedure is complete, the recovery program asks whether it should mark the contents of the write image journal as “deleted.” If recovery has successfully written all blocks, answer “Yes.” However, if an error occurred during writing, or if you chose not to write the blocks, answer “No”; otherwise, you most likely will cause database degradation.
Caché cannot run until either the contents of this file have been deleted or the file has been removed or renamed.
When recovery completes normally, the write image journal is marked as deleted, and startup continues. If the Write daemon cannot create the write image journal, it halts all database modifications. The halt continues until the Write daemon can create the image journal, or until you shut down the system. Once the Write daemon is able to create the image journal, it sends the following message to the console log:
Database updates have resumed
1.3 Error Conditions
If an error occurs that causes database degradation, the Write daemon’s action depends on the condition under which the error occurs.
Conditions Affecting Write Daemon Errors
Condition: Database freezes on error.
Write Daemon Action: Write daemon freezes the system and logs to the operator’s console an error message of the type shown in the Freeze Writes on Error section.
Condition: Error trapping is enabled with the command SET $ZT="^%ET".
Write Daemon Action: Error trapping halts the process where the error occurred. One of the error conditions listed in the Write Daemon Error Conditions table is stored in the ^ERTRAP global in the Caché database, unless there is a file-full condition in that database. In that case, the halt occurs with no indication as to why.
Condition: Error occurred as a result of a Caché ObjectScript command entered in programmer mode.
Write Daemon Action: One of the errors listed in the Write Daemon Error Conditions table appears on your screen.
Condition: Serious Disk Write error occurred in a Caché database file.
Write Daemon Action: Write daemon freezes the system and displays the following message: “SERIOUS DISK WRITE ERROR - WILL RETRY”. If it cannot recover, it displays a message of the type shown in the Freeze Writes on Error section. If it is able to recover, database updates resume.
Condition: Serious Disk Read or Write error occurred in the write image file.
Write Daemon Action: Write daemon freezes the system while it attempts to recover, and displays one of the following messages: “SERIOUS DISK ERROR WRITING IMAGE FILE - WILL RETRY” or “SERIOUS DISK ERROR READING IMAGE FILE - WILL RETRY”. If it cannot recover, it displays a message of the type shown in the Freeze Writes on Error section. If it is able to recover, database updates resume.
1.3.1 If Recovery Cannot Complete (UNIX and OpenVMS)
If recovery cannot complete, Caché prompts you to choose between the following two options:
• Abort startup, fix the problem that prevented recovery, and try again. This option is preferable if you have time for it.
• Delete or rename the write image journal file and continue startup. Caché will run with one or more databases suffering degradation caused when an update in progress did not complete when the system crashed or while recovery took place. If you delete the write image journal, you must restore those databases from backups or use repair utilities to fix them.
1.3.2 Sample Recovery Errors
1.3.2.1 Error Opening CACHE.DAT
If you cannot open a cache.dat or cache.ext file that needs to be restored, you see this message during the write phase:
Can't open file: /usr/cache/cache.dat
Its blocks weren't written
Recovery continues trying to write blocks to all other directories to be restored. If this happens, do the following:
1. Do not delete the write image journal.
2. Try to correct the problem with the Caché database on which the error occurred.
3. Restart and let recovery try again.
Directories that were restored the first time are not listed as having blocks to be written during this second recovery attempt.
1.3.2.2 Error Writing to Caché Block
If recovery starts to write to a Caché database file, but cannot write a particular block number, you see this message:
Error writing block number xxxx
If this error occurs four times in a single restoration, the restoration aborts, and you see this message:
Error writing block number xxxx
Do you want to delete the Write Image File (Y/N)? Y =>
Enter N to retain the write image journal. Recovery attempts to continue. If it still does not succeed and you receive this message again, contact the InterSystems Worldwide Response Center (WRC). If you must continue immediately, you can delete or rename the write image journal. If you delete it, you lose all changes recorded in it.
1.3.2.3 Error Reading Write Image Journal
If an error occurs when recovery attempts to read the write image journal file, you see this message:
Do you want to write them now (Y/N)? Y =>Yes
*** WRITING ABORTED***
Can't read Cache Write Image File
Do you want to delete the Write Image File (Y/N)? Y =>
1.3.3 Write Daemon Panic Condition
If the global buffer pool is full of blocks that need to be written to databases, the Write daemon may enter a state where it cannot write to its write image journal. Before this happens, it notifies you on the operator’s console and the cconsole.log file. It then prints a message for each block written to the database that was not written first to the write image journal. This technique allows you to track the cause of any subsequent database degradation.
If the condition clears because global buffers have been freed up, the Write daemon informs you on the operator’s console that the panic condition has ended. If the panic condition does not end, the system may hang. If so, running cstop automatically calls cforce, in which case you most likely have database degradation.
To avoid this situation, allocate more database cache from the System Management Portal. If a panic condition message appears on the operator’s console, try adding 1 MB to the cache.
1.3.4 Write Daemon Errors and System Crash
Caché does not allow database modifications in the event of a Write daemon error. Thus, if a Write daemon error occurs while accessing any of the databases, you avoid database degradation because all updates to any database on the system are suspended.
If the system freezes, you must stop Caché and restart the system.
Under rare circumstances, database degradation can occur that cannot be rectified by write image journaling. Run Integrity on the global identified in the error message that the Write daemon logged when the freeze occurred.
1.3.5 Freeze Writes on Error
When the Write daemon encounters an error while writing a block, it freezes all processes performing database updates, and logs an error message to the operator’s console log, cconsole.log, as long as the freeze continues. It sends the error messages first at thirty-second, one-, two-, and four-minute intervals, and then at regular eight-minute intervals.
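The message schedule can be written out as a sequence, which helps when correlating timestamps in cconsole.log with the start of a freeze. The sketch below is a Python illustration that assumes each figure is the gap between one message and the next; the function name is hypothetical.

```python
import itertools


def freeze_message_intervals():
    """Yield the gap, in seconds, before each successive freeze message:
    30 s, 1 min, 2 min, 4 min, then 8 min indefinitely."""
    for seconds in (30, 60, 120, 240):
        yield seconds
    while True:
        yield 480


# Gaps before the first seven messages.
first_gaps = list(itertools.islice(freeze_message_intervals(), 7))
# Elapsed seconds at which each of those messages would appear.
elapsed = list(itertools.accumulate(first_gaps))
print(first_gaps)
print(elapsed)
```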
If the cause of the freeze is an offline or write-protected disk, an operator can fix the problem and processing can continue. Otherwise, to recover from a freeze, you need to run:
ccontrol force
and then:
ccontrol start
When the system freezes due to an error, the Write daemon generates an operator console error message that reports the exact error that caused the system to freeze as well as the name of the cache.dat file and the global or routine that was involved in causing the error. The following is an example of an error message that would occur when accessing a global:
*** CACHE: AN ERROR OCCURRED WHILE UPDATING A CACHE.DAT FILE THAT COULD CAUSE DATABASE DEGRADATION. TO PREVENT DEGRADATION ALL USER PROCESSES PERFORMING DATABASE UPDATES HAVE BEEN SUSPENDED AND THE WRITE DAEMON WILL NOT RUN.
ERROR: <DATABASE>
FILE: DUA0:[SYSM]
GLOBAL: ^UTILITY
If the error occurs while accessing a routine, the last part of the error message reads:
ROUTINE: TESTING
The following table describes the errors that can occur during a database update, and provides some possible solutions. Not every occurrence of these errors freezes the system; the system freezes only when the error occurs in the middle of a database update.
Write Daemon Error Conditions
Error: <FILEFULL>
Meaning: A block could not be allocated when information was added to a database because no blocks were available.
Solution: Determine whether there is expansion room in the Caché database. If not, increase the maximum size. Otherwise, determine whether there is enough physical space on the disk.
Error: <DISKHARD>
Meaning: During an attempt to access a block in a file, the request to the operating system failed. This failure may have occurred because the disk is offline or because the actual size of the file is less than the expected size.
Solution: Check that the disk is online. If it is, run Integrity on the global where the error occurred.
Error: <DATABASE>
Meaning: A database integrity problem has been encountered.
Solution: Run Integrity on the global where the error occurred.
Error: <SYSTEM>
Meaning: System error during database update.
Solution: Stop then restart Caché. If the problem still exists, contact the WRC.
Once the problem is corrected, database updates are re-enabled.
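When freeze messages are collected from cconsole.log, the error code can be pulled out and matched against the Write Daemon Error Conditions table. The following Python sketch assumes the message contains the labels ERROR:, FILE:, and GLOBAL: or ROUTINE: as in the example in this section, and that none of the values contain spaces; the function names and the fallback text are hypothetical, and the suggested actions are abbreviated from the table.

```python
import re

# Matches "LABEL: value" pairs such as "ERROR: <DATABASE>".
_FIELD = re.compile(r"\b(ERROR|FILE|GLOBAL|ROUTINE):\s*(\S+)")

# Abbreviated from the Solution column of the table above.
SUGGESTED_ACTION = {
    "<FILEFULL>": "Check for expansion room in the database, then for physical disk space.",
    "<DISKHARD>": "Check that the disk is online; if it is, run Integrity on the global.",
    "<DATABASE>": "Run Integrity on the global where the error occurred.",
    "<SYSTEM>": "Stop and restart Caché; if the problem persists, contact the WRC.",
}


def parse_freeze_message(text):
    """Return the labeled fields of a freeze message as a dictionary."""
    return {name.lower(): value for name, value in _FIELD.findall(text)}


def suggest_action(text):
    """Look up the table's suggested response for the message's error code."""
    error = parse_freeze_message(text).get("error")
    return SUGGESTED_ACTION.get(error, "Unrecognized error; see the console log.")
```

A value containing spaces, such as some Windows file paths, would be cut at the first space by this pattern, so the regular expression would need adjusting for such systems.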
1.3.6 Responding to a Freeze
If a freeze occurs, follow the procedure below.
1. Check the operator console to see the directory, the global or routine, and the process in which the error occurred.
2. Fix any causes of the error that you can correct easily. For example, put a disk online.
3. If updates do not resume, stop Caché.
4. Restart Caché.
5. Fix any causes of the error you could not correct earlier. For example, if the error was <FILEFULL>, you would need to provide more physical disk space, add a volume set, or increase the maximum size of the affected Caché database.
6. Run Integrity on the global or routine directory in the database where the error occurred to verify that no degradation occurred.
Some error conditions (<DISKHARD> and <DATABASE>) indicate that database degradation may exist. If degradation exists, try the ^REPAIR utility or contact the WRC.
Certain error conditions can cause degradation that write image journaling cannot repair; see the Limitations section.
1.4 Limitations of Write Image Journaling
While the two-phase write protocol safeguards structural database integrity, it does not prevent data loss. If the system failure occurs prior to a complete write of an update to the write image journal, Caché does not have all the information it needs to perform a complete update to disk. Hence, that data is lost.
In addition, write image journaling cannot eliminate internal database degradation in the following cases:
• A hardware malfunction on the drive that contains the temporary write image journal prevents Caché from reading this file.
Note that the Write daemon freezes if the malfunction occurs during an attempt to read or write this temporary file while Caché is operating. In most cases this means that a malfunction of this disk results only in data loss, not database degradation.
• A drive malfunctions and its contents are irretrievably lost or permanently unalterable. You must restore the backup of this database for the directories using the malfunctioning drive. However, write image journaling can still restore directories on other disks.
• A single process (for example, due to a segmentation fault) disappears while within the global module. Such a situation could occur if:
- On Windows NT, the Task Manager is used to halt a single process.
- On OpenVMS or UNIX, the terminal for that process is disconnected.
- On OpenVMS, a STOP/ID is issued. See the $ZUTIL(69,24) entry in the Caché ObjectScript Reference for further details.
• An obscure situation occurs in which drive A contains pointer blocks to drive B, a Kill command deletes those pointers, and, after the Garbage Collector begins its work, drive A becomes inoperable before the pointer block is rewritten. In this situation, write image journaling could fail. This condition usually follows another failure that would prevent this situation from being a problem. Furthermore, this situation is also likely to be one in which drive A has malfunctioned to such an extent that you would need to restore the database for that drive anyway.
2 Backup and Restore
This chapter outlines the factors to consider when developing a solid plan for backing up your Caché system. It discusses techniques for ensuring the integrity and recoverability of your backups, as well as suggested backup methodologies. Later sections of the chapter contain details about the procedures used to perform these tasks, either through the System Management Portal or by using Caché and third-party utilities. It discusses the following topics:
• Backup Integrity and Recoverability
• Importance of Journals
• Backup Methods
• Configuring Caché Backup Settings
• Managing Caché Online Backups
• Restoring from a Backup
• Caché Backup Utilities
• Sample Backup Procedures
Backup strategies can differ depending upon your operating system, preferred backup utilities, disk configurations, and backup devices. If you require further information to help you to develop a backup strategy tailored for your environment, or to review your current backup practices, please contact the
2.1 Backup Integrity and Recoverability
Regardless of the backup methods you use, it is critical to restore backups on a regular basis as a way to ensure that your backup strategy is a workable means of disaster recovery. The best practice is to restore every backup of the production environment to an alternate server, and then check the physical structure of the restored databases. This provides the following backup validation functions:
• Validates the recoverability of the backup media.
• Validates the global-level integrity of the databases in the backup.
• Provides a warm copy of the backup, substantially reducing the time required to restore the backup in the event of a disaster. If such an event occurs, you need only restore the updates in the journal files.
• Establishes a last known good backup.
The backup methods described in this document preserve the physical structure of the database; therefore, a clean integrity check of the restored copy implies that the integrity of the production database was sound at the time of the backup. The converse, however, is not true; an integrity error detected on the restored copy of a database does not necessarily imply that there are integrity problems on the production database. There could, for example, be errors in the backup media. If you discover an integrity error in the restored database, immediately run an integrity check on the production database to verify the integrity of the production system.
Note: See the Check Database Integrity section of the “Managing Caché” chapter of the Caché System Administration Guide for the details of checking database integrity.
To further validate that the application is working correctly on the restored database, you can also perform application-level checks. To perform these checks, you may need to restore journal files to restore transactional integrity. See the Importance of Journals section for more information. Once you restore the backup and establish that it is a viable source of recovery, it is best to preserve that restored copy until you establish the next good backup. Therefore, the server on which you are validating the backup should ideally have twice the storage space required by production: space to store the last-known good backup as well as the backup you are currently validating. (Depending on your needs, you may have less stringent performance requirements of the storage device used for restoring backups, allowing for a less expensive storage solution.) In this way, the last-known good backup is always available for use in a disaster even if validation of the current backup fails. To protect the enterprise from a disaster that could destroy the physical plant, regularly ship backup media to a secure off-site location.
You can run backups during transaction processing; as a result, the backup file may contain partial transactions. When restoring from a backup, you first restore the backup file, then restore from the journal files to complete the partial transactions in the backup file. Retain all journal files corresponding to the last-known good backup until you identify a new backup as the last-known good backup.
2.2 Importance of Journals
The backup of a Caché database alone is not enough to provide a viable restore of production data. In the event of a disaster that requires restoring from backup, you always apply journal files to the restored copy of the database. Applying journal files restores all journaled updates from the time of the backup, up to the time of the disaster. Also, applying journals is necessary to restore the transactional integrity of your database by rolling back uncommitted transactions (the databases may have contained partial transactions at the time of the backup).
It is critical to ensure that journal files are available for restore in the event of a disaster. Take the following steps to prevent compromising the journal files when disaster recovery requires you to restore databases.
• Verify that you are journaling all databases that require durability and recoverability.
• Do not purge a journal file unless it was closed prior to the last-known good backup, as determined by the backup validation procedure discussed previously. Set the number of days and the number of successful backups after which journal files are purged accordingly.
• Define an alternate journal directory.
• Place the primary and alternate journal directories on disk devices that are separate from the storage of the databases, separate from the storage of the write image journal (WIJ), and separate from each other (primary and alternate journal directories should reside on different devices). For practical reasons, these different devices may be different logical unit numbers (LUNs) on the same storage area network (SAN), but the general rule is: the more separation the better. As best as possible, configure the system so that journals are isolated from any failure that may compromise the databases or WIJ, because if the database or WIJ is compromised, then restoring from a backup and journal files may be required.
• Consider using hardware redundancy such as mirroring to help protect the journals. Long-distance replication can also provide a real-time off-site copy of the journal files. The off-site copy of journals allows recovery from a disaster where the physical plant is destroyed (in conjunction with the off-site copy of the backup media).
• Set the journal Freeze on error option to Yes. If a journal failure occurs in which journaling can no longer write to either the primary or the alternate journal device, you can configure the system to freeze. The alternative is to allow the system to continue, which leads to journaling being disabled. This, among other things, compromises the ability to reliably restore from backups and journal files.
Important: It is critical to test the entire disaster recovery procedure from start to finish periodically. This includes backup restore, journal restore, and running simulated user activity on the restored environment.
See the “Journaling” chapter of this guide for more information.
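The purge rule in the list above can be expressed as a simple filter. The following Python sketch is illustrative only and is not part of Caché: the JournalFile record, the file names, and the timestamps are hypothetical, and it assumes you record when each journal file was closed and when the last-known good backup was taken.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class JournalFile:
    name: str                      # hypothetical file name
    closed_at: Optional[datetime]  # None while the file is still being written


def purgeable_journals(journals: List[JournalFile],
                       last_good_backup_at: datetime) -> List[str]:
    """Return names of journal files closed before the last-known good backup."""
    return [
        j.name
        for j in journals
        if j.closed_at is not None and j.closed_at < last_good_backup_at
    ]


journals = [
    JournalFile("20081001.001", datetime(2008, 10, 1, 23, 59)),
    JournalFile("20081002.001", datetime(2008, 10, 2, 23, 59)),
    JournalFile("20081003.001", None),  # still open
]
# Assume the backup taken at 01:00 on 2 October was validated as good.
print(purgeable_journals(journals, datetime(2008, 10, 2, 1, 0)))
```

Only the first file qualifies in this example; the second was closed after the validated backup, and the third is still open.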
2.3 Backup Methods
The two main methods of backing up Caché data are the external backup and the Caché online backup. Each of these methods has variations on how to implement it; your backup strategy can contain multiple types of backups performed at different times and with different frequency. This section describes the details and variations of the two types of backups:
• External Backup
• Online Backup
2.3.1 External Backup
Use the external backup in conjunction with technology that provides the ability to quickly create a functional “snapshot” of a logical disk volume. Such technologies exist at various levels, such as simple disk mirrors, volume shadowing at the operating system level, or more modern snapshot technologies provided at the SAN level.
This approach is especially attractive for enterprises that have a very large amount of data, where the output of a Caché online backup would be so large as to be unwieldy. The approach is to freeze writes to all database files for the duration required to create a snapshot, then create a snapshot of the disk using the technology of choice. After you create the snapshot, thaw the system to again allow writes to the database while you copy the snapshot image to the backup media.
Caché provides the Backup.General class with class methods to simplify and enhance this technique. On nonclustered instances of Caché, these class methods pause physical writes to the database during the creation of the snapshot, while allowing user processes to continue performing updates in memory. This allows for a zero-downtime external backup on nonclustered systems. Use this mechanism with a disk technology that can create the snapshot within several minutes; if you pause the Write daemon for an extended period of time, user processes could hang due to a shortage of free global buffers.
Important: On clustered instances of Caché, this method pauses user processes for the duration of the freeze.
In addition to pausing writes, the freeze method also handles switching journal files and writing a backup marker to the journal. The class methods that perform the database freeze and thaw operations are Backup.General.ExternalFreeze() and Backup.General.ExternalThaw() respectively. On nonclustered systems, if you do not journal your databases, you may lose data if the system crashes while it is suspended.
There is also a Backup.General.QuiesceUpdates() class method that blocks new database update activity and waits for existing update activity to finish within a certain period of time. See the Backup.General class documentation in the Caché Class Reference for details on the use of these methods and examples.
The following sections discuss the types of external backups and the advantages and disadvantages of each:
• Concurrent External Backup
• Paused External Backup
• Cold Backup
2.3.1.1 Concurrent External Backup
A concurrent external backup, or “dirty backup,” is the most common strategy used by large-scale production facilities that have large databases, have limited time to complete a backup, and require uninterrupted processing 24 hours a day. The utility you use to perform the backup depends on your site preference and the operating system. You may choose a native operating system utility, such as the UNIX tar utility, or a third-party utility such as Veritas or ARCserve.
• Advantages — Production is not paused (except possibly very briefly during the incremental backup).
• Disadvantages — Multiple files need to be restored (Cache.dat database files and incremental backup files), which causes the restore process to take longer.
Procedure Outline:
Perform a concurrent external backup using the following steps as a guide:
1. Clear the list of data blocks modified since the last backup.
2. Copy the Cache.dat database files.
3. Perform a Caché incremental backup, which copies any blocks that changed while the Cache.dat files were being copied; this may cause a very brief suspension of user processes in some configurations.
2.3.1.2 Paused External Backup
Using a paused external backup is the second most common strategy used by large-scale production facilities that have large databases and limited time to complete a backup, but can tolerate a brief suspension of writes to databases. Organizations often use this strategy in conjunction with advanced disk technologies, such as disk mirroring. The approach is to freeze the Caché Write daemon long enough to separate a mirror copy of data and then quickly resume writes to the databases. You back up the mirror and then later rejoin it to production.
• Advantages — An incremental pass is not necessary in the restore process.
• Disadvantages — Unless mirroring or a similar technology is used, you must freeze writes to the database for a considerable amount of time, which may cause a shortage of free global buffers.
Procedure Outline:
Perform a paused external backup using the following steps as a guide:
1. Freeze writes to the database using the ExternalFreeze() method of the Backup.General class.
2. Separate the disk mirror from production (if using advanced disk technologies), or make a copy of the Cache.dat database files.
3. Resume Caché writes using the ExternalThaw() method of the Backup.General class.
4. If you split a mirror from production, back up the mirror copy of the database and rejoin the mirror to production.
See the Paused External Backup script section for an example.
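The important property of the procedure above is that writes are always resumed, even if the snapshot step fails. The following Python sketch shows only that control flow. The freeze, snapshot, and thaw callables are hypothetical stand-ins for whatever invokes ExternalFreeze(), your mirror-split or snapshot technology, and ExternalThaw() at your site; how those calls are actually issued is not shown here.

```python
from typing import Callable, List


def paused_external_backup(freeze: Callable[[], None],
                           snapshot: Callable[[], None],
                           thaw: Callable[[], None]) -> None:
    """Run freeze -> snapshot -> thaw, thawing even if the snapshot fails."""
    freeze()
    try:
        snapshot()
    finally:
        # Resume writes no matter what happened during the snapshot.
        thaw()


# Stand-ins that only record the order in which they were called.
calls: List[str] = []
paused_external_backup(
    freeze=lambda: calls.append("freeze"),
    snapshot=lambda: calls.append("snapshot"),
    thaw=lambda: calls.append("thaw"),
)
print(calls)
```

If the freeze step itself raises an error, the sketch does not call thaw, on the assumption that nothing was frozen; adjust this if your freeze step can fail partway through.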
2.3.1.3 Cold Backup
You generally use the cold backup strategy when your operation tolerates downtime. Often, smaller installations that do not have strict 24/7 access requirements use this strategy. Sometimes this is done only when performing a complete system backup as part of a maintenance effort such as repairing faulty hardware. In this situation, stop Caché during the backup period and restart it when the backup completes.
• Advantages — Very simple procedure (stop Caché and copy the Cache.dat files).
• Disadvantages — You must stop Caché; consequently, of all the backup options, this method involves the longest downtime.
Procedure Outline:
1. Stop Caché using the ccontrol command or through the Caché Cube.
2. Perform the backup.
2.3.2 Online Backup
Caché implements a proprietary backup mechanism designed to cause very minimal or, in most cases, no downtime to users of the production system. The online backup captures only blocks that are in use by the database. The output goes to a sequential file. The backup file is then copied to the backup media along with any other external files such as the .cpf file, the CSP files, and external files used by the application.
The Caché backup uses a multipass scan to back up database blocks. It is expected that each pass has a reduced list of modified blocks and that generally three passes are sufficient to complete a backup. During the entire final pass and for a brief moment during each prior pass, the system pauses writes to the database. If the backup list contains only new-format databases (8-KB block size), only physical writes to the database are paused while user processes are allowed to continue performing updates in memory. If the backup list contains any old-format (2-KB block size) databases, or if it is a clustered Caché environment, then all user activity is paused for these multiple brief periods.
The concurrent Caché online backup strategy is used when the backup must have the least impact on Caché processes. This is a strategy used across all sizes of production facilities.
In the case where 8-KB databases are used in a nonclustered environment, it is possible to back up the database without pausing user processes. The backup procedure incorporates multiple passes to copy the data, where each consecutive pass copies any data blocks that changed during the previous pass. During the last pass, writes to the disk are paused, while writes to the buffers are still allowed, thus users are not impacted (provided there are sufficient global buffers). In a clustered environment (or when some 2-KB databases are backed up), user processes are paused briefly during the final pass of the backup.
There are three different types of concurrent online backups: full, cumulative, and incremental, which can be combined to manage a trade-off between the size of the backup output, and the time needed to recover from the backup:
Full Backup
Writes an image of all in-use blocks to the backup media.
• Advantages — Provides the basis of your database restoration; a requirement for cumulative and incremental backups.
• Disadvantages — Time-consuming operation.
Cumulative Backup
Writes all blocks that have been modified since the last full backup. Must be used in conjunction with a previous full backup.
• Advantages — Quicker than a full backup; quicker to restore than multiple incremental backups.
• Disadvantages — More time-consuming than incremental backups.
Incremental Backup
Writes all blocks that have been modified since the last backup of any type. Must be used in conjunction with a previous full backup and (optionally) subsequent cumulative or incremental backups.
• Advantages — Quickest backup; creates smallest backup files.
• Disadvantages — May end up having to restore multiple incremental backups, slowing down the restore process.
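The difference between the three backup types comes down to which set of modified blocks each one writes. The following Python sketch is a toy model for illustration, not a description of how Caché tracks blocks internally; the BlockTracker class and the integer block numbers are hypothetical.

```python
from typing import Set


class BlockTracker:
    """Toy model of which blocks each online backup type writes."""

    def __init__(self, in_use: Set[int]) -> None:
        self.in_use = set(in_use)
        self.since_full: Set[int] = set()  # modified since the last full backup
        self.since_any: Set[int] = set()   # modified since the last backup of any type
        self.has_full = False

    def modify(self, block: int) -> None:
        self.in_use.add(block)
        self.since_full.add(block)
        self.since_any.add(block)

    def backup(self, kind: str) -> Set[int]:
        if kind == "full":
            written = set(self.in_use)         # every in-use block
            self.since_full.clear()
            self.has_full = True
        elif not self.has_full:
            raise RuntimeError("a full backup must exist first")
        elif kind == "cumulative":
            written = set(self.since_full)     # everything since the last full
        elif kind == "incremental":
            written = set(self.since_any)      # everything since the last backup
        else:
            raise ValueError(f"unknown backup kind: {kind}")
        self.since_any.clear()
        return written


tracker = BlockTracker({1, 2, 3})
print(tracker.backup("full"))         # all in-use blocks
tracker.modify(2)
print(tracker.backup("incremental"))  # only block 2
tracker.modify(4)
print(tracker.backup("cumulative"))   # blocks 2 and 4
```

The model shows why a cumulative backup grows over time while each incremental stays small, which is the size-versus-restore-time trade-off described above.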
Caché online backup writes all database blocks to a single file (or set of tapes) in an interleaved fashion. When an extremely large amount of data is backed up using online backup, restores can become somewhat cumbersome. This should be considered when planning your backup strategy. The restore validation process discussed above helps resolve limitations in this area by providing an online, restored copy of the databases.
When using incremental or cumulative backup, the same backup validation method explained earlier in this document should of course be used. After each incremental or cumulative backup is performed, it can be immediately restored to the alternate server. As an example, a strategy of weekly full backups and daily incremental backups can work well because each daily backup only contains blocks modified that day. Each day restore that incremental to the alternate server, and check integrity.
As discussed previously, overwriting the warm copy of the last known good backup when restoring the backup currently being validated should be avoided. The same concept applies when restoring an incremental to the existing restored database. After the backup is established as being the last known good backup and before applying the next day’s incremental or cumulative backup to it, a copy should be saved so that the last known good backup is always online and ready for use in case the subsequent incremental restore fails. If a restored backup fails an integrity check, it must be discarded and cannot be used as a target of a subsequent incremental restore.
When restoring a system from a Caché backup, first restore the most recent full backup, followed by the most recent cumulative backup, and then all incremental backups taken since the cumulative backup.
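The restore order just described can be sketched as a small selection routine. This Python example is illustrative only; the history format (an oldest-first list of label and type pairs) and the labels are assumptions made for the sketch, not the format of the Caché backup history.

```python
from typing import List, Tuple

# (label, kind), where kind is "full", "cumulative", or "incremental"
Backup = Tuple[str, str]


def restore_chain(history: List[Backup]) -> List[str]:
    """Pick the backups to restore, in order, from an oldest-first history."""
    kinds = [kind for _, kind in history]
    if "full" not in kinds:
        raise ValueError("no full backup in history")
    # Start from the most recent full backup.
    start = len(kinds) - 1 - kinds[::-1].index("full")
    chain = [history[start][0]]
    tail = history[start + 1:]
    # Add the most recent cumulative backup taken after that full backup, if any.
    cumulative_positions = [i for i, (_, kind) in enumerate(tail) if kind == "cumulative"]
    if cumulative_positions:
        last_cumulative = cumulative_positions[-1]
        chain.append(tail[last_cumulative][0])
        tail = tail[last_cumulative + 1:]
    # Add every incremental backup taken after that point, oldest first.
    chain.extend(label for label, kind in tail if kind == "incremental")
    return chain


history = [
    ("sun", "full"),
    ("mon", "incremental"),
    ("tue", "incremental"),
    ("wed", "cumulative"),
    ("thu", "incremental"),
    ("fri", "incremental"),
]
print(restore_chain(history))
```

In this example the Monday and Tuesday incrementals are skipped because the Wednesday cumulative backup already contains their changes.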
2.4 Configuring Caché Backup Settings
You can configure the Caché database backup settings from the [Home] > [Configuration] > [Database Backup Settings] and the [Home] > [Task Manager] pages of the System Management Portal.
From the System Management Portal you can perform the following configuration tasks:
• Define Database Backup List
• Configure Backup Tasks
2.4.1 Define Database Backup List
Caché maintains a database list that specifies the databases to be backed up. You can display this list by opening the [Home] > [Configuration] > [Database Backup Settings] > [Backup Database List] page of the System Management Portal.
Use the arrow buttons to move the databases you do not want to back up to the Available list and the databases you do want to back up to the Selected list. Click Save.
When you add a new database to your system, Caché automatically adds it to the database list. If you do not need to include the new database in your backup plan, be sure to remove it from the Backup Database List.
This database list is ignored by the FullAllDatabases backup task, which performs a backup of all databases excluding the CACHETEMP, CACHELIB, DOCBOOK, and SAMPLES databases. If you update the Caché-supplied CACHELIB and DOCBOOK databases, you can add them to the database list and run a FullDBList backup as a base for subsequent backup tasks.
You can also maintain the backup database list using the Backup.General.AddDatabaseToList() and Backup.General.RemoveDatabaseFromList() methods. See the Backup.General class description in the Caché Class Reference for details on using these methods.
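The relationship between the backup tasks and the database list can be summarized in a few lines. The following Python sketch is illustrative and does not call any Caché API; the function name is hypothetical, and the APP database in the example is made up.

```python
from typing import Iterable, List

# Databases that FullAllDatabases leaves out, as listed above.
EXCLUDED_FROM_FULL_ALL = {"CACHETEMP", "CACHELIB", "DOCBOOK", "SAMPLES"}

# Tasks that back up whatever is in the Backup Database List.
LIST_TASKS = {"FullDBList", "IncrementalDBList", "CumuIncrDBList"}


def databases_for_task(task: str,
                       all_databases: Iterable[str],
                       backup_list: Iterable[str]) -> List[str]:
    """Return the databases a given backup task would cover."""
    if task == "FullAllDatabases":
        # The Backup Database List is ignored for this task.
        return sorted(db for db in all_databases if db not in EXCLUDED_FROM_FULL_ALL)
    if task in LIST_TASKS:
        return sorted(backup_list)
    raise ValueError(f"unknown backup task: {task}")


all_dbs = ["USER", "CACHESYS", "CACHETEMP", "CACHELIB", "DOCBOOK", "SAMPLES", "APP"]
selected = ["APP", "CACHELIB"]
print(databases_for_task("FullAllDatabases", all_dbs, selected))
print(databases_for_task("FullDBList", all_dbs, selected))
```

Note that CACHELIB appears in the FullDBList result because it was added to the list explicitly, which matches the case of an updated Caché-supplied database described above.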
2.4.2 Configure Backup Tasks
Caché provides four different types of backup tasks; each is listed as an item on the Database Backup Settings menu. The four backup tasks are:
• Configure Full Backup of All Databases
• Configure Full Backup of the Database List
• Configure Incremental Backup of the Database List
• Configure Cumulative Backup of the Database List
These are predefined backup tasks that an operator can run on-demand from the [Home] > [Backup] page of the portal. You can also schedule combinations of these backup tasks using the Task Manager. See the Schedule Backup Tasks section later in this chapter for details.
The process for configuring each of these tasks is the same. The Name, Description, and Type fields are read-only and reflect the menu choice as described in the following table.
Backup Task Descriptions
Name | Type | Description
FullAllDatabases | Full | Full backup of all commonly updated databases, whether or not they are in the Backup Database List.
FullDBList | Full | Full backup of the Caché databases listed in the Backup Database List.
IncrementalDBList | Incremental | Incremental backup of changes made to the data since the last backup, whether full or cumulative. Backup is performed on the databases currently listed in the Backup Database List.
CumuIncrDBList | Cumulative | Cumulative and Incremental backup of all changes made to the data since the last full backup. Backup is performed on the databases currently listed in the Backup Database List.
You can send backup output to a directory on disk or to magnetic tape. Select one of the two options:
1. To back up to a directory on disk, specify the file pathname in the Device field. Click Browse to select a directory.
2. To back up to magnetic tape, select the Save to Tape check box, and specify a Tape Number from the list of available tape device numbers.
See the Identifying Devices section of the Caché I/O Device Guide for detailed information regarding tape numbers.
The Define Database Backup List section describes how to maintain the Backup Database List.
2.4.2.1 Backup File Names
By default, backup files are stored in CacheSys\Mgr\Backup. The backup log files are stored in the same directory. Backup files have the suffix .cbk. Backup log files have the suffix .log.
Backup files and backup log files use the same naming conventions:
• The name of the backup task, followed by an underscore character (_)
• The date of the backup, in yyyymmdd format, followed by an underscore character (_)
• An incremental number, nnn, for that task, for that day
Where nnn is a sequence number incremented for that backup task on that date. Caché creates a log file for every backup attempt: successful, failed, or aborted. Caché creates a backup file only upon successful backup, but its increment number matches the corresponding log file increment number. For example, suppose you perform three FullDBList backup operations on June 4, 2006, the first successful, the second aborted, the third successful. This generates three .log files, numbered 001, 002, and 003, but only two .cbk files, numbered 001 and 003.
The backup files:
FullDBList_20060604_001.cbk
FullDBList_20060604_003.cbk
The matching log files:
FullDBList_20060604_001.log
FullDBList_20060604_002.log
FullDBList_20060604_003.log
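The naming convention can be reproduced with a short routine, which may be useful when writing scripts that locate or archive backup output. This Python sketch is illustrative only; the function name and the list of outcomes are hypothetical, and it simply models the rule that every attempt gets a log file while only successful attempts get a backup file.

```python
from datetime import date
from typing import List, Tuple


def backup_file_names(task: str, day: date,
                      outcomes: List[bool]) -> Tuple[List[str], List[str]]:
    """Return (.cbk names, .log names) for one task's attempts on one day.

    outcomes[i] is True when attempt i+1 succeeded.
    """
    cbk_names, log_names = [], []
    for seq, succeeded in enumerate(outcomes, start=1):
        stem = f"{task}_{day:%Y%m%d}_{seq:03d}"
        log_names.append(stem + ".log")      # every attempt gets a log file
        if succeeded:
            cbk_names.append(stem + ".cbk")  # only successful attempts get a backup file
    return cbk_names, log_names


# Three FullDBList attempts on June 4, 2006: success, aborted, success.
cbk_names, log_names = backup_file_names("FullDBList", date(2006, 6, 4), [True, False, True])
print(cbk_names)
print(log_names)
```

The result matches the file lists in the example above.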
2.4.3 Schedule Backup Tasks
You should ideally set up a schedule for running backups. Backups are best run at a time when the fewest users are active on the system.
In addition to the four backup tasks supplied with Caché, you can create additional definitions of these four backup tasks. For example, you could create two full backup tasks, one to save the backup to a disk file, the other to save the backup to a tape. Or, to alternate backups between two disk drives, you could create a backup task for each drive.
Use the Caché Task Manager to schedule these backup tasks:
1. Navigate to the [Home] > [Task Manager] page of the System Management Portal.
2. Click Schedule New Task.
3. Specify the Name, Description, Backup Type, and output location.
You can delete any task you add by clicking Delete on its row on the Task Schedule page.
2.5 Managing Caché Online Backups
You can run Caché database backup tasks and view backup history from the [Home] > [Backup] page of the System Management Portal. If you schedule additional backup tasks using the Task Manager, you can manage those from the [Home] > [Task Manager] page of the System Management Portal. From the System Management Portal you can perform the following backup tasks:
• Run Backup Tasks
• View Backup Status
• View Backup History
• Handle Backup Errors
• Back Up Selected Globals and Routines
When you add a new database to your system, you must perform a full backup. You cannot perform an incremental backup, or restore a database, until a full backup exists.
After installing Caché, InterSystems recommends that you perform a FullAllDatabases backup to establish a complete backup for subsequent use by the other backup tasks.
2.5.1 Run Backup Tasks
There are four types of backup tasks you can run from the System Management Portal, each having its own menu item:
• Run Full Backup of All Databases
• Run Full Backup of the Backup Database List
• Run Incremental Backup of the Backup Database List
• Run Cumulative Backup of the Backup Database List
You must have performed a full backup on a database before performing an incremental or cumulative backup on that database.
Read the Run Backup Task box to verify that the settings are correct. If the backup options are correct, click OK to start the backup.
While running a backup from the [Home] > [Backup] > [Run Backup] page, you can view the status of the running backup by clicking the text next to Backup started. See View Backup Status for details.
Performing Multivolume Backups
A backup, particularly a full backup, may require multiple tape volumes, or multiple disk files. Currently, there is no way to perform a multivolume backup using the System Management Portal. If you require a multivolume backup use the ^BACKUP utility. If a disk full condition occurs, Caché prompts you for the name of another disk file on another disk.
In the event of an error during backup, you cannot restart the backup on a second or subsequent volume. You must restart the backup from the beginning.
2.5.2 View Backup Status
Click View on the running backup process to monitor the progress of the backup operation. The same information is recorded in the log file for that backup operation, which you can later view from the View Backup History page.
When Caché begins a backup, it updates the Time and Status columns of the listing. The Time column records the date and time that the backup was initiated, not when it completed. The Status column is updated to Running.
Upon completion of a backup, Caché again updates the Status column to indicate the final status of the backup. Completed indicates the backup successfully completed. Failed indicates the backup could not be performed or was aborted by the operator.
One cause of backup failure is trying to perform a backup on a dismounted database.
2.5.3 View Backup History
Every backup operation creates a separate backup log file. The logs follow the naming convention described in Backup File Names.
From the portal you can view a list of system backup logs from completed backup tasks:
1. Navigate to the [Home] > [Backup] page of the System Management Portal.
2. Click View Backup History in the right-hand column to display the [Home] > [Backup] > [View Backup History] page.
3. To view the contents of a particular file, click View in the right-hand column of the appropriate row. You can view the contents of a backup log file and search for a string within that file.
2.5.4 Handle Backup Errors
In the event of an error during backup, the backup utility allows you to retry the device on which the error occurred. Alternatively, you can abort the backup.
On stand-alone systems (those not clustered), if you abort a backup regardless of the type, the next backup must be a full backup. This full backup on a stand-alone system does not block access to the database.
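The rule above, combined with the requirement that a full backup exist before any incremental or cumulative backup, can be checked before a backup is started. The following Python sketch is illustrative only. The history format is hypothetical, and the sketch makes a conservative assumption that any attempt whose status is not Completed counts as aborted.

```python
from typing import List, Tuple

# (kind, status): kind is "full", "cumulative", or "incremental";
# status is "Completed" or "Failed", as shown in the Status column.
Attempt = Tuple[str, str]


def allowed_kinds(history: List[Attempt]) -> List[str]:
    """Backup kinds permitted next on a stand-alone system (oldest-first history)."""
    has_full = any(kind == "full" and status == "Completed"
                   for kind, status in history)
    # Assumption: treat every non-Completed attempt as an aborted backup.
    last_aborted = bool(history) and history[-1][1] != "Completed"
    if not has_full or last_aborted:
        return ["full"]
    return ["full", "cumulative", "incremental"]


print(allowed_kinds([]))
print(allowed_kinds([("full", "Completed")]))
print(allowed_kinds([("full", "Completed"), ("incremental", "Failed")]))
```

A scheduling script could use a check like this to fall back to a full backup automatically rather than attempting an incremental that cannot be used.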
If a backup encounters any I/O errors, the backup aborts and logs a system error in the cconsole.log file, viewable by the Backup utility, the SYSLOG character-based utility, or any text file viewer. The log file allows you to quickly see where the problem occurred so that you can fix it.
2.5.5 Back Up Selected Globals and Routines
You may want to back up only selected globals or routines in a database. The System Management Portal offers options to perform these tasks. The following are a few cases of where these options are helpful:
• Restoring selected globals — If you back up your globals using the Export option from the [Home] > [Globals] page of the System Management Portal, you can use the Import option to restore only the globals you require.
• Restoring a database after extensive repairs — When your Caché database suffers degradation, it does not use the space as efficiently as it could; some unused blocks are not marked as being available, and pointers become overly indirect. If you backed up your globals using the Export option before the problem occurred, you can recreate the database, and then load the globals using the Import option.
• Restoring selected routines — You can use the Export and Import options from the [Home] > [Routines] page of the System Management Portal to back up and restore individual routines. You can also selectively back up and restore source code (.MAC, .INC, and .INT files), or both source and object code (.OBJ files) using the Export and ImportDir methods of the %SYSTEM.OBJ class.
Routines and globals are backed up into standard format files. These files are referred to as RSA (routine save) and GSA (global save) files.
2.6 Restoring from a Backup
If any problem arises that renders your data inaccessible or unusable, you can recreate that data by restoring the affected database(s) from backup files and applying the changes recorded in the journal files.
When you are restoring an incremental or cumulative backup, the target database must be in exactly the same state as when you restored the previous full backup. You must prevent all updates to the database that you restored from the full backup until you restore all subsequent incremental and cumulative backups.
Failing to heed this warning can result in the incremental/cumulative restore producing a degraded database.
If the previous full backup was an external backup, and you restore the external backup before starting Caché (which must be running to apply the incremental backup), updates during startup could modify the restored full backup, thus invalidating the incremental backup that needs to be restored. Take special care to avoid this. If you restore the external backup in place, you must start Caché and dismount those databases before restoring the external backup. You can then mount the databases with switch 10 set for the purpose of restoring the incremental backup. Alternatively, you can restore the external backup to alternate directory paths, restore the subsequent incremental backups, and then move the databases into place.
To perform a restore, use the following strategy:
1. Identify which Caché databases require restoration.
2. Restore the last full backup of those Caché databases.
3. If you have done cumulative incremental backups since the full backup, restore the last one.
4. Restore all incremental backups done since the last cumulative incremental backup (or, if none exists, since the last full backup) in the order in which the backups were performed.
5. Apply the changes in the journal file for the directories restored or selected directories and globals you specify.
6. Perform a full backup of the restored system.
CAUTION: If you backed up with a UNIX or OpenVMS backup utility, use the same utility to restore.
2.6.1 Using the Backup History to Recreate the Database
The Backup utility maintains a backup history. The Restore utility prompts you for the backup(s) to restore according to their order in the backup history.
Note: On Caché platforms that support access to the same database from multiple computers, you should always back up a given directory from the same computer, so that its complete backup history is available if you need to restore the directory.
When you select one of the restore options on the ^BACKUP main menu, the utility asks you to enter the name of the device holding the first backup to be restored. The first time you enter a restore option, the default is the device to which the last full backup was sent, if there was one.
Caché helps you restore backups in logical order. After restoring the last full backup, the utility uses the backups in the Backup History to suggest the next logical backup for you to restore. It cycles through all of the backups in this way.
Having already prompted you with the last full backup, it prompts you to restore subsequent backups in the following order:
1. It prompts you for the most recent cumulative incremental backup after the last full backup, if one exists.
2. After restoring the most recent cumulative incremental backup, if there was one, it prompts you to restore all incremental backups since the last cumulative incremental backup (or if none exists, since the last full backup). It does so in order from the first to the most recent.
You can override the suggested backups in the restore process. Remember, however, that an incremental or cumulative incremental backup does not represent a complete copy of your disk. You can restore an incremental backup only after restoring a full backup.
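The prompting order described above can be modeled in a few lines. The following Python sketch is illustrative only and is not part of Caché: the `Backup` record, its `seq` field, and the `plan_restore` function are hypothetical names, and the backup history is assumed to be a simple list in which a higher `seq` means a more recent backup.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical backup kinds; these labels are assumptions for illustration.
FULL = "full"
CUMULATIVE = "cumulative"
INCREMENTAL = "incremental"


@dataclass(frozen=True)
class Backup:
    seq: int   # position in the backup history; higher means more recent
    kind: str  # FULL, CUMULATIVE, or INCREMENTAL


def plan_restore(history: List[Backup]) -> List[Backup]:
    """Return the backups to restore, in the order they would be suggested."""
    ordered = sorted(history, key=lambda b: b.seq)
    fulls = [b for b in ordered if b.kind == FULL]
    if not fulls:
        raise ValueError("no full backup in history; nothing to restore from")

    # Start from the last full backup.
    last_full = fulls[-1]
    plan = [last_full]
    later = [b for b in ordered if b.seq > last_full.seq]

    # Then the most recent cumulative incremental after it, if one exists.
    base_seq = last_full.seq
    cumulatives = [b for b in later if b.kind == CUMULATIVE]
    if cumulatives:
        plan.append(cumulatives[-1])
        base_seq = cumulatives[-1].seq

    # Then every incremental since that point, oldest first.
    plan.extend(b for b in later if b.kind == INCREMENTAL and b.seq > base_seq)
    return plan


history = [
    Backup(1, FULL), Backup(2, INCREMENTAL), Backup(3, FULL),
    Backup(4, INCREMENTAL), Backup(5, CUMULATIVE),
    Backup(6, INCREMENTAL), Backup(7, INCREMENTAL),
]
print([b.seq for b in plan_restore(history)])  # prints [3, 5, 6, 7]
```

In this example the incremental with `seq` 4 is skipped because the later cumulative incremental backup already covers it.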
2.6.2 Suspending Database Access During a Restore
In most cases, the database you are restoring is not fully independent of the other databases on the system. For this reason, it is recommended that all user activity be suspended during restore. Even if you are the only user on your system, you still want to restrict login access if any users can log in remotely to your system.
You can, however, restore a database with users active on other databases. All databases being restored are dismounted during the restore. Therefore, if you did not suspend database access, users who try to access the databases being restored receive <PROTECT> errors.
2.6.3 Restoring Database Properties
If the characteristics of a directory have changed by the time you do a restore, the restore utility accounts for the changes. It creates Caché databases as necessary and modifies their characteristics as appropriate, to return them to the state they were in at the time the backup was completed.
2.6.4 Error Handling for Restore
If an error occurs while you are restoring, you are given these options:
• Retry the device
• Skip that block or set of blocks and continue with the restore
• Abort the restore of that directory but otherwise continue with the restore
• Abort the restore
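To make the scope of each option concrete, the following Python sketch models how the four choices affect a restore that spans several directories. It is a simplified illustration, not the behavior of the Caché restore utility itself: `Action`, `run_restore`, `read_block`, and `on_error` are hypothetical names, and a read failure is assumed to surface as an `OSError`.

```python
from enum import Enum
from typing import Callable, Dict, List


class Action(Enum):
    RETRY = "retry the device"
    SKIP = "skip the block(s) and continue"
    ABORT_DIRECTORY = "abort this directory, continue with the rest"
    ABORT_RESTORE = "abort the whole restore"


def run_restore(
    directories: Dict[str, List[int]],
    read_block: Callable[[str, int], None],
    on_error: Callable[[str, int], Action],
) -> Dict[str, List[int]]:
    """Restore each directory block by block; return the blocks restored."""
    restored: Dict[str, List[int]] = {}
    for directory, blocks in directories.items():
        restored[directory] = []
        abort_directory = False
        for block in blocks:
            while True:
                try:
                    read_block(directory, block)  # hypothetical device read
                    restored[directory].append(block)
                    break
                except OSError:
                    action = on_error(directory, block)
                    if action is Action.RETRY:
                        continue              # try the same block again
                    if action is Action.SKIP:
                        break                 # move on to the next block
                    if action is Action.ABORT_DIRECTORY:
                        abort_directory = True
                        break
                    return restored           # ABORT_RESTORE: stop everything
            if abort_directory:
                break                         # next directory, if any
    return restored
```

In this model the operator's choice is supplied by the `on_error` callback; retrying continues for as long as the callback keeps answering `Action.RETRY`.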
2.7 Caché Backup Utilities
Caché provides utilities to perform backup and restore tasks. The ^BACKUP routine provides menu choices to run common backup procedures, which you can also run independently and in some cases non-interactively using scripts. The utility names are case-sensitive. Run all these utilities from the %SYS namespace.
• Perform Backup and Restore Tasks Using ^BACKUP
• Backup Databases Using ^DBACK
• Restore Databases Using ^DBREST
• Estimate Backup Size Using ^DBSIZE
2.7.1 Perform Backup and Restore Tasks Using ^BACKUP
The Caché ^BACKUP utility allows you to perform Caché backup and restore tasks from a central menu as shown in the following example:
%SYS>Do ^BACKUP
1) Backup
2) Restore ALL
3) Restore Selected or Renamed Directories
4) Edit/Display List of Directories for Backups
Option?
Enter the appropriate menu number option to start the corresponding routine. Press Enter without entering an option number to exit the utility.
Subsequent sections in this document describe the utilities started by choosing each option:
1. Backup Databases Using ^DBACK
2. Restore All Databases Using ^DBREST
3. Restore Selected Databases Using ^DBREST
4. Maintain Database Backup List
2.7.1.1 Maintain Database Backup List
Note: When editing the database list, use the database name, not the directory name. This is consistent with the way the backup configuration works in the System Management Portal.
The Caché ^BACKUP utility allows you to back up Caché databases or to restore an existing backup. If a list of databases has not been created, all databases are included in the backup. If a list is created, that list applies to all aspects of the backup system including calls to