High availability example 3: Replication - Content Manager HA strategies and options

Chapter 4. High availability strategies and options

4.2 Content Manager HA strategies and options

4.2.5 High availability example 3: Replication

The next pattern, example 3, shown in Figure 4-12, is similar to example 2 discussed in 4.2.4, “High availability example 2: Clustering and replication” on page 110, except that it uses database replication for the Library Server component instead of a fail-over strategy. This, coupled with WebSphere application clustering for the eClient or custom application and Resource Manager replication for the content, delivers a level 4 high availability strategy (data redundancy), as discussed earlier in 4.1.6, “Levels of availability” on page 96.

Figure 4-12 High availability: Example 3, Content Manager and DB2 replication

The difference between this example and example 2 is that the Library Server (database) is replicated instead of relying on a fail-over (shared disk) implementation.

[Figure content: High Availability Example #3, single site with eClient (mid-tier), CM and DB2 "Replication". Tier 1 Client: thin clients, WIN32 clients. Tier 2 Presentation: Network Dispatcher (1) and (2) with failover, HTTP Server (1) and (2). Tier 3 Application: Application Server (1) and (2), eClient / mid-tier; requires WebSphere Application Server 4.x AE or WebSphere Application Server 5.x ND. Tier 4 Data: Library Server (1) and Library Server (Replica), Resource Manager (1) and Resource Manager (Replica), TSM Server; replication and failover links between the servers.]

Database replication

There are several strategies and techniques that can be used for database replication. The intent of this section is to highlight the different techniques that could be used in a Content Manager implementation. Additional information is available for DB2 and Oracle replication strategies; it should be carefully researched and understood if replication is selected as the high availability strategy to pursue. There are four primary techniques that could be used to replicate the Library Server DB2 database, all with the goal of replicating or mirroring DB2 database content from a primary server to a standby server:

Log shipping: Log shipping is a method where transaction logs are automatically shipped from a primary DB2 server and made accessible on a standby server. Once the log files are available on the standby server, the standby can stay relatively synchronized with the primary server. All database changes (inserts, updates, or deletes) are recorded in the DB2 transaction logs. Transaction logs are primarily used for crash recovery and to restore a system after a failure, but they can also play a role in replication through log shipping. This requires a secondary system ready to take over in the event the primary system fails. The administrator creates this secondary system by restoring a backup of the primary system database, which leaves the secondary database in a rollforward pending state. Transaction logs are then shipped from the primary system to the secondary system and used to roll the database forward through the transactions. In the event of a failure, the administrator stops the process of rolling forward, and the database is brought online.

In addition, the process of failover is typically not automated. After a failure, a system administrator must either make a change at the application layer or DNS layer so that users can work against the secondary system, or make changes at the secondary system so that it can mimic the primary. The latter approach allows the application to continue to work as before with no application coding changes.
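The log shipping sequence described above can be sketched as a toy model. This is not DB2 code: the `Database` and `Standby` classes, the in-memory "log files", and the document keys are all hypothetical and exist only to show why changes still in the active log at the time of failure do not reach the standby.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy primary database: a key/value table plus its transaction log files."""
    table: dict = field(default_factory=dict)
    closed_logs: list = field(default_factory=list)  # full log files, eligible for shipping
    active_log: list = field(default_factory=list)   # log file still being written

    def execute(self, key, value):
        """Apply a change and record it in the active transaction log."""
        self.table[key] = value
        self.active_log.append((key, value))

    def close_active_log(self):
        """Close the active log file so that it can be shipped."""
        if self.active_log:
            self.closed_logs.append(self.active_log)
            self.active_log = []


class Standby:
    """Toy standby: restored from a backup and held in rollforward pending state."""

    def __init__(self, backup, next_log):
        self.table = dict(backup)   # restore the backup image
        self.next_log = next_log    # index of the first log file not in the backup
        self.online = False         # unusable until the rollforward is stopped

    def roll_forward(self, shipped_logs):
        """Replay shipped log files that have not been applied yet."""
        for log_file in shipped_logs[self.next_log:]:
            for key, value in log_file:
                self.table[key] = value
            self.next_log += 1

    def take_over(self):
        """The administrator stops the rollforward and brings the database online."""
        self.online = True


primary = Database()
primary.execute("doc1", "v1")
primary.close_active_log()
standby = Standby(backup=primary.table, next_log=len(primary.closed_logs))

primary.execute("doc2", "v1")
primary.close_active_log()
standby.roll_forward(primary.closed_logs)   # the new log file is shipped and replayed

primary.execute("doc3", "v1")               # still in the active log when the primary fails
standby.take_over()

lost = sorted(set(primary.table) - set(standby.table))
print(standby.table)   # {'doc1': 'v1', 'doc2': 'v1'}
print(lost)            # ['doc3']
```

In this sketch the standby is missing only the change that had not yet been shipped, which corresponds to the "relatively synchronized" state described above.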

Log mirroring: DB2 has the ability to perform dual logging, and this method of replicating a DB2 database exploits that capability. It is similar to log shipping, but when this feature is used, DB2 writes the same log record to two locations simultaneously, thus producing a mirrored log environment and ensuring no loss of data. One of these locations is typically a remotely mounted file system. This allows the database to create mirrored log files on different volumes or different systems, thereby increasing redundancy. This method does not create two active systems, because, as with log shipping, the backup system is in an unusable state until the administrator brings it out of rollforward pending state. The downside of this approach is the performance cost associated with performing two disk writes, one of which might be remote.

Once again, the process to switch to the secondary server is typically not automated and will need a system administrator to make a change at the application or DNS layer so that users can work against the secondary server.
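The dual-write idea behind log mirroring can be illustrated with a minimal sketch. In DB2 itself this is a database configuration setting (for example, the mirrorlogpath database configuration parameter in later DB2 versions), not application code; the `MirroredLog` class, the temporary directories standing in for local and remote volumes, and the log file name are assumptions made for illustration only.

```python
import os
import tempfile


class MirroredLog:
    """Append each log record to two files and force both to disk before returning."""

    def __init__(self, primary_path, mirror_path):
        self.files = [open(primary_path, "ab"), open(mirror_path, "ab")]

    def append(self, record: bytes):
        for f in self.files:
            f.write(record + b"\n")
            f.flush()
            os.fsync(f.fileno())   # the second forced write is the performance cost

    def close(self):
        for f in self.files:
            f.close()


# Two temporary directories stand in for a local volume and a remotely mounted one.
with tempfile.TemporaryDirectory() as local_dir, tempfile.TemporaryDirectory() as remote_dir:
    primary_path = os.path.join(local_dir, "S0000001.LOG")
    mirror_path = os.path.join(remote_dir, "S0000001.LOG")

    log = MirroredLog(primary_path, mirror_path)
    log.append(b"INSERT doc1")
    log.append(b"UPDATE doc1")
    log.close()

    with open(primary_path, "rb") as p, open(mirror_path, "rb") as m:
        primary_bytes, mirror_bytes = p.read(), m.read()

print(primary_bytes == mirror_bytes)   # True
```

Because a record is acknowledged only after both copies are on disk, the mirror never lags behind the primary log, at the price of two writes per record.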

Database replication: DB2 includes integrated replication capabilities. The DB2 implementation of replication consists of two pieces: Capture and Apply. The replication administrator designates the tables to be replicated as replication sources, and then creates replication subscriptions on a target database (the secondary system) that use those replication sources as their source. The Capture process monitors the transaction logs for all changes to the replication source tables and places any changes made to these tables into staging tables. The Apply program reads the staging tables and moves the changes to the subscription target on a timed interval.

As in the case of log shipping and log mirroring, data replication is an asynchronous process. For a period of time, either while the changed data has not yet been placed in the staging tables, or while the Apply program has not replicated the changes to the target system, the two databases are out of sync. This is not necessarily a long period of time or a large amount of data, but it must be considered a possibility.

Replication captures changes to the source tables; however, it does not capture changes to the system catalogs. For example, changes to table permissions must be performed on both systems, because replication is unable to replicate this change.

In addition, the process of failover is typically not automated: a system administrator must make a change at the application or DNS layer so that users can work against the secondary server.

There is some overhead in running replication. The amount of extra work depends on the amount of insert, update, and delete activity on the source tables. No extra locking is required on the base tables, because replication only analyzes the log files and not the base tables. But the population of the staging tables (change tables) and the logging of these transactions require database resources.
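The Capture and Apply flow, and the window during which the two databases are out of sync, can be modeled with a short sketch. This is a toy model rather than the DB2 replication API: the `ReplicationPair` class, its attributes, and the sample item are hypothetical.

```python
class ReplicationPair:
    """Toy Capture/Apply pipeline between a source table and a target table."""

    def __init__(self):
        self.source = {}
        self.target = {}
        self.log = []        # transaction log of the source database
        self.log_pos = 0     # how far Capture has read in the log
        self.staging = []    # staging (change) table filled by Capture

    def write(self, key, value):
        """Application change: touches only the source table and its log."""
        self.source[key] = value
        self.log.append((key, value))

    def capture(self):
        """Capture reads new log records and places them in the staging table."""
        self.staging.extend(self.log[self.log_pos:])
        self.log_pos = len(self.log)

    def apply(self):
        """Apply moves staged changes to the target; it runs on a timed interval."""
        for key, value in self.staging:
            self.target[key] = value
        self.staging.clear()

    def in_sync(self):
        return self.source == self.target


pair = ReplicationPair()
pair.write("item1", "checked-in")
pair.capture()
before_apply = pair.in_sync()   # change is staged but not yet applied
pair.apply()
after_apply = pair.in_sync()
print(before_apply, after_apply)   # False True
```

Note that `capture` reads only the log and never the source table, which mirrors the point above that no extra locking is required on the base tables.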

Enterprise storage disk mirroring: Storage products such as the IBM Enterprise Storage Server® (ESS), using a protocol such as PPRC, can allow real-time mirroring or asynchronous mirroring of data from one storage subsystem to another. The secondary ESS or storage subsystem can be located at the same site or at another site some distance away. Protocols such as PPRC are application independent. Because the copying function [...] database tables, and log files are up-to-date by ensuring that the primary copy will be written only if the primary system receives acknowledgement that the secondary copy has been written in a synchronous mode. These enterprise storage subsystems also provide similar guarantees in an asynchronous mode, but with replication delays. You would typically find an enterprise storage disk mirroring strategy used more for disaster recovery purposes than as a high availability option. It is mentioned here for completeness and, in some cases, can be a viable option, but it is also the most expensive of the four options discussed here.

Table 4-4 summarizes the advantages and disadvantages of the DB2 replication options.

Table 4-4 DB2 replication options advantages and disadvantages

Method: Log shipping
Advantages: Minimal impact to production system. Low cost.
Disadvantages: Transaction loss is a possibility (async process). Database is in unusable state until it is brought out of rollforward pending state. Standby database needs to be logically identical.

Method: Log mirroring
Advantages: No transaction loss. Minimal impact to production system. Low cost.
Disadvantages: Performance cost associated with performing two disk writes, one of which might be remote. Database is in unusable state until it is brought out of rollforward pending state. Standby database needs to be logically identical.

Method: Database replication
Advantages: Standby database is in a usable state. Instant failover (but without most recent transactions). Standby need not be physically and logically identical. Can choose to replicate only critical tables.
Disadvantages: Transaction loss is a possibility (async process). Extra cycles on production database for transaction capture. Some administrative actions not reflected on standby (for example, operations that are not logged).

Method: Disk mirroring
Advantages: No transaction loss. All changes to the database (including administrative) are replicated. Shortest restart time.
Disadvantages: Performance impact of synchronous mirroring to a geographically remote site. High price (software, hardware, and network).

All of the other components, tiers, and high availability features would remain the same as discussed in examples 1 and 2.

Note: With a database replication and a Resource Manager replication strategy as discussed here in example 3, there is the risk of an out-of-sync condition occurring between the Library Server and the Resource Manager or Managers. This is because the two replication processes are independent of one another, and no process is in place monitoring the two at the Content Manager transaction level. There are ways, however, to protect against or minimize this impact:

If you run the replication process at night during a batch window, you can wait until both replication processes have completed before bringing the system back online for production use. The system would be unavailable during the replication process, but this would ensure a synchronized system in the event of a failure that requires moving production to the standby servers. However, with a nightly batch replication process, the environment could be exposed to a day’s loss of work, because the daily transactions are not replicated until the nightly batch job executes.

For those environments that cannot afford to run a single nightly batch process, running the replication process on a scheduled or continuous basis is also possible. To ensure synchronization, we recommend that you use the Content Manager synchronous utility against the secondary environment in the event of a failure and before bringing the secondary servers online for production use.

Database replication is only one of the high availability examples discussed in this chapter. Consider examples 1, 2, or 4 instead, which use a database fail-over strategy rather than a database replication strategy. This would ensure synchronization between the Library Server and Resource Manager components, because there is only one copy of the database or databases.
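The out-of-sync condition described in this note can be pictured as a mismatch between the objects the Library Server replica references and the objects the Resource Manager replica actually holds. The following sketch is not the Content Manager utility mentioned above; the function name `find_discrepancies`, the report keys, and the object IDs are hypothetical and only illustrate the kind of comparison such a check performs.

```python
def find_discrepancies(library_server_refs, resource_manager_objects):
    """Compare the object IDs known to each replica and report the differences."""
    refs = set(library_server_refs)
    objects = set(resource_manager_objects)
    return {
        # Indexed on the Library Server replica, content not yet replicated.
        "missing_objects": sorted(refs - objects),
        # Content on the Resource Manager replica, index entry not yet replicated.
        "orphaned_objects": sorted(objects - refs),
    }


report = find_discrepancies(
    library_server_refs=["obj-001", "obj-002", "obj-003"],
    resource_manager_objects=["obj-001", "obj-002", "obj-004"],
)
print(report)   # {'missing_objects': ['obj-003'], 'orphaned_objects': ['obj-004']}
```

An empty report on both keys would indicate that the two replicas agree, which is the condition to reach before the secondary servers are brought online for production use.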

In document Content Manager Backup/Recovery and High Availability: (Page 127-132)