Chapter 4. High availability strategies and options
4.1 Overview
4.1.6 Levels of availability
In a Content Manager environment, high availability, and how to achieve it, can mean different things to different people. Because downtime must be balanced against cost, as discussed, it is important to evaluate what you will lose if your Content Manager service is temporarily unavailable.
Because Content Manager and content management systems in general include a variety of underlying services, subsystems, and hardware components, there are several factors and levels of high availability that can be deployed. The objective once again should be to provide an affordable level of availability that supports the business requirements and goals.
In a Content Manager system, there are several components that need to be taken into consideration when designing an end-to-end high availability system. For example:
IP sprayer/load balancer
Firewall process
HTTP server
Web application server and mid-tier applications (such as eClient)
LDAP server
Library Server application and database
Resource Manager application and database
Tivoli Storage Manager process and database
Disk subsystem
Tape and optical libraries
Operating system processes
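Because a request typically passes through many of these components, the availability of the end-to-end service can be no better than that of the chain as a whole. The following minimal sketch illustrates this under two loudly stated assumptions: every listed component must be up for the service to be up, and component failures are independent. The per-component figure of 99.9% is hypothetical and chosen only for illustration; it is not a measured value for any product.

```python
from math import prod

# Hypothetical per-component availabilities, chosen only for illustration.
COMPONENTS = {
    "IP sprayer/load balancer": 0.999,
    "Firewall process": 0.999,
    "HTTP server": 0.999,
    "Web application server": 0.999,
    "LDAP server": 0.999,
    "Library Server": 0.999,
    "Resource Manager": 0.999,
    "Tivoli Storage Manager": 0.999,
    "Disk subsystem": 0.999,
    "Tape and optical libraries": 0.999,
    "Operating system processes": 0.999,
}


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up.

    Assumes failures are independent, so the individual
    availabilities simply multiply.
    """
    return prod(availabilities)


end_to_end = serial_availability(COMPONENTS.values())
print(f"End-to-end availability: {end_to_end:.4%}")
```

Under these assumptions, eleven components at 99.9% each yield roughly 98.9% end to end, which is why each component in the list needs to be considered individually.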
Availability percentage     Time loss per year
99.9999% (six nines)        32 seconds
99.999% (five nines)        5 minutes
99.99%                      53 minutes
99.9%                       8.8 hours
99.0%                       87 hours (3.6 days)
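The time-loss figures follow directly from the availability percentage. As a rough sketch, assuming a 365-day year and continuous (24x7) service, the conversion can be written as follows; the function name is illustrative only, and the table above shows rounded values.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years are ignored


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the expected unavailable time per year, in seconds."""
    return (1.0 - availability_percent / 100.0) * SECONDS_PER_YEAR


for pct in (99.9999, 99.999, 99.99, 99.9, 99.0):
    seconds = downtime_seconds_per_year(pct)
    print(
        f"{pct}% -> {seconds:,.0f} s"
        f" = {seconds / 60:,.1f} min"
        f" = {seconds / 3600:,.2f} h"
    )
```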
Content Manager high availability solutions do not use one specific technology. The solutions can incorporate a variety of strategies, most of the time within the same location, and require significant up-front planning with continual monitoring. When the solution requires servers or components to span geographic locations, it leans more toward disaster recovery, as discussed in Chapter 6.
The technologies used must be weighed against the costs for a particular implementation to meet the business requirements. There are several levels or technologies that can be deployed today to achieve high availability for systems in general. Figure 4-3 depicts some of the more commonly used technologies with Content Manager implementations to achieve high availability.
Figure 4-3 High availability tiers
[Figure 4-3 plots costs (vertical axis) against levels of availability (horizontal axis). Level 1: Basic (no redundancy). Level 2: RAID x (disk redundancy). Level 3: Failover (component redundancy). Level 4: Replication (data redundancy). Level 5: Disaster recovery (site redundancy).]
We discuss levels 3 and 4 in more detail later in this chapter in 4.2, “Content Manager HA strategies and options” on page 100; disaster recovery (level 5) is discussed in Chapter 6.
The following levels of availability are shown in Figure 4-3:
Level 1: Basic systems. Basic systems do not employ any special measures to protect against loss of data or services, although backups would most likely be taken.
Level 2: RAID x. Disk redundancy or disk mirroring, or both, are used to protect the data against the loss of a disk. Disk redundancy can also be extended to include newer technologies such as SANs that are emerging in the marketplace as best practices for high availability disk subsystems.
Level 3: Failover. With Content Manager implementations, there are many components to consider that could be a single point of failure (SPOF). An outage in any single component can result in service interruption to an end user. Multiple instances or redundancy for any single component should be deployed for availability purposes. There are two primary techniques to deploy fail-over strategies:
– Application clustering or non-IP cluster failover (for example, WebSphere)
– Platform clustering or IP-based cluster failover (for example, HACMP)
Both of these are fundamental approaches to accomplishing high availability. Components that support application clustering might also be able to take advantage of load balancing in addition to availability benefits. For example, IBM WebSphere Application Server Network Deployment (ND) Version 5.0 and WebSphere Application Server Version 4.0 Advanced Edition (AE) have built-in application clustering support.
In order to achieve 99.9x% high availability in a Content Manager environment, you typically need to deploy both application clustering and platform clustering, assuming your solution is operating in a multi-tier or multiserver environment. If you run the full Content Manager stack (all components) on a single server, platform clustering or IP-based failover would be the appropriate strategy to deploy.
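The benefit of deploying multiple instances of a component can be estimated with a simple model. The sketch below is an idealized illustration, not a sizing method: it assumes that instance failures are independent, that a single surviving instance can carry the full load, and that failover itself is instantaneous and never fails. The function name and the 99% input figure are hypothetical.

```python
def redundant_availability(single: float, instances: int) -> float:
    """Availability of a component deployed as N redundant instances.

    The component is unavailable only when all instances are down at
    the same time, assuming independent failures and perfect failover.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return 1.0 - (1.0 - single) ** instances


# Hypothetical component that is 99% available on its own.
for n in (1, 2, 3):
    print(f"{n} instance(s): {redundant_availability(0.99, n):.6%}")
```

Real deployments fall short of this ideal because failures are often correlated (shared power, network, or software defects) and takeover takes time, so the result is best read as an upper bound.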
In a platform clustering environment, a standby or backup system is used to take over for the primary system if the primary system fails. In principle, almost any component can become highly available by employing platform clustering techniques. With IP-based cluster failover, we can configure the systems as active/active mutual takeover or active/standby (hot spare). Additional information regarding fail-over strategies and techniques for Content Manager implementations is discussed in 4.2, “Content Manager HA strategies and options” on page 100.
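The active/standby (hot spare) configuration can be pictured with a toy model. The following sketch is not how HACMP or any other clustering product is implemented; the class names, the heartbeat counter, and the missed-heartbeat limit of 3 are all hypothetical, and the model only shows the general idea that the standby takes over after the active node stops responding.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


class ActiveStandbyCluster:
    """Toy model of active/standby failover (hypothetical, for illustration)."""

    def __init__(self, primary: Node, standby: Node, missed_limit: int = 3):
        self.active = primary
        self.standby = standby
        self.missed_limit = missed_limit
        self.missed = 0  # consecutive heartbeats missed by the active node

    def heartbeat(self) -> str:
        """Process one heartbeat interval; return the name of the active node."""
        if self.active.healthy:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.missed_limit and self.standby.healthy:
                # Takeover: in a real IP-based cluster the standby would
                # also acquire the service IP address and shared resources.
                self.active, self.standby = self.standby, self.active
                self.missed = 0
        return self.active.name


cluster = ActiveStandbyCluster(Node("primary"), Node("standby"))
cluster.active.healthy = False  # simulate a failure of the primary
for _ in range(3):
    current = cluster.heartbeat()
print("Active node after three missed heartbeats:", current)
```

Waiting for several missed heartbeats before taking over is a common way to avoid reacting to a brief network glitch, at the price of a slightly longer outage.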
Level 4: Replication (data redundancy). This type of high availability for Content Manager implementations extends the protection by duplicating the database content (metadata and control tables) and file system content to another machine (server), protecting against a hardware, software, disk, or data failure. Because the data and content are replicated, this provides another level of protection compared to the shared disk/fail-over strategy previously discussed. This type of high availability implementation (replication) can also be used as a disaster recovery strategy, the difference being whether the servers are located within the same location or are geographically separated.
For Content Manager implementations, this strategy would use a database replication and a Resource Manager replication strategy, as discussed later in this chapter in 4.2.5, “High availability example 3: Replication” on page 113.
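To make the idea of file system content replication concrete, the following generic sketch copies files from a source directory to a replica directory and uses checksums to skip files that are already identical. It is an assumption-laden illustration only: it does not represent the Resource Manager replication mechanism or any database replication feature, and the function names and directory layout are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(source_dir: Path, replica_dir: Path) -> list[str]:
    """Copy files that are missing or different on the replica.

    Returns the names of the files that were copied. Only regular
    files directly inside source_dir are considered.
    """
    replica_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(source_dir.iterdir()):
        if not src.is_file():
            continue
        dst = replica_dir / src.name
        if not dst.exists() or sha256_of(dst) != sha256_of(src):
            shutil.copy2(src, dst)  # copies content and file metadata
            copied.append(src.name)
    return copied
```

Running the function a second time with no changes on the source copies nothing, which is the property a periodic replication job relies on.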
Level 5: Disaster recovery. This applies to maintaining systems in different sites. When the primary site becomes unavailable due to a disaster, the backup site can become operational within a reasonable time. This can be done manually through regular data backups, or automatically by geographical clustering, replication, or mirroring software.
It is also possible, and a best practice, to combine multiple high availability levels within a single solution. For example, a solution might use a fail-over strategy (level 3) for the Library Server database and a replication strategy (level 4) for the Resource Manager content, with all servers using a disk redundancy strategy (level 2).
Examples for levels 3 and 4 are discussed in 4.2, “Content Manager HA strategies and options” on page 100.