Enterprise Replication overview - Shared Disk Secondary servers


7.2 Enterprise Replication overview

The basic building blocks of any replication network are the replication servers and the replicates. Replicates may be grouped into replicate sets. In this section, we describe these concepts in more detail.

7.2.1 Database servers

The basic element of any ER network is a database server. Any instance of IDS that is properly licensed for replication may be a database server. Do not confuse this term with the more general terms node and server.

The system on which a database server executes is often called a server, and when networks are being discussed, both a hardware server and a database server are sometimes called nodes. However, in the context of Enterprise Replication in IDS, the basic element is an IDS instance, and that is called a database server in the ER manuals.

Note that this term denotes the entire IDS instance, and not any specific database within the instance. An instance either is or is not an ER database server. You can choose to replicate data from only some databases within the server, but you must define the whole instance as a database server.

ER is replication between two or more database servers. Those may be on the same system (server) or on separate systems. They may be geographically near each other or far apart, and they may run different IDS versions. The only requirement is a communications link over which the database servers can communicate.

ER may involve many database servers organized in complex ways. Details are discussed in 7.3, “Topologies for replication” on page 161.

7.2.2 Replicates and replicate sets

Each database server doing replication knows what should and should not be replicated based on the set of replicates that are defined for that database server. A replicate can be considered as a set of rules that define what to replicate, when to replicate, where to replicate, and how to handle any conflicts during the replication process.

• What to replicate is defined by the result set of a SELECT statement, and anything you can express that way can be a replicate (subject only to the constraint of being data from a single table). It may include all or just some columns of each row, and some or all the rows of the table, as shown in Figure 7-2 on page 157. The target of a replicate does not have to be the same as the source. The target may be different tables, columns, and rows from the source. It is similar to doing an INSERT INTO...SELECT FROM... statement.

Figure 7-2 Selective replication

• When to replicate defines the frequency of replication, which can be immediate, at some regular interval, or at a specific time of day. More details are in section 7.2.4, “Frequency of replication” on page 158.

• Where to replicate specifies the names of the target database servers involved in the replication scheme.

• How to handle any conflicts is discussed in section 7.2.5, “Automatic conflict resolution” on page 159.
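The "what to replicate" rule can be illustrated with a small sketch. It is not ER itself: it uses Python's built-in sqlite3 module to mimic the INSERT INTO...SELECT FROM... analogy, and the table names, column names, and sample rows are hypothetical. It shows a target that receives only some columns and some rows of a single source table, under different column names.

```python
import sqlite3

# Hypothetical source and target tables; all names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_src (
        id INTEGER PRIMARY KEY, name TEXT, region INTEGER, credit_limit REAL);
    CREATE TABLE customer_tgt (
        cust_id INTEGER PRIMARY KEY, cust_name TEXT);
    INSERT INTO customer_src VALUES
        (1, 'Ames', 1, 500.0),
        (2, 'Baker', 2, 900.0),
        (3, 'Cole', 1, 250.0);
""")

# "What to replicate": two of the four columns, and only the rows for one
# region, all taken from a single table. The target uses different names.
conn.execute("""
    INSERT INTO customer_tgt (cust_id, cust_name)
    SELECT id, name FROM customer_src WHERE region = 1
""")

replicated = conn.execute(
    "SELECT cust_id, cust_name FROM customer_tgt ORDER BY cust_id"
).fetchall()
print(replicated)  # [(1, 'Ames'), (3, 'Cole')]
```

A real replicate keeps applying this selection as changes occur, whereas the statement above runs only once.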

This makes replication very flexible. Business requirements should drive the definitions, and an ER design should include as many replicates as necessary.

In a complex replication scheme, there may be many replicates, and some of them should likely be treated as a group. That is, the set of replicates should be started, stopped, changed, and so forth, as a group. This is usually required in order to keep the data in the target database consistent when data from related tables is replicated. To make this kind of group operation easy, ER has the concept of replicate sets. A replicate set is a set of replicates that are used together. They are started, stopped, and so forth, as a unit rather than as independent replicates.
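As a conceptual sketch only, the following Python model shows what operating on a group as a unit means. The class names, method names, and replicate names are hypothetical and are not part of any ER interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replicate:
    """One replicate; only its started/stopped state is modeled here."""
    name: str
    active: bool = False


@dataclass
class ReplicateSet:
    """A named group of replicates that is started and stopped together."""
    name: str
    replicates: List[Replicate] = field(default_factory=list)

    def start(self) -> None:
        for replicate in self.replicates:
            replicate.active = True

    def stop(self) -> None:
        for replicate in self.replicates:
            replicate.active = False

    def states(self) -> Dict[str, bool]:
        return {r.name: r.active for r in self.replicates}


# Related tables are grouped so that the target never sees only part of them.
order_set = ReplicateSet(
    "order_set",
    [Replicate("rep_customer"), Replicate("rep_orders"), Replicate("rep_items")],
)
order_set.start()
started = order_set.states()   # every replicate is active
order_set.stop()
stopped = order_set.states()   # every replicate is inactive
```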

With those basic ideas in mind, we can now discuss how an ER scheme can be devised. One way to approach the question is to think about the purposes of replication, then the type of replication appropriate to each purpose, and then the way in which the database servers need to be organized. The combination of those results will usually provide the topology of hardware and software that is required.


7.2.3 Replication templates

In an enterprise where the database contains a large number of tables, and some or all of those tables must be replicated to one or more database servers, defining a replicate for every single table manually can be cumbersome and error prone. A better solution is to use a replication template. A replication template provides a mechanism to set up and deploy replication when a large number of tables and servers are involved.

Setting up a replication environment using a template is fairly straightforward. First, define a template, and then instantiate it on the servers where you want to replicate the data. When a template is defined (with the command cdr define template), it collects the schema information of a database, a group of tables, columns, and primary keys. Then it creates a replicate set consisting of a group of master replicates. A master replicate is a replicate that guarantees data integrity by verifying that replicated tables on different servers have consistent column attributes. Once you instantiate the template (with the command cdr realize template), all the defined replicates are started.
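The schema verification that a master replicate performs can be pictured with a short sketch. This is our own simplified model, not the ER implementation: the function name, server names, and column definitions are hypothetical, and column attributes are reduced to a type string.

```python
from typing import Dict, List, Optional, Tuple

Schema = Dict[str, str]  # column name -> column type (simplified attributes)


def find_schema_mismatches(
    master: Schema, participants: Dict[str, Schema]
) -> List[Tuple[str, str, str, Optional[str]]]:
    """Compare each participant's columns with the master definition.

    Returns (server, column, expected type, found type) for every column
    that is missing (found is None) or has a different type.
    """
    mismatches = []
    for server, columns in participants.items():
        for column, expected in master.items():
            found = columns.get(column)
            if found != expected:
                mismatches.append((server, column, expected, found))
    return mismatches


# Hypothetical definitions: serv3 declares "name" differently from the master.
master_schema = {"id": "SERIAL", "name": "VARCHAR(40)", "region": "INTEGER"}
participants = {
    "serv2": {"id": "SERIAL", "name": "VARCHAR(40)", "region": "INTEGER"},
    "serv3": {"id": "SERIAL", "name": "CHAR(20)", "region": "INTEGER"},
}

mismatches = find_schema_mismatches(master_schema, participants)
print(mismatches)  # [('serv3', 'name', 'VARCHAR(40)', 'CHAR(20)')]
```

In this model, a non-empty result is the condition under which replication to that server should not be started.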

Other advantages of using templates are that they:

• Guarantee data integrity and schema verification across all the nodes.

• Facilitate automatic table generation at the target server, if tables do not exist.

• Allow alter operations on the replicated tables.

• Enable initial data synchronization when adding new servers in replication.

Because templates are designed to facilitate setting up large scale replication environments, they set up replication for full rows of the tables. That means all the columns in the table are included, as when using a SELECT * FROM... statement in a replicate definition. If you want to do selective replication, first set up the replication using templates. Once the template is realized, update the individual replicates to customize them as required.

For more details on templates, see the article on IBM developerWorks at:

http://www.ibm.com/developerworks/db2/library/techarticle/dm-0505kedia/index.html

7.2.4 Frequency of replication

Replicated data does not have to be sent between systems as soon as changes are made. That is one option, but it is not the only option. The other options are to transfer data at some regular interval or at a specific time of day.

With a regular interval, the data to be sent accumulates during the period between replications. So the quantity of data, the space required to hold it, and the network bandwidth required to send it all need to be considered.

A specific time of day means the data is sent at the same time every day. An example might be to replicate at 8:00 pm each evening. As with the interval, the data to be sent will accumulate until the time for replication arrives.

If either interval or specific time replication is used, the target systems do not reflect the new data or changed data until the replication is completed. Be sure your business can tolerate the delay. In addition, the changed data is held on its source database server until the specified time for the data transfer. Be sure there is enough space to hold that data. High rates of change may require more frequent replication in order to limit the space for changed data.
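A rough estimate of the space needed for held changes follows from simple arithmetic: change rate times average row size times the length of the interval. The sketch below works through that calculation. The function name and the sample figures (50 changed rows per second, 200 bytes per row) are assumptions for illustration, and the result ignores any per-row or per-transaction overhead that the server adds.

```python
def pending_bytes(rows_per_second: float, avg_row_bytes: int,
                  interval_seconds: int) -> float:
    """Approximate volume of changed data that accumulates between replications."""
    return rows_per_second * avg_row_bytes * interval_seconds


# Hypothetical workload: 50 changed rows per second, 200 bytes per row.
hourly = pending_bytes(50, 200, 60 * 60)       # replicate every hour
daily = pending_bytes(50, 200, 24 * 60 * 60)   # replicate once a day

print(f"hourly interval: {hourly / 1e6:.0f} MB held at the source")  # 36 MB
print(f"daily transfer:  {daily / 1e6:.0f} MB held at the source")   # 864 MB
```

With these figures, moving from a daily transfer to an hourly interval reduces the space held at the source by a factor of 24, which is the trade-off described above.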

7.2.5 Automatic conflict resolution

When data is replicated from one database server to another and the target database server is also making changes simultaneously to the local data, replication conflicts could occur at the target database server.

Similarly, if update-anywhere replication is used, consider whether or not conflicts can occur. That is, can two transactions change the same row at the same time? If the answer is yes, then think about how to determine which of the changes will persist in the database.

If there is a deterministic algorithm to choose which of two conflicting updates is kept (and which is discarded), then update-anywhere replication may be acceptable. However, if there is not a good way to choose, then consider whether or not the conflicts can be avoided.

Even if there is an acceptable resolution scheme for conflicting updates, be sure it covers all the cases. For example, what will happen if one transaction deletes a row that another transaction is updating? Does the row get deleted, or is the updated row retained? There are no general correct answers to such questions; each enterprise must answer based on its specific requirements.

To ease the conflict resolution process, Enterprise Replication provides several conflict resolution rules, as described in Table 7-2. These rules enable ER to automatically detect and resolve the conflicts.

Table 7-2 Conflict resolution rules

Another way to avoid conflicts is to generate unique serial primary key values across the Enterprise Replication environment. The CDR_SERIAL configuration parameter enables control over generating values for SERIAL and SERIAL8 columns in tables defined for replication.
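The idea behind non-overlapping serial values can be sketched as follows. We assume here that each server is configured with a common delta (increment) and a distinct offset, which is our understanding of how CDR_SERIAL is specified; check the product documentation for the exact syntax. The function name and the values 100, 1, and 2 are illustrative only.

```python
from itertools import count, islice
from typing import Iterator


def serial_values(delta: int, offset: int) -> Iterator[int]:
    """Yield offset, offset + delta, offset + 2 * delta, and so on.

    Assumed model: every server shares the same delta and has its own offset.
    """
    return (offset + n * delta for n in count())


# Two hypothetical servers sharing delta 100, with offsets 1 and 2.
server_a = list(islice(serial_values(100, 1), 4))
server_b = list(islice(serial_values(100, 2), 4))

print(server_a)  # [1, 101, 201, 301]
print(server_b)  # [2, 102, 202, 302]

# The two sequences never produce the same key, so inserts cannot collide.
overlap = set(server_a) & set(server_b)
```

As long as each offset is unique and smaller than the delta, the sequences stay disjoint no matter how many rows each server inserts.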

7.2.6 Data synchronization

Enterprise Replication provides direct synchronization options to replicate every row in the specified replicate or replicate set from the source server to all the specified target servers. You can use direct synchronization to populate a new target server, or an existing target server that has become severely inconsistent. You can synchronize a single replicate or a replicate set. When you synchronize a replicate set, Enterprise Replication synchronizes tables in an order that preserves referential integrity constraints (for example, child tables are synchronized after parent tables). You can also choose how to handle extra target rows and whether to enable trigger firing on target servers.
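Ordering tables so that parents come before children is a topological sort of the foreign key dependencies. The sketch below illustrates that ordering with Python's standard graphlib module (Python 3.9 or later). It is our illustration of the principle, not the algorithm ER uses, and the table names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical foreign key dependencies: each table maps to the parent
# tables it references, which must therefore be synchronized first.
depends_on = {
    "customer": set(),
    "stock": set(),
    "orders": {"customer"},
    "items": {"orders", "stock"},
}

# static_order() yields every table after all of the tables it depends on.
sync_order = list(TopologicalSorter(depends_on).static_order())
print(sync_order)
```

Several orders can be valid (for example, customer and stock may appear in either order), so the guarantee is only that a child table never precedes its parent.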

| Conflict resolution rule | Effect on data replication at target database server |
|---|---|
| Ignore | ER does not attempt to resolve the conflict. |
| Time stamp | The row or transaction with the most recent time stamp is applied. |
| SPL routine | Enterprise Replication uses a routine written in Stored Procedure Language (SPL) that the user provides to determine which data should be applied. |
| Time stamp with SPL routine | If the time stamps are identical, Enterprise Replication invokes an SPL routine that the user provides to resolve the conflict. |
| Always-apply | ER does not attempt to resolve the conflict, but always applies the data to the target. |
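To make the Time stamp and Time stamp with SPL routine rules concrete, here is a minimal Python sketch of the decision they describe. It is a model of the rule, not ER code: the RowVersion type, the resolve function, and the tie-breaking callback (standing in for a user-provided SPL routine) are hypothetical. Keeping the local row when the time stamps are equal and no routine is supplied is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass(frozen=True)
class RowVersion:
    key: int
    value: str
    changed_at: datetime


TieBreaker = Callable[[RowVersion, RowVersion], RowVersion]


def resolve(local: RowVersion, incoming: RowVersion,
            tie_breaker: Optional[TieBreaker] = None) -> RowVersion:
    """Keep the version with the most recent time stamp.

    On identical time stamps, defer to the user-supplied routine if present;
    otherwise keep the local row (an assumption of this sketch).
    """
    if incoming.changed_at > local.changed_at:
        return incoming
    if incoming.changed_at < local.changed_at:
        return local
    if tie_breaker is not None:
        return tie_breaker(local, incoming)
    return local


local_row = RowVersion(1, "local edit", datetime(2024, 1, 1, 12, 0, 0))
newer_row = RowVersion(1, "remote edit", datetime(2024, 1, 1, 12, 0, 5))
same_time = RowVersion(1, "remote tie", datetime(2024, 1, 1, 12, 0, 0))

winner = resolve(local_row, newer_row)            # newer_row wins
tie_default = resolve(local_row, same_time)       # local_row is kept
tie_custom = resolve(local_row, same_time,
                     tie_breaker=lambda loc, inc: inc)  # routine picks incoming
```

Because the outcome depends only on the two row versions and the supplied routine, every server that evaluates the same pair reaches the same result, which is the deterministic property discussed in section 7.2.5.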

In document Informix Dynamic Server 11: Extending Availability and Replication (Page 176-181)