3. Architecture of DivRep Middleware

3.4. Discussion

3.4.2. Possible Changes to DivRep

Error detection in the pessimistic regime requires consistent snapshots of data from both replicas. One might ask what the consequences would be if the replication protocol of DivRep were modified so that a transaction observes an "older" snapshot instead of the latest committed one. In this way DivRep would ensure Generalized Snapshot Isolation, GSI (Elnikety, Zwaenepoel et al. 2005). The pessimistic regime would still use the comparator function for improved error detection, but it would not (necessarily) operate on the snapshot installed by the last committed transaction. It is, however, far from obvious that this would bring any performance benefit. The outcome is workload-dependent: under a high conflict rate the change would have a negative impact on performance, since the probability of overlaps between transactions would be higher. For example, let the following schedule of transaction boundaries, belonging to two transactions T1 and T2, be produced: b1, b2, c2, c1. The logical start of another transaction, T3, which conflicts with T1, is then crucial for the abort rate: if b3 is placed immediately after c1 (the latest snapshot is available), the conflict is avoided; but if it is placed between c2 and c1 (the changes committed at c1 are not yet visible), T1 and T3 overlap and the conflict leads to an abort.
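The schedule example above can be illustrated with a minimal sketch. It assumes a first-committer-wins rule for overlapping transactions that write a common item; the class `Txn`, the functions `overlaps` and `must_abort`, the integer boundary positions and the item name "x" are all hypothetical and are not part of DivRep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    begin: int          # logical position of the begin boundary (b)
    commit: int         # logical position of the commit boundary (c)
    writes: frozenset   # items written by the transaction


def overlaps(a, b):
    """Two transactions overlap if neither commits before the other begins."""
    return a.begin < b.commit and b.begin < a.commit


def must_abort(later, earlier):
    """Assumed first-committer-wins rule: the later committer aborts if it
    overlaps the earlier one and both write a common item."""
    return overlaps(later, earlier) and bool(later.writes & earlier.writes)


# Schedule b1, b2, c2, c1 encoded as positions 10, 20, 30, 40 (illustrative).
t1 = Txn("T1", begin=10, commit=40, writes=frozenset({"x"}))
t2 = Txn("T2", begin=20, commit=30, writes=frozenset({"y"}))

# b3 placed immediately after c1: T3 sees the latest snapshot.
t3_latest = Txn("T3", begin=45, commit=50, writes=frozenset({"x"}))
# b3 placed between c2 and c1: T3 sees an older snapshot.
t3_older = Txn("T3", begin=35, commit=50, writes=frozenset({"x"}))

print(must_abort(t3_latest, t1))
print(must_abort(t3_older, t1))
```

With the latest snapshot T3 starts after T1 has committed and no overlap exists; with the older snapshot the two transactions overlap and the shared write forces an abort.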

It is possible to extend the fault tolerance features of DivRep by devising an error detection mechanism for SQL DDL (Data Definition Language) operations, which are used to define database structure (e.g. the CREATE TABLE operation), and for stored procedures (precompiled pieces of code available to applications accessing a database through DBMS APIs). Although DDL operations are usually less frequent than DML operations, ensuring the consistency of their results on different replicas is just as important. This is, nevertheless, far from a trivial task, since comparing the results of DDL operations from diverse replicas requires access to different database metadata. For stored procedures the task is more intricate. One might compare the effects of a stored procedure execution on diverse replicas using the returned results, but this is not adequate. Firstly, there are cases when a stored procedure does not return a result. Secondly, the execution of a stored procedure could involve both DML and DDL operations, and thus developing a generic solution for checking the consistency of the results created by a stored procedure across diverse replicas is not obvious. Database triggers, pieces of code that execute automatically in response to an event, could help in providing error detection for the results of DDL operations and stored procedures. In particular, schema-level triggers, which exist in the Oracle DBMS and fire when a database schema object is modified, could be useful.

We propose the use of the FT-node for tolerating design faults in order to increase failure detection. It is evident, however, that the middleware itself represents a single point of failure. Standard techniques, such as primary-backup replication (Budhiraja, Marzullo et al. 1993) or a decentralised implementation of DivRep, could be used to alleviate this problem and improve availability and scalability. The middleware is likely to be relatively simple, and thus we can achieve high confidence that it is implemented correctly, i.e. free of design faults. Therefore, assuming fail-stop failures (crashes only) becomes reasonable, and solutions based on this assumption become relevant.

In order to enhance DivRep so that the replicas are able to decide on the outcome of a transaction (to commit or abort) even in the presence of failures, it is possible to replace the 2PC-DR protocol (Figure 3-5) with an implementation of a Non-Blocking Atomic Commitment (NB-AC) protocol. For example, the well-known Three-Phase Commit algorithm (Skeen 1981), which assumes synchronous systems and bounded communication delays, can be used to solve the NB-AC problem. Alternatively, DivRep could be equipped with the Paxos Commit algorithm (Gray and Lamport 2006) in order to solve the atomic commitment problem between the replicas and the comparator function. This is likely to decrease response time at the expense of complicating the replication protocol.

One possibility for further improving the performance of the optimistic regime of DivRep is to introduce cancellation of read operations. Load balancing using the skip feature is effective only in certain scenarios. Let us assume there are two replicas, Rx and Ry, executing a read operation r(a) as part of transaction Ti. If replica Rx starts and completes r(a) while the other replica, Ry, is executing the preceding operations in Ti, the skip feature will cause Ry to leave out the execution of r(a). However, skipping is impossible if the executions of r(a) on the two replicas overlap in global calendar time (the condition is in fact more restrictive: no skip occurs if DivRep receives the result of r(a) only after the thread serving the slower server has sent the read to the replica). The best that the middleware can do in that situation is to cancel the execution of the read on the slower replica once it obtains the result of r(a) from the faster one. Nevertheless, the cancellation would carry a performance overhead, and it is unclear whether it would improve the situation or make things worse. Its effectiveness would depend on the balance between that overhead and the decreased load on the slower replica. From an implementation point of view, cancellation requires support from the database engine, a feature not available on all servers, and a separate thread of execution on the client side, which might give rise to undesirable race conditions. Moreover, this could lead to the wrong SQL operation being cancelled. For example, between the issuing of a cancel operation from a dedicated client thread and its execution in the database, the long-running read (the operation at which the cancel was directed) might finish and another operation might start executing. The cancel would then wrongly terminate the execution of the subsequent operation.
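The skip condition and the cancellation race described above can be made concrete with a small deterministic sketch. Everything in it is hypothetical: `can_skip`, `ToySession` and `race_scenario` are illustrative names, timestamps are plain numbers, and the sketch assumes that a cancel request targets the session (whatever operation is running when the request is processed) rather than one specific statement. Real database engines differ in how cancellation is addressed.

```python
def can_skip(result_received_at, read_sent_to_slow_at):
    """Skip is possible only if the faster replica's result reaches the
    middleware before the read has been sent to the slower replica."""
    return result_received_at < read_sent_to_slow_at


class ToySession:
    """Hypothetical database session that runs one operation at a time.
    Assumption: cancel() terminates whatever is running at that moment."""

    def __init__(self):
        self.running = None
        self.cancelled = []

    def start(self, op):
        self.running = op

    def finish(self):
        self.running = None

    def cancel(self):
        if self.running is not None:
            self.cancelled.append(self.running)
            self.running = None


def race_scenario():
    slow = ToySession()
    slow.start("r(a)")   # slower replica begins the long-running read
    # The faster replica returns r(a); a client thread issues a cancel here.
    slow.finish()        # r(a) completes before the cancel is processed
    slow.start("w(b)")   # the next operation of the transaction starts
    slow.cancel()        # the late cancel terminates the wrong operation
    return slow.cancelled


print(can_skip(result_received_at=5, read_sent_to_slow_at=10))
print(can_skip(result_received_at=10, read_sent_to_slow_at=5))
print(race_scenario())
```

In the last scenario the cancelled operation is the write that follows the read, not the read at which the cancel was aimed.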

DivRep uses active replication with the aim of comparing results and providing error detection. One alternative to active replication that might improve performance is the deferred writes technique (Bernstein, Hadzilacos et al.) (Section 2.3.1). For example, aborting a transaction during its execution on a local replica, before the updates are sent to the other one, would be less costly. Likewise, by localising the execution of multiple updates on a replica and propagating them together, the number of messages in the network would be reduced. However, the use of deferred writes is unacceptable for DivRep, at least when the pessimistic regime of operation is considered. The technique involves execution of the writes on a local replica and propagation of the respective results (e.g. the redo log records) to the other replicas. This would prevent the error detection deployed in the pessimistic regime because of the following:

- The input to the comparison function (see Figure 3-5) is indeterminate – the changes produced by the local replica are just applied to the remote one.

- Incorrect results, produced by a faulty replica, would be propagated.

This could be alleviated by propagating full SQL operations, instead of the log records, to the remote replica. Even in this case, the advantage of executing result comparison in parallel with SQL operations would be lost: the results would have to be compared on the critical path, at the end of the transaction. Moreover, active replication would have to continue for SELECT operations, so that the results of the reads could be compared.
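The argument against deferred writes can be illustrated with a toy sketch. It is not DivRep code: the replica functions, the injected design fault (an update applied twice) and the equality-based comparison are hypothetical simplifications standing in for diverse database servers and the comparator function.

```python
def correct_replica(balance, delta):
    """Replica that applies the update as specified."""
    return balance + delta


def faulty_replica(balance, delta):
    """Replica with a hypothetical design fault: the update is applied twice."""
    return balance + 2 * delta


def active_replication(local, remote, balance, delta):
    """Both replicas execute the operation; the results are then compared."""
    local_result = local(balance, delta)
    remote_result = remote(balance, delta)
    return local_result, remote_result, local_result == remote_result


def deferred_writes(local, balance, delta):
    """Only the local replica executes; its result is copied to the remote
    one, so the comparison has no independent input."""
    local_result = local(balance, delta)
    remote_result = local_result   # the changes are applied, not re-executed
    return local_result, remote_result, local_result == remote_result


print(active_replication(faulty_replica, correct_replica, 100, 10))
print(deferred_writes(faulty_replica, 100, 10))
```

Under active replication the two results differ and the fault is detected; under deferred writes the incorrect value is propagated and the comparison trivially agrees.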

4. Experimental Evaluation of DivRep

In document Performance Implications of Using Diverse Redundancy for Database Replication (Page 66-69)