3. Architecture of DivRep Middleware
3.1. DivRep – Replication with Diverse Database Servers
3.1.3. Distributed Deadlock Avoidance
To avoid distributed deadlocks, DRA relies on a deadlock prevention scheme that uses a specific parameter, referred to as NOWAIT. When the underlying databases guarantee SI, the parameter ensures that an exception is raised as soon as concurrent transactions attempt to modify the same data item. Figure 3-8 illustrates the behaviour of the parameter with an example in which two transactions, T1 and T2, execute against a centralized database. Each transaction requires exclusive locks on two resources, A and B, but the two transactions acquire the locks in opposite orders. As soon as T2 attempts to acquire the conflicting lock on A, an exception is raised because T1 already holds that lock, and T2 has to abort. Note that if the NOWAIT parameter were not enabled, the opposite order of lock acquisition would have led to a deadlock. The behaviour of the parameter differs from the first-committer-wins and first-updater-wins rules (Fekete, O'Neil et al. 2004), since no waiting for one of the transactions to
end is necessary. The use of NOWAIT might lead to more transaction aborts, and corresponding restarts (Bernstein and Goodman 1981), than a deadlock detection scheme would, but it avoids the construction of potentially complex waits-for graphs, which is the principal overhead of deadlock detection schemes. The NOWAIT parameter is therefore attractive for its simplicity, especially for workloads with a low probability of deadlocks/aborts.
[Figure 3-8: timeline with columns Time, T1 and T2. Events shown: Lock A - Granted, Lock B - Granted, Lock A - Denied (Resource A unavailable), Commit, Abort.]
Figure 3-8 Deadlock avoidance using the NOWAIT parameter on a non-replicated database, exemplified with two concurrent transactions, T1 and T2 (the corresponding begin operations are omitted). As soon as transaction T2 requests a conflicting lock for resource A, the database server raises an exception and T2 has to abort.
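The behaviour shown in Figure 3-8 can be sketched with a toy lock table. This is a minimal illustration and not the implementation of any database server: the class `NoWaitLockTable`, the exception `LockUnavailable` and the exact interleaving of the lock requests are assumptions made for the example.

```python
class LockUnavailable(Exception):
    """Raised when a request hits a lock held by another transaction."""


class NoWaitLockTable:
    """Toy exclusive-lock table (hypothetical): conflicts fail, nobody waits."""

    def __init__(self):
        self.owner = {}  # resource name -> transaction holding its lock

    def acquire(self, txn, resource):
        holder = self.owner.get(resource)
        if holder is not None and holder != txn:
            # NOWAIT semantics: report the conflict immediately.
            raise LockUnavailable(f"{resource} is held by {holder}")
        self.owner[resource] = txn

    def release_all(self, txn):
        # Called on commit or abort: drop every lock the transaction holds.
        self.owner = {r: t for r, t in self.owner.items() if t != txn}


def run_example():
    """Replay one assumed interleaving of T1 (A then B) and T2 (B then A)."""
    locks = NoWaitLockTable()
    log = []
    locks.acquire("T1", "A")
    log.append("T1 lock A granted")
    locks.acquire("T2", "B")
    log.append("T2 lock B granted")
    try:
        locks.acquire("T2", "A")  # conflicts with T1
    except LockUnavailable:
        locks.release_all("T2")   # T2 aborts and frees B
        log.append("T2 lock A denied, T2 aborts")
    locks.acquire("T1", "B")      # B is free again, so no deadlock arises
    log.append("T1 lock B granted")
    locks.release_all("T1")
    log.append("T1 commits")
    return log


if __name__ == "__main__":
    for line in run_example():
        print(line)
```

With blocking locks the same interleaving would leave T1 waiting for B and T2 waiting for A; in the sketch the immediate exception removes T2 before that cycle can form.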
A number of database servers, such as Oracle, PostgreSQL, InterBase and Firebird, offer the behaviour of the NOWAIT parameter, although the respective implementations might differ. For example, on Firebird the NOWAIT parameter is specified for a database connection, while on PostgreSQL the behaviour is available through the NOWAIT option of the SELECT … FOR UPDATE operation, which locks the selected rows against concurrent updates.
To avoid distributed deadlocks in a replicated database, exactly n-1 replicas, rather than all n, should be configured with the NOWAIT parameter (where n is the number of replicas). If fewer than n-1 replicas enable the NOWAIT parameter, a distributed deadlock that spans the replicas without the parameter could occur, and the liveness of the replicated database would be compromised. We assume a DivRep deployment with two replicas, one of which has the NOWAIT parameter enabled. Even with NOWAIT enabled on one replica, other types of concurrency conflicts might be reported, since the replicas use two-phase locking for write operations. In particular, a replica that has not enabled NOWAIT might report a centralized deadlock (when concurrent transactions executing on the same replica try to acquire a set of locks in different orders). In this case DivRep follows the decision made by the replica and aborts the victim transaction. Similarly, the replica that has enabled NOWAIT might raise an exception due to the first-updater-wins rule, i.e. not all exceptions are raised as a result of the NOWAIT parameter.
It is of interest how much impact the NOWAIT parameter has on the abort rate. We reason about the matter informally using Figure 3-9. Let us assume an FT-node, a system with two replicas, in which only one of the replicas, RA, has the NOWAIT parameter enabled (clearly, an FT-node is a special case of a system with more than two replicas). Two overlapping transactions, Ti and Tj, both try to modify data item x. The moment transaction Tj tries to acquire the exclusive lock on x, a NOWAIT exception is raised and the middleware aborts the transaction on both replicas. Following the abort of Tj on RB, the first-committer-wins rule is enforced: the exclusive lock is granted to Ti and the transaction successfully commits on both replicas.
However, had the NOWAIT parameter been enabled on replica RB too, transaction Ti would also have been aborted, following the NOWAIT exception raised as part of the execution of Tj. Hence, when more than one replica has the NOWAIT parameter enabled, the deadlock avoidance scheme might result in unnecessary aborts. This is not possible in DivRep, however, where only one replica enables the parameter (in this way DivRep satisfies the requirement that n-1 replicas have NOWAIT enabled). Moreover, even with NOWAIT on both replicas, unnecessary aborts might be infrequent in practice: it is likely that the abort of Tj happens earlier, in global calendar time, than the request for the writing of x by Ti on RB. In this way Tj would release its locks, the NOWAIT exception would not be triggered by RB, and Ti would commit.
[Figure 3-9: timelines of Ti and Tj on replicas RA and RB along a Time axis. Labels recoverable from the figure: begin (b), write of x (w(x)), abort (a) and commit (c) operations, "NOWAIT exception raised" and "Blocked".]
Figure 3-9 Deadlock avoidance using the NOWAIT parameter in a replicated database system with two replicas, RA and RB. Two transactions, Ti and Tj, execute concurrently and access the same data item x. Only transaction boundary operations (begins (b), aborts (a) and commits (c)) and the write of the common data item are shown. The interaction between the replicas and the middleware and similarly between the clients and the middleware is omitted for clarity.
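The informal reasoning about Figure 3-9 can be summarised in a small sketch. It assumes the scenario of the figure (Ti obtains the lock on x first on RA, Tj obtains it first on RB) and, for the case with NOWAIT on both replicas, the worst-case timing in which both exceptions are raised before either abort has propagated. The function name and the returned dictionary are hypothetical and serve illustration only.

```python
def figure_3_9_outcome(nowait_on_ra, nowait_on_rb):
    """Outcome for Ti and Tj when Ti holds x on RA and Tj holds x on RB.

    Assumption: if both replicas use NOWAIT, both exceptions are raised
    before either abort reaches the other replica (worst-case timing).
    """
    aborted = set()
    if nowait_on_ra:
        aborted.add("Tj")  # Tj requests x on RA while Ti holds it
    if nowait_on_rb:
        aborted.add("Ti")  # Ti requests x on RB while Tj holds it

    # Without any exception both requests block: Ti waits for Tj on RB and
    # Tj waits for Ti on RA, a cycle that no single replica can observe.
    deadlocked = not aborted

    # An aborted transaction releases its locks on both replicas, so the
    # surviving transaction (if any) is unblocked and commits.
    committed = set() if deadlocked else {"Ti", "Tj"} - aborted
    return {
        "aborted": sorted(aborted),
        "committed": sorted(committed),
        "deadlocked": deadlocked,
    }


if __name__ == "__main__":
    for flags in [(True, False), (True, True), (False, False)]:
        print(flags, figure_3_9_outcome(*flags))
```

The three calls correspond to the DivRep configuration (one abort, one commit), NOWAIT on both replicas (two aborts under the stated timing assumption) and NOWAIT on neither replica (a distributed deadlock).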
There is an exception to the claim that DivRep does not produce unnecessary aborts: such aborts are possible in specific situations where one of the database engines in an FT-node provides a deadlock detection mechanism, which might make decisions contrary to those of a replica with NOWAIT enabled. Consequently, it is possible that the two replicas resolve conflicting transactions in a different order. An instance of this situation is depicted in Figure 3-10, which shows the executions of two transactions, Ti and Tj, on two replicas, RA and RB. Replica RA will report a concurrency conflict exception as soon as Tj tries to acquire the lock for data item x, and as a result the transaction will have to be aborted. Correspondingly, on replica RB (which has NOWAIT disabled) a deadlock detection scheme will decide that Ti should be aborted due to the centralised deadlock. Unless execution determinism is imposed in DivRep, both transactions will be aborted. However, this particular series of events is unlikely: the value of the deadlock timeout should, e.g. according to best-practice guides for database administrators, exceed the typical transaction time, which avoids the possibility that different replicas make inconsistent decisions (transaction Tj would be aborted before the deadlock detection mechanism is triggered on RB, and as a consequence transaction Ti would commit).
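The timing argument can be made concrete with a minimal sketch. It assumes that both times are measured from the moment the local deadlock forms on RB; the function and parameter names are hypothetical.

```python
def figure_3_10_outcome(abort_reaches_rb_ms, deadlock_timeout_ms):
    """Decide the fate of Ti and Tj in the Figure 3-10 scenario.

    abort_reaches_rb_ms: when the abort of Tj (triggered by the NOWAIT
        exception on RA) is applied on RB.
    deadlock_timeout_ms: when deadlock detection on RB fires and picks
        Ti as the victim.
    """
    if abort_reaches_rb_ms < deadlock_timeout_ms:
        # Tj releases its locks on RB before detection runs, so the local
        # deadlock disappears and Ti can commit.
        return {"Ti": "commit", "Tj": "abort"}
    # Detection fires first: RB aborts Ti while RA has already rejected Tj.
    return {"Ti": "abort", "Tj": "abort"}


if __name__ == "__main__":
    print(figure_3_10_outcome(abort_reaches_rb_ms=5, deadlock_timeout_ms=1000))
    print(figure_3_10_outcome(abort_reaches_rb_ms=50, deadlock_timeout_ms=10))
```

A deadlock timeout well above the typical transaction time keeps the system in the first branch, which is the case argued to be the common one.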
An FT-node uses two database servers, one of which has the NOWAIT parameter enabled. When the pair of servers is selected, a particular database engine might not offer the NOWAIT parameter and might therefore be ruled out as an inappropriate choice for an FT-node. Nonetheless, the selection process should verify whether the functionality of NOWAIT can be simulated using other configuration parameters: many database servers offer a lock timeout parameter that prevents indefinitely long blocking, e.g. LOCK_TIMEOUT in Microsoft SQL Server or innodb_lock_wait_timeout in MySQL. Commonly, setting such a parameter to zero simulates the NOWAIT functionality; failing that, a fine-grained timeout value (e.g. in milliseconds) could achieve much the same effect.
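The effect of a zero lock timeout can be observed with a small experiment. The sketch below uses SQLite only because it ships with Python; SQLite is not one of the servers considered here, and its whole-database write lock is much coarser than row-level locking. The `timeout` argument of `sqlite3.connect` sets how long a connection waits for a lock, and the helper name `try_conflicting_lock` is hypothetical.

```python
import os
import sqlite3
import tempfile
import time


def try_conflicting_lock(timeout_seconds):
    """Request a write lock that another connection already holds.

    Returns (error message or None, seconds spent waiting).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None lets us issue BEGIN/ROLLBACK explicitly.
        holder = sqlite3.connect(path, isolation_level=None)
        waiter = sqlite3.connect(
            path, timeout=timeout_seconds, isolation_level=None
        )
        try:
            holder.execute("BEGIN IMMEDIATE")  # first writer takes the lock
            start = time.monotonic()
            try:
                waiter.execute("BEGIN IMMEDIATE")  # conflicting request
                waiter.execute("ROLLBACK")
                error = None
            except sqlite3.OperationalError as exc:
                error = str(exc)  # lock not available within the timeout
            elapsed = time.monotonic() - start
            holder.execute("ROLLBACK")
            return error, elapsed
        finally:
            holder.close()
            waiter.close()


if __name__ == "__main__":
    print(try_conflicting_lock(0))    # fails at once, like NOWAIT
    print(try_conflicting_lock(0.5))  # waits for the timeout, then fails
```

With a timeout of zero the second connection is refused immediately instead of blocking, which is the behaviour that a NOWAIT substitute needs to provide.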
[Figure 3-10: timelines of Ti and Tj on replicas RA and RB (RB has NOWAIT disabled) along a Time axis, showing the operations bi, bj, wi(x), wi(y), wj(x), wj(y), ai and aj, with the labels "NOWAIT exception raised" and "Local Deadlock: Ti chosen as “victim”".]
Figure 3-10 An example of unnecessary aborts in DivRep. Two transactions Ti and Tj execute
concurrently and both access the two data items, x and y, on the two replicas, RA and RB. Only
transaction boundary operations (begins (b), aborts (a) and commits (c)) and the writes of the common data items are shown. The interaction between the replicas and the middleware and similarly between
the clients and the middleware is omitted for clarity.