Analytical Model - Simple Probabilistic Analysis

7.2 Simple Probabilistic Analysis

7.2.2 Analytical Model

In partial replication settings, where each replica holds only a subset of the database, support for the execution of distributed transactions is inevitable unless “perfect data partitioning” is assumed.¹ In lock-based systems, distributed transactions may become involved in distributed deadlocks, while in version-based systems remote read operations may be unable to obtain the requested database snapshot at remote replicas. Both distributed deadlocks and failed remote read operations result in aborted transactions. Hence, the goal of our probabilistic analysis is twofold: (a) to quantify the abort rate of transactions due to distributed execution; and (b) to estimate the abort rate of transactions at the termination phase.

The replicated system is modeled as a number of database sites, sites, and a fixed-size database composed of DB_SIZE items. Every database item has a number of copies, copies, uniformly distributed over the replicas. Thus, the entire system consists of DB_SIZE · copies resources. All transactions submitted to the database have the same number of operations, op, and every operation takes the same time, op_time, to execute. An operation is defined as a read or a write on one data item; as a consequence, a single SQL statement may consist of many operations. Each data item has the same probability of being accessed (there are no hotspots). We model neither communication between database sites (local and remote accesses to data items have the same cost) nor failures of the replicas.

¹ If the database is partitioned so that every transaction can execute at a single site, support for distributed transactions is not needed. However, such an approach requires prior knowledge of the workload and a very particular data distribution over the replicas, or at least a single site that holds the whole database.


Every database site receives TPS transactions per second, so the total load over the replicated system is TotalTPS = TPS · sites, and there are always txn = TotalTPS · op · op_time concurrent transactions executing. Every transaction is read-only with probability L. Each operation within an update transaction is a read operation with probability k. The number of concurrent read-only transactions in the system is r_txn = L · txn, each with op read operations. The number of update transactions is w_txn = (1 − L) · txn, each with r_op = k · op read and w_op = (1 − k) · op write operations. We also require that the average number of data items accessed by concurrent transactions does not exceed the database size. The main parameters of the model are listed in Table 7.1.

Parameter   Description
DB_SIZE     database size
TPS         number of transactions per second submitted to a database site
L           fraction of read-only transactions
op          number of operations in a transaction
k           fraction of read operations in update transactions
op_time     time to execute an operation, in seconds
sites       number of replicas in the system
copies      number of copies of each data item

Table 7.1. Model parameters
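The derived workload quantities defined before Table 7.1 follow mechanically from its parameters. The Python sketch below computes them. The names `ModelParams` and `derive_workload` and the sample values are illustrative assumptions, not part of the original model, and the final check is only one possible reading of the requirement that concurrent transactions do not access more items than the database holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    """Parameters of Table 7.1 (field names follow the table)."""
    db_size: int     # DB_SIZE: number of items in the database
    tps: float       # TPS: transactions per second submitted to one site
    L: float         # fraction of read-only transactions
    op: int          # number of operations in a transaction
    k: float         # fraction of read operations in update transactions
    op_time: float   # seconds needed to execute one operation
    sites: int       # number of replicas
    copies: int      # number of copies of each data item


def derive_workload(p: ModelParams) -> dict:
    """Compute the derived quantities defined in the text."""
    total_tps = p.tps * p.sites            # TotalTPS = TPS * sites
    txn = total_tps * p.op * p.op_time     # concurrent transactions
    r_txn = p.L * txn                      # concurrent read-only transactions
    w_txn = (1 - p.L) * txn                # concurrent update transactions
    r_op = p.k * p.op                      # reads per update transaction
    w_op = (1 - p.k) * p.op                # writes per update transaction
    # Assumed reading of the size constraint: the items touched by all
    # concurrent transactions should not exceed the database size.
    fits = txn * p.op <= p.db_size
    return {"total_tps": total_tps, "txn": txn, "r_txn": r_txn,
            "w_txn": w_txn, "r_op": r_op, "w_op": w_op, "fits": fits}


# Hypothetical sample values, chosen only for illustration.
params = ModelParams(db_size=100_000, tps=50, L=0.8, op=10, k=0.5,
                     op_time=0.01, sites=4, copies=2)
workload = derive_workload(params)
print(workload)
```

With these sample values the system holds about 20 concurrent transactions, of which about 16 are read-only and 4 are update transactions.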

In the following two sections we introduce our probabilistic analysis for evaluating the abort rate of partial replication when lock- and version-based concurrency control mechanisms are used. We assume that the lock-based model ensures 1SR, while the version-based model provides GSI.

Lock-based system

Our model has been strongly influenced by the analytical model introduced by Gray et al. [1996], where the authors analyze the deadlock rate of fully replicated database systems based on locking only. Besides the assumptions considered throughout our probabilistic modelling, the work in [Gray et al., 1996] does not account for read operations: all transactions are composed of updates only. Here we model read operations within update transactions as well as read-only transactions. To calculate the abort rate at the termination phase, we follow the ideas introduced in [Pedone, 1999].

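Execution phase. As in [Gray et al., 1996], we suppose that, on average, each transaction is about halfway complete; thus the number of resources locked by executing transactions is at most

$$\mathit{res\_locked} = \mathit{ro\_read\_locks} + \mathit{u\_locks}, \tag{7.2.1}$$

where

$$\mathit{ro\_read\_locks} = \frac{\mathit{r\_txn} \cdot \mathit{op}}{2}, \tag{7.2.2}$$

$$\mathit{u\_locks} = \mathit{u\_write\_locks} + \mathit{u\_read\_locks} = \frac{\mathit{w\_txn} \cdot \mathit{w\_op}}{2} + \frac{\mathit{w\_txn} \cdot \mathit{r\_op}}{2} = \frac{\mathit{w\_txn} \cdot \mathit{op}}{2}. \tag{7.2.3}$$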
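As a quick numeric check of Eqs. 7.2.1–7.2.3, the sketch below counts the locks held under the half-way-complete assumption. The function name `locks_held` and the input values are hypothetical and serve only as an illustration.

```python
def locks_held(r_txn: float, w_txn: float, op: int, k: float) -> dict:
    """Locks held when every transaction is, on average, half way complete."""
    r_op = k * op                             # reads per update transaction
    w_op = (1 - k) * op                       # writes per update transaction
    ro_read_locks = r_txn * op / 2            # Eq. 7.2.2
    u_write_locks = w_txn * w_op / 2          # write locks of update transactions
    u_read_locks = w_txn * r_op / 2           # read locks of update transactions
    u_locks = u_write_locks + u_read_locks    # Eq. 7.2.3
    res_locked = ro_read_locks + u_locks      # Eq. 7.2.1
    return {"ro_read_locks": ro_read_locks, "u_write_locks": u_write_locks,
            "u_read_locks": u_read_locks, "u_locks": u_locks,
            "res_locked": res_locked}


# Hypothetical workload: 16 read-only and 4 update transactions of 10 operations.
locks = locks_held(r_txn=16, w_txn=4, op=10, k=0.5)
print(locks)
```

The last equality of Eq. 7.2.3 holds because r_op + w_op = op, so the locks of update transactions depend on k only through their split into read and write locks.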

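From Eq. 7.2.3, the probability that a read operation waits because of update transactions is

$$\mathit{p\_r\_op\_waits\_u} = \frac{\mathit{u\_write\_locks}}{\mathit{DB\_SIZE} \cdot \mathit{copies}} = \frac{\mathit{w\_txn} \cdot \mathit{w\_op}}{2 \cdot \mathit{DB\_SIZE} \cdot \mathit{copies}}. \tag{7.2.4}$$

Similarly, p_w_op_waits_u and p_w_op_waits_r are the probabilities that a write operation waits for resources locked by update and read-only transactions, respectively:

$$\mathit{p\_w\_op\_waits\_u} = \frac{\mathit{u\_locks}}{\mathit{DB\_SIZE} \cdot \mathit{copies}} = \frac{\mathit{w\_txn} \cdot \mathit{op}}{2 \cdot \mathit{DB\_SIZE} \cdot \mathit{copies}}, \tag{7.2.5}$$

$$\mathit{p\_w\_op\_waits\_r} = \frac{\mathit{ro\_read\_locks}}{\mathit{DB\_SIZE} \cdot \mathit{copies}} = \frac{\mathit{r\_txn} \cdot \mathit{op}}{2 \cdot \mathit{DB\_SIZE} \cdot \mathit{copies}}. \tag{7.2.6}$$

Now we can calculate the probability that a read-only transaction waits for resources held by update transactions,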

$$\mathit{p\_r\_tran\_waits\_u} = 1 - (1 - \mathit{p\_r\_op\_waits\_u})^{\mathit{op}}, \tag{7.2.7}$$

and the probability that an update transaction waits because of other update transactions,

$$\mathit{p\_u\_tran\_waits\_u} = 1 - (1 - \mathit{p\_r\_op\_waits\_u})^{\mathit{r\_op}}, \tag{7.2.8}$$


and because of read-only transactions,

$$\mathit{p\_u\_tran\_waits\_r} = 1 - (1 - \mathit{p\_w\_op\_waits\_r})^{\mathit{w\_op}}. \tag{7.2.9}$$

A deadlock is created if transactions form a cycle waiting for each other. We do not consider deadlocks that involve more than two transactions: deadlocks composed of cycles of three or more transactions are very unlikely to occur [Gray et al., 1996]. So the probability for a read-only transaction to deadlock is

$$\mathit{p\_r\_deadlock} \approx \frac{\mathit{p\_r\_tran\_waits\_u} \cdot \mathit{p\_u\_tran\_waits\_r}}{\mathit{r\_txn}}, \tag{7.2.10}$$

and the probability that an update transaction deadlocks is

$$\mathit{p\_w\_deadlock} \approx \frac{\mathit{p\_u\_tran\_waits\_u}}{\mathit{w\_txn}} + \frac{\mathit{p\_u\_tran\_waits\_r} \cdot \mathit{p\_r\_tran\_waits\_u}}{\mathit{w\_txn}}. \tag{7.2.11}$$
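The chain from per-operation wait probabilities (Eqs. 7.2.4–7.2.6) to per-transaction wait and deadlock probabilities (Eqs. 7.2.7–7.2.11) can be sketched as follows. The function names and sample values are hypothetical, and every formula is transcribed as it appears in the text, with no additional terms.

```python
def wait_probabilities(r_txn, w_txn, op, k, db_size, copies):
    """Per-operation and per-transaction wait probabilities."""
    r_op = k * op
    w_op = (1 - k) * op
    resources = db_size * copies                     # total lockable resources
    p_r_op_waits_u = w_txn * w_op / (2 * resources)  # Eq. 7.2.4
    p_w_op_waits_u = w_txn * op / (2 * resources)    # Eq. 7.2.5
    p_w_op_waits_r = r_txn * op / (2 * resources)    # Eq. 7.2.6
    return {
        "p_r_op_waits_u": p_r_op_waits_u,
        "p_w_op_waits_u": p_w_op_waits_u,
        "p_w_op_waits_r": p_w_op_waits_r,
        # Eq. 7.2.7: a read-only transaction waits for update transactions
        "p_r_tran_waits_u": 1 - (1 - p_r_op_waits_u) ** op,
        # update transaction waits for other updates, as given in the text
        "p_u_tran_waits_u": 1 - (1 - p_r_op_waits_u) ** r_op,
        # Eq. 7.2.9: an update transaction waits for read-only transactions
        "p_u_tran_waits_r": 1 - (1 - p_w_op_waits_r) ** w_op,
    }


def deadlock_probabilities(w, r_txn, w_txn):
    """Two-transaction deadlock probabilities, Eqs. 7.2.10 and 7.2.11."""
    p_r_deadlock = w["p_r_tran_waits_u"] * w["p_u_tran_waits_r"] / r_txn
    p_w_deadlock = (w["p_u_tran_waits_u"] / w_txn
                    + w["p_u_tran_waits_r"] * w["p_r_tran_waits_u"] / w_txn)
    return {"p_r_deadlock": p_r_deadlock, "p_w_deadlock": p_w_deadlock}


# Hypothetical sample values, chosen only for illustration.
waits = wait_probabilities(r_txn=16, w_txn=4, op=10, k=0.5,
                           db_size=100_000, copies=2)
print(deadlock_probabilities(waits, r_txn=16, w_txn=4))
```

For small per-operation probabilities, 1 − (1 − p)^n is close to n · p, which gives a convenient sanity check on the transaction-level values.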

From Eqs. 7.2.10 and 7.2.11, the deadlock rates of read-only and update transactions are

$$\mathit{r\_deadlock\_rate} = \frac{\mathit{p\_r\_deadlock}}{\mathit{op} \cdot \mathit{op\_time}}, \tag{7.2.12}$$

$$\mathit{w\_deadlock\_rate} = \frac{\mathit{p\_w\_deadlock}}{\mathit{op} \cdot \mathit{op\_time}}. \tag{7.2.13}$$

Finally, we can estimate the total number of deadlocks of the system (in transactions per second) as

$$\mathit{aborts\_deadlock} = \mathit{r\_deadlock\_rate} \cdot \mathit{r\_txn} + \mathit{w\_deadlock\_rate} \cdot \mathit{w\_txn}. \tag{7.2.14}$$

Termination phase. If there is only one copy of each data item (i.e., there is no replication), strict 2PL ensures serializability and thus transactions are not aborted during the termination phase. With more than one copy, two conflicting transactions executing concurrently at distinct database sites may violate 1SR. As mentioned in Section 7.2.1, to ensure 1SR, each committing transaction has to pass the certification test, which checks that no transaction that executed concurrently updated data items read by the committing transaction. Notice that conflicts appear only if transactions access different copies of the same item.
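As a numeric companion to Eqs. 7.2.12–7.2.14, whose result feeds the termination-phase analysis, the following sketch converts deadlock probabilities into the number of deadlock aborts per second. The function name `deadlock_aborts` and the input probabilities are hypothetical placeholders, not values derived in the text.

```python
def deadlock_aborts(p_r_deadlock, p_w_deadlock, r_txn, w_txn, op, op_time):
    """Deadlock aborts per second for the whole system (Eq. 7.2.14)."""
    duration = op * op_time                         # lifetime of one transaction, s
    r_deadlock_rate = p_r_deadlock / duration       # Eq. 7.2.12
    w_deadlock_rate = p_w_deadlock / duration       # Eq. 7.2.13
    return r_deadlock_rate * r_txn + w_deadlock_rate * w_txn


# Hypothetical probabilities, chosen only to make the arithmetic easy to follow.
aborts = deadlock_aborts(p_r_deadlock=0.001, p_w_deadlock=0.002,
                         r_txn=16, w_txn=4, op=10, op_time=0.01)
print(f"aborts_deadlock = {aborts:.3f} transactions per second")
```

Dividing a per-transaction probability by the transaction lifetime op · op_time turns it into a rate per second, and multiplying by the number of concurrent transactions of each kind gives the system-wide figure.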


We consider only those transactions that were not aborted during execution. Thus, TotalTPS and the numbers of read-only and update transactions become:

$$\mathit{TotalTPS}' = \mathit{TotalTPS} - \mathit{aborts\_deadlock}, \tag{7.2.15}$$

$$\mathit{r\_txn}' = \mathit{r\_txn} \cdot (1 - \mathit{p\_r\_abort}), \tag{7.2.16}$$

$$\mathit{w\_txn}' = \mathit{w\_txn} \cdot (1 - \mathit{p\_w\_abort}), \tag{7.2.17}$$

$$\mathit{txn}' = \mathit{TotalTPS}' \cdot \mathit{op} \cdot \mathit{op\_time}. \tag{7.2.18}$$

If there are only two concurrent transactions in the system, the probability that an update transaction passes the certification test is $(1 - \mathit{w\_op}/\mathit{DB\_SIZE})^{\mathit{r\_op}}$. Then the probability that the i-th transaction passes the certification test after the commit of (i − 1) transactions is

$$\mathit{p\_i\_txn\_pass} = \left(1 - \frac{(i - 1) \cdot \mathit{w\_op}}{\mathit{DB\_SIZE}}\right)^{\mathit{r\_op}}. \tag{7.2.19}$$

On average, the probability that a transaction does not pass the certification test is

$$\mathit{p\_txn\_no\_pass} = 1 - \frac{1}{N} \cdot \sum_{i=1}^{N} \mathit{p\_i\_txn\_pass}, \tag{7.2.20}$$

where N is the number of concurrent update transactions, excluding those that execute at the same replica and do not cause certification aborts:

$$N = \mathit{w\_txn}' \cdot \frac{\mathit{sites} - 1}{\mathit{sites}}. \tag{7.2.21}$$

Consequently, the abort rate of update transactions that do not pass the certification test is defined as follows:

$$\mathit{u\_abort\_rate} = \frac{\mathit{p\_txn\_no\_pass}}{\mathit{op} \cdot \mathit{op\_time}}. \tag{7.2.22}$$
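The certification estimate of Eqs. 7.2.19–7.2.22 can be evaluated with a short loop. This is a minimal sketch under two assumptions that the text does not state: N is rounded to the nearest integer so that it can bound the sum, and the base of the power is clamped at zero so that it remains a probability. The function name and sample values are hypothetical.

```python
def certification_abort_rate(w_txn_prime, sites, w_op, r_op, db_size,
                             op, op_time):
    """Return (p_txn_no_pass, u_abort_rate) following Eqs. 7.2.19-7.2.22."""
    # Eq. 7.2.21, rounded to an integer (assumption) to serve as a sum bound.
    n = round(w_txn_prime * (sites - 1) / sites)
    if n < 1:
        return 0.0, 0.0              # no remote update transactions, no aborts
    total = 0.0
    for i in range(1, n + 1):
        # Clamp at zero (assumption) so the base stays a valid probability.
        base = max(0.0, 1 - (i - 1) * w_op / db_size)
        total += base ** r_op                        # Eq. 7.2.19
    p_txn_no_pass = 1 - total / n                    # Eq. 7.2.20
    u_abort_rate = p_txn_no_pass / (op * op_time)    # Eq. 7.2.22
    return p_txn_no_pass, u_abort_rate


# Hypothetical sample values, chosen only for illustration.
p_no_pass, rate = certification_abort_rate(
    w_txn_prime=4, sites=4, w_op=5, r_op=5, db_size=100_000,
    op=10, op_time=0.01)
print(p_no_pass, rate)
```

The first transaction in the sum always passes (i = 1 gives probability 1), so with a single concurrent remote update transaction the estimated abort probability is zero.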

Finally, the total number of aborts due to the certification test is

$$\mathit{aborts\_sr\_cert} = \mathit{u\_abort\_rate} \cdot \mathit{w\_txn}'. \tag{7.2.23}$$

Version-based system

During execution, transactions are aborted if the requested versions of the data items are not available. We assume that all database sites are able to maintain up to V versions per data item; e.g., with V = 1, transactions can only obtain the current version of the data item. Notice that we assume a strict first-committer-wins rule, i.e., transactions are never aborted during the execution phase due to write-write conflicts; such conflicts are resolved at termination.


Execution phase. In the same way as Eq. 7.2.1, during its execution, a transaction updates at most w_op resources. Therefore, at any time there are res_updated_exec = (w_txn · w_op)/2 resources updated because of the transactions in the execution phase. Some of these transactions will be successfully certified and their updates will be propagated to all the copies of the data items accessed. These remote updates will influence the total number of resources updated.² Therefore, during termination there are

$$\mathit{res\_updated\_term} = (\mathit{copies} - 1) \cdot \frac{\mathit{w\_txn}' \cdot \mathit{w\_op}}{2} \cdot \mathit{p\_commit} \tag{7.2.24}$$

resources updated, where w_txn' is defined in Eq. 7.2.34 and p_commit is the probability for an update transaction to pass the certification test, equal to 1 − p_w_abort_term (see Eq. 7.2.38). Hence, the total number of resources updated is res_updated = res_updated_exec + res_updated_term and, consequently, the probability for an item to be updated V times by concurrent transactions is:

$$\mathit{p\_item\_v\_updated} = \left(\frac{\mathit{res\_updated}}{\mathit{DB\_SIZE} \cdot \mathit{copies}}\right)^{V}. \tag{7.2.25}$$
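The following sketch evaluates Eqs. 7.2.24 and 7.2.25 for given inputs. Because p_commit and w_txn' are only defined by later equations, the sketch takes them as arguments rather than deriving them, which is a simplification made here. The function name and sample values are hypothetical.

```python
def p_item_v_updated(w_txn, w_txn_prime, w_op, copies, p_commit, db_size, V):
    """Probability that an item is updated V times by concurrent transactions."""
    # Resources updated by transactions still in the execution phase.
    res_updated_exec = w_txn * w_op / 2
    # Eq. 7.2.24: remote copies updated by successfully certified transactions.
    res_updated_term = (copies - 1) * w_txn_prime * w_op / 2 * p_commit
    res_updated = res_updated_exec + res_updated_term
    return (res_updated / (db_size * copies)) ** V      # Eq. 7.2.25


# Hypothetical sample values: p_commit and w_txn_prime are simply assumed here.
for versions in (1, 2, 3):
    p = p_item_v_updated(w_txn=4, w_txn_prime=4, w_op=5, copies=2,
                         p_commit=1.0, db_size=100_000, V=versions)
    print(versions, p)
```

Since the base of the power is below one, each additional version kept per item multiplies the probability by the same small factor, so the probability falls geometrically with V.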

The probability for a read operation to abort is the same as the probability of waiting for V locks, i.e., the probability of V concurrent transactions to update the same item:

$$\mathit{p\_r\_op\_abort} = \mathit{p\_item\_v\_updated}. \tag{7.2.26}$$

Since each read-only transaction has op operations, the probability for a read-only transaction to abort is

$$\mathit{p\_r\_abort} = 1 - (1 - \mathit{p\_r\_op\_abort})^{\mathit{op}}, \tag{7.2.27}$$

and the probability of abort of an update transaction is

$$\mathit{p\_w\_abort} = 1 - (1 - \mathit{p\_r\_op\_abort})^{\mathit{r\_op}}. \tag{7.2.28}$$

From Eqs. 7.2.27 and 7.2.28, the abort rates for read-only and update transactions are as follows:

$$\mathit{r\_abort\_rate} = \frac{\mathit{p\_r\_abort}}{\mathit{op} \cdot \mathit{op\_time}}, \tag{7.2.29}$$

² We do not account for remote updates in the lock-based model since in general the deadlock


and

$$\mathit{w\_abort\_rate} = \frac{\mathit{p\_w\_abort}}{\mathit{op} \cdot \mathit{op\_time}}. \tag{7.2.30}$$
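Eqs. 7.2.26–7.2.30 can be sketched in a few lines. The function name `execution_abort_rates` and the sample value of p_item_v_updated are hypothetical and used only for illustration.

```python
def execution_abort_rates(p_item_v_updated, op, k, op_time):
    """Abort probabilities and rates during execution (version-based model)."""
    r_op = k * op                                   # reads per update transaction
    p_r_op_abort = p_item_v_updated                 # Eq. 7.2.26
    p_r_abort = 1 - (1 - p_r_op_abort) ** op        # Eq. 7.2.27
    p_w_abort = 1 - (1 - p_r_op_abort) ** r_op      # Eq. 7.2.28
    duration = op * op_time                         # lifetime of one transaction, s
    return {
        "p_r_abort": p_r_abort,
        "p_w_abort": p_w_abort,
        "r_abort_rate": p_r_abort / duration,       # Eq. 7.2.29
        "w_abort_rate": p_w_abort / duration,       # Eq. 7.2.30
    }


# Hypothetical input: a 1e-4 chance that a read finds no usable version.
rates = execution_abort_rates(p_item_v_updated=1e-4, op=10, k=0.5,
                              op_time=0.01)
print(rates)
```

Only read operations can abort during execution in this model, so a read-only transaction, which issues op reads, is more likely to abort than an update transaction, which issues only r_op reads.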

Therefore, the total number of aborts during the execution phase of transactions is

$$\mathit{aborts\_exec} = \mathit{r\_abort\_rate} \cdot \mathit{r\_txn} + \mathit{w\_abort\_rate} \cdot \mathit{w\_txn}. \tag{7.2.31}$$

Termination phase. Similarly to Eqs. 7.2.15–7.2.18, we have to recalculate the number of concurrent transactions that reach the termination phase:

$$\mathit{TotalTPS}' = \mathit{TotalTPS} - \mathit{aborts\_exec}, \tag{7.2.32}$$

$$\mathit{r\_txn}' = \mathit{r\_txn} \cdot (1 - \mathit{p\_r\_abort}), \tag{7.2.33}$$

$$\mathit{w\_txn}' = \mathit{w\_txn} \cdot (1 - \mathit{p\_w\_abort}), \tag{7.2.34}$$

$$\mathit{txn}' = \mathit{TotalTPS}' \cdot \mathit{op} \cdot \mathit{op\_time}. \tag{7.2.35}$$

Furthermore, transactions aborted during the execution phase also affect the fraction of read-only and update transactions present at the termination phase:

$$L' = \frac{\mathit{r\_txn}}{\mathit{txn}'}. \tag{7.2.36}$$

Thus, the probability that a write operation conflicts with another write operation is

$$\mathit{p\_w\_op\_con} = \frac{\mathit{w\_txn}' \cdot \mathit{w\_op}}{2 \cdot \mathit{DB\_SIZE}}. \tag{7.2.37}$$

The probability that an update transaction aborts is

$$\mathit{p\_w\_abort\_term} = 1 - (1 - \mathit{p\_w\_op\_con})^{\mathit{w\_op}}. \tag{7.2.38}$$

The abort rate of update transactions due to write-write conflicts is

$$\mathit{w\_abort\_rate\_term} = \frac{\mathit{p\_w\_abort\_term}}{\mathit{op} \cdot \mathit{op\_time}}. \tag{7.2.39}$$

Finally, the total number of aborts at the termination phase is

In document On non-intrusive workload-aware database replication (Page 108-115)