
4. Experimental Evaluation of DivRep Performance

4.5. Minimising Replication Overhead Using Priority Mechanisms

4.5.1. The Problem

The experimental evaluation presented in Section 4.4.4 revealed that imposing a serial order of transaction boundaries (BEGIN and COMMIT operations) on different replicas using the DRA algorithm (Section 3.1.1) incurs a performance overhead. Multiple transactions, initiated by different clients, might attempt to execute a transaction boundary simultaneously. The middleware handles transaction boundary requests using a mutex, tb_mutex. No provision is made for a particular execution order of the boundaries: the order is defined by the underlying implementation of the mutex. In our implementation of DivRep, for example, it is dictated by the underlying JVM (Java Virtual Machine). Only one transaction boundary is permitted to execute at a time, so a client might be blocked by others and unable to progress until it is granted the mutex. This serialisation of transaction boundaries introduces a lock convoy effect (Rinard and Diniz 2003; Lampson and Redell 1980), which has a negative impact on system performance. Our aim is to reduce, or ideally eliminate, this overhead.
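The serialisation described above can be illustrated with a minimal Java sketch. It assumes that tb_mutex behaves like a `java.util.concurrent.locks.ReentrantLock` in its default non-fair mode; the text does not say which Java locking primitive DivRep actually uses, and the class `BoundarySerializer`, its methods and the per-replica `Runnable` operations are hypothetical. The counters exist only to show that at most one boundary executes at any time, however many clients request one.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: a single mutex serialises BEGIN/COMMIT across all client sessions. */
public class BoundarySerializer {
    // Default (non-fair) lock: which waiting client acquires it next is decided by
    // the lock implementation and the JVM's thread scheduling, not by arrival order.
    private final ReentrantLock tbMutex = new ReentrantLock();

    // Instrumentation only: boundaries executing right now, and the observed peak.
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger peakInside = new AtomicInteger();

    /** Runs one boundary (BEGIN or COMMIT); each Runnable sends it to one replica. */
    public void executeBoundary(List<Runnable> perReplicaOps) {
        tbMutex.lock(); // concurrent clients block here while another boundary runs
        try {
            peakInside.accumulateAndGet(inside.incrementAndGet(), Math::max);
            for (Runnable op : perReplicaOps) {
                op.run(); // replicas handled one after another, for simplicity only
            }
        } finally {
            inside.decrementAndGet();
            tbMutex.unlock();
        }
    }

    /** Peak number of boundaries observed executing at the same time. */
    public int peakConcurrentBoundaries() { return peakInside.get(); }

    /** Estimate of the number of clients currently blocked waiting for tb_mutex. */
    public int blockedClients() { return tbMutex.getQueueLength(); }
}
```

Constructing the lock as `new ReentrantLock(true)` would grant it roughly in arrival order instead. Either way the boundaries remain serialised, so the convoy is reordered rather than removed.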

Naturally, the number of simultaneously blocked clients depends on the degree of concurrency. It also depends on the ratio between the duration of the transaction boundary operations, Tb, and the duration of the transaction's DML operations, TDML. The larger the ratio Tb/TDML, the greater the chance that many clients will be blocked. If the ratio is small, i.e. the boundary operations execute much faster than the DML operations, most clients are likely to be busy executing DMLs at any given moment, and contention for the boundary operations will be low. If, however, the transaction boundaries take long relative to the DMLs, multiple clients are more likely to be waiting for the mutex at the same time. For the COMMIT operation, Tb depends on the transactional profile. Under a write-intensive profile COMMIT operations take longer because the changes must be flushed to disk, i.e. there is an I/O overhead of writing out all pages affected by the transaction, such as data and index pages, and the REDO/UNDO log must be written as well.

Correspondingly, long execution times of transaction BEGINs were observed in DivRep. This can be explained as follows. We use a dummy SELECT operation to start a transaction, because the JDBC interface does not support an explicit BEGIN operation; instead, it assumes that a transaction starts with the first operation after a COMMIT or an ABORT. To serialise transaction boundaries we therefore introduced a dummy SELECT operation that reads a table from the database. Although this query is short on average, occasionally the data has to be fetched from disk, and its execution time then increases significantly. This happens when the database does not reside fully in main memory and the cache hit ratio is poor, so that an expensive I/O operation has to be initiated.

The replication algorithm of the DivRep middleware thus introduces an overhead due to the serialisation of transaction boundaries. We have taken detailed measurements to evaluate the impact of this serialisation accurately. In particular, we recorded the following measures of interest:

- Transaction response time, Tt: the time from when a client sends the BEGIN command to start a transaction until it receives notification that the COMMIT has been successfully executed (after the middleware has executed 2PC-DR (Section 3.1.1)), so that it can start the next transaction. Clearly, this time includes the execution of the transaction boundaries and of the DMLs on both servers.

- Client-view transaction boundary response time, TbC. In the rest of the document, TbC refers to the response time of either BEGIN or COMMIT commands unless explicitly specified otherwise.

- Server-view transaction boundary response time, TbS.
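For a BEGIN, the server-side operation whose duration TbS captures is the dummy SELECT described earlier. The following Java sketch shows how a transaction can be opened through plain JDBC in this way. It is a hypothetical illustration, not DivRep's code: the class `DummySelectBegin`, the method `begin` and the query text `DUMMY_SELECT` (here reading the TPC-C `warehouse` table) are assumptions, and the interval it returns also includes the network round trip, so it only approximates TbS.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical sketch of opening a transaction with a dummy SELECT over JDBC. */
public class DummySelectBegin {
    // Assumed probe query: the text only says that the dummy SELECT reads a table.
    static final String DUMMY_SELECT = "SELECT COUNT(*) FROM warehouse";

    /**
     * JDBC offers no begin() call. With auto-commit disabled, the driver typically
     * starts a transaction when the first statement is executed, so a cheap query
     * is issued to fix the moment at which the transaction begins.
     *
     * @return elapsed nanoseconds of the dummy SELECT as seen by the caller
     */
    static long begin(Connection conn) throws SQLException {
        conn.setAutoCommit(false); // no-op if auto-commit is already disabled
        long start = System.nanoTime();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(DUMMY_SELECT)) {
            while (rs.next()) {
                // Drain the result set so the query has completed before returning.
            }
        }
        return System.nanoTime() - start;
    }
}
```

If the table's pages are not in the server's cache, this query is the point at which the occasional disk read, and hence the long BEGIN, would show up.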

We make a distinction between client-view and server-view boundary response times (Figure 4-19): TbS captures the execution of the actual SQL operation (BEGIN or COMMIT) on a DBMS, while TbC also includes the time each transaction waits to acquire tb_mutex. Hence TbC ≈ TbS + TWAIT, where TWAIT is the waiting time for tb_mutex acquisition. TWAIT includes the time the DBMSs spend executing DML operations of concurrent transactions, since both types of SQL statement (DML and transaction boundary operations) compete equally for resources, i.e. the execution of DMLs may block concurrent transaction boundary operations. As the number of clients increases, the difference between the two types of boundary response time grows. TWAIT increases because, under high load, the …

[Figure 4-19: timeline along a Time axis for the Client, DivRep and replicas Rx and Ry, with timestamps t1 to t8 marking TbC, TWAIT, TbS (Rx) and TbS (Ry)]

Figure 4-19 Transaction boundary duration as perceived by different parts of a replicated system: the client-view transaction boundary response time, TbC, calculated as t8 – t1, and the server-view transaction boundary response times, TbS (Rx), calculated as t7 – t4, and TbS (Ry), calculated as t6 – t5. The difference between TbC and a TbS might be significant because of TWAIT, the time needed to acquire the shared mutex, calculated as t3 – t2. Note that the execution of a transaction boundary on the two replicas might not overlap in real time: one of the servers might finish the execution before the other one starts it.
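The timestamps in Figure 4-19 could be collected by instrumenting the boundary path. The Java sketch below is hypothetical: it assumes that the middleware sends the boundary to both replicas concurrently (modelled with `CompletableFuture.runAsync`), represents each replica's SQL as a `Runnable`, and collapses client and middleware into one process, so the network delays between t1 and t2 and before t8 are absent. The class `BoundaryTiming` and its fields are illustrative names; the durations follow the caption's definitions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical instrumentation of one transaction boundary (see Figure 4-19). */
public class BoundaryTiming {
    /** Durations in nanoseconds, named after the quantities in Figure 4-19. */
    public static final class Timings {
        public final long tbC, tWait, tbSRx, tbSRy;
        Timings(long tbC, long tWait, long tbSRx, long tbSRy) {
            this.tbC = tbC; this.tWait = tWait; this.tbSRx = tbSRx; this.tbSRy = tbSRy;
        }
    }

    private final ReentrantLock tbMutex = new ReentrantLock(); // stands in for tb_mutex

    /** rx and ry stand in for sending the BEGIN/COMMIT SQL to replicas Rx and Ry. */
    public Timings executeBoundary(Runnable rx, Runnable ry) {
        long t1 = System.nanoTime();      // client issues the boundary
        long t2 = System.nanoTime();      // middleware requests tb_mutex
        tbMutex.lock();
        long t3 = System.nanoTime();      // tb_mutex granted
        long[] rxTimes = new long[2];     // [start, end] on Rx (t4, t7 in the figure)
        long[] ryTimes = new long[2];     // [start, end] on Ry (t5, t6 in the figure)
        try {
            CompletableFuture<Void> fx = CompletableFuture.runAsync(() -> {
                rxTimes[0] = System.nanoTime(); rx.run(); rxTimes[1] = System.nanoTime();
            });
            CompletableFuture<Void> fy = CompletableFuture.runAsync(() -> {
                ryTimes[0] = System.nanoTime(); ry.run(); ryTimes[1] = System.nanoTime();
            });
            // Wait for both replicas; join() also makes the recorded times visible here.
            CompletableFuture.allOf(fx, fy).join();
        } finally {
            tbMutex.unlock();
        }
        long t8 = System.nanoTime();      // client receives the reply
        return new Timings(t8 - t1, t3 - t2, rxTimes[1] - rxTimes[0], ryTimes[1] - ryTimes[0]);
    }
}
```

Because t1 ≤ t2, t3 precedes both replica starts and both replica ends precede t8, the returned TbC is never smaller than TWAIT plus the slower replica's TbS, which is the relationship TbC ≈ TbS + TWAIT expresses.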

We experimented with a replicated server configuration in which two FB servers are deployed. We used FB servers because the solution we propose for minimising the serialisation overhead (Section 4.5.2) would require a substantial change to PG's functionality: the server processes must be runnable by a privileged (root) user, which PG does not allow by default. However, the results obtained with a pair of FB servers would apply to any replicated setup, since the solution does not depend on any specifics of the FB server.

The hardware was the same as in the experiments described in Section 4.4. The client application executed the write-intensive profile specified by the TPC-C standard. The database was three times larger than the available RAM. We varied the number of clients to evaluate the impact of load on the serialisation overhead, running experiments with 20 and 50 clients.

Figure 4-20 shows the response times of client-view BEGIN operations (TbC BEGIN) plotted against the corresponding transaction response times, Tt. The client-view BEGIN operations account for a significant portion of the transaction response times. This is not surprising, since transaction latency includes the potentially long TWAIT times. On average, the response time of a BEGIN operation is almost exactly one fifth of the transaction response time. The figure also shows, however, the variability of both measures. Similar results were obtained for the COMMIT operation. During the experiments we measured the CPU utilisation on the database server machines and found that 25%–30% of the CPU was unused, i.e. reported as idle by the Linux resource monitoring utilities. Therefore we could not achieve the maximum performance of the hardware used in the experiments. Although the CPU underutilisation can be partly explained by noticeable I/O activity, it was clear that the serialisation of boundary operations also contributed to the performance bottleneck. This was confirmed by the greater performance penalty observed once we increased the degree of concurrency.

[Figure 4-20: "Transaction Boundary Overhead"; x-axis: Transaction Response Times (ms); y-axis: Client-view BEGINs (ms); both axes logarithmic, 10 to 100000 ms]

Figure 4-20 Transaction response times and the corresponding client-view BEGIN response times for the experiment with 50 clients.
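The "one fifth" observation is a ratio of means over all transactions in a run, which is not the same as the typical per-transaction share. A small helper makes the computation explicit and, because the text stresses variability, also reports the range of the per-transaction share TbC(BEGIN)/Tt. It is a hypothetical analysis sketch; the class and method names are illustrative, and it assumes the two arrays hold paired measurements for the same transactions, in milliseconds.

```java
import java.util.Arrays;

/** Hypothetical analysis helper for paired (TbC BEGIN, Tt) samples in milliseconds. */
public class BeginShare {
    /** Ratio of the mean client-view BEGIN time to the mean transaction response time. */
    static double ratioOfMeans(double[] tbcBeginMs, double[] ttMs) {
        checkPaired(tbcBeginMs, ttMs);
        double meanBegin = Arrays.stream(tbcBeginMs).average().getAsDouble();
        double meanTt = Arrays.stream(ttMs).average().getAsDouble();
        return meanBegin / meanTt;
    }

    /** Smallest and largest per-transaction share TbC(BEGIN)/Tt, exposing variability. */
    static double[] perTransactionShareRange(double[] tbcBeginMs, double[] ttMs) {
        checkPaired(tbcBeginMs, ttMs);
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < ttMs.length; i++) {
            double share = tbcBeginMs[i] / ttMs[i];
            min = Math.min(min, share);
            max = Math.max(max, share);
        }
        return new double[] {min, max};
    }

    private static void checkPaired(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("expected non-empty, equal-length paired samples");
        }
    }
}
```

A ratio of means near 0.2 is compatible with individual transactions whose BEGIN share is far smaller or far larger, which is the spread a log-log plot such as Figure 4-20 makes visible.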

In document Performance Implications of Using Diverse Redundancy for Database Replication (Page 113-116)