9.3.1 Global System Resources

Global system resources are system resources outside of a Coda client, such as network bandwidth, server CPU time and server disk space. Because the current IOT implementation requires no change to any server internal data structures, transaction execution does not cost any additional server disk space. Hence, our evaluation focuses on two main global resources: server load and network traffic. We use the term server load to refer to the total amount of server CPU and server I/O time spent on behalf of a particular system task associated with transaction operations. We studied the following two specific questions:

1. How is server load affected by transaction-related system activities?
2. How is network traffic affected by transaction-related system activities?

Three kinds of transaction-related activities consume global system resources: transaction reintegration, transaction validation, and connected transaction execution. We first present measurements of the global system resource cost incurred by transaction reintegration, and then discuss the impact of transaction validation and connected transaction execution on global system resource usage.

9.3.1.1 Server Load for Reintegrating Disconnected Transactions

Methodology When there are no transaction executions, mutations performed in a disconnected operation session are reintegrated to the servers in one batch, requiring a single reintegration operation on the corresponding servers. When there are disconnected transactions, the mutations are reintegrated in different batches, requiring multiple reintegration operations on the servers. Thus, the impact of reintegrating disconnected transactions on server load boils down to reintegrating the same set of mutations in one batch versus in multiple batches.

The server load for reintegrating a set of mutations depends on many factors, such as the number, the type and the mixture of the mutation operations involved. Overall, the reintegration server load can be considered as consisting of two main components: a fixed initial setup cost and a cost that depends on the number of mutation operations involved. When the number of mutations is small, the first component dominates the reintegration server load. In contrast, when the number of mutations is large, the second component dominates. Moreover, the second component grows faster than linearly in the number of mutations because it involves activities such as sorting.
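The two-component cost structure can be made concrete with a small model. The sketch below is illustrative only: the function names, the constants `SETUP_MS` and `PER_OP_MS`, and the n log n growth term are assumptions chosen to mimic a sort-dominated cost, not values measured on a Coda server.

```python
import math

# Hypothetical constants, chosen only for illustration.
SETUP_MS = 500.0   # assumed fixed setup cost per reintegration operation
PER_OP_MS = 0.5    # assumed scale factor for the per-mutation term


def reintegration_cost(num_mutations: int) -> float:
    """Modeled server time (ms) to reintegrate one batch of mutations."""
    if num_mutations <= 0:
        return 0.0
    # n * log2(n + 1) stands in for superlinear work such as sorting.
    return SETUP_MS + PER_OP_MS * num_mutations * math.log2(num_mutations + 1)


def split_cost(num_mutations: int, batches: int) -> float:
    """Modeled server time when the mutations are split into near-equal batches."""
    base, extra = divmod(num_mutations, batches)
    sizes = [base + 1] * extra + [base] * (batches - extra)
    return sum(reintegration_cost(size) for size in sizes)


for n in (20, 20000):
    print(n, round(reintegration_cost(n), 1), round(split_cost(n, 2), 1))
```

Under these assumed constants, splitting 20 mutations into two batches costs more than one batch because the setup cost is paid twice, while splitting 20000 mutations costs less because each batch does less superlinear work.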

Experiment Workload    Reintegration Server Load (milliseconds)
                       One Run           Two Runs
Andrew Benchmark       11003 (617.3)     45429 (1276)
CFS-Build              912 (8.2)         1724.6 (20.6)

This table shows the total elapsed time for a dedicated server to perform reintegration for the disconnected mutations of one and two independent runs of the Andrew Benchmark and the CFS-build task. The time values are in milliseconds and represent the mean over five runs. The numbers in parentheses are standard deviations.

Table 9.12: Impact of Disconnected Transactions on Reintegration Server Load

As a result, the impact of disconnected transactions on the total reintegration server load can go either way. Generally speaking, when the total number of mutations is small, disconnected transactions will increase the reintegration server load. On the other hand, when the total number of mutations is large, disconnected transactions can reduce the reintegration server load. We use two experiments to demonstrate this effect.

The first experiment compares the server load of reintegrating one and two independent runs of the Andrew Benchmark, which contains a large number of mutations. The second experiment compares the server load of reintegrating one and two independent runs of the CFS-build task, which compiles the Coda cfs tool and contains only a few mutations. Each experiment run consists of the execution of the workload (one or two independent runs of the Andrew Benchmark or the CFS-build task) on a disconnected laptop client and the ensuing reintegration from the laptop client to a dedicated server. To eliminate possible interference from other clients, we use a separate network between the client and the server during reintegration and make sure that there are no other concurrent threads or RVM activities on either the client or the server during reintegration.

Results The results of the two experiments are shown in Table 9.12. Because the Andrew Benchmark contains many mutations, the server elapsed time for reintegrating two disconnected benchmark runs together (45429 ms) is much larger than the sum of reintegrating the two runs one at a time (about 2 × 11003 = 22006 ms). In contrast, the CFS-build task contains only a few mutations. Hence, reintegrating the two runs separately (about 2 × 912 = 1824 ms) costs more server time than reintegrating them together (1724.6 ms). If a disconnected operation session contains two independent runs of the Andrew Benchmark, the reintegration server load decreases when either of the two runs is executed as a transaction. Conversely, if the disconnected operation session contains two independent runs of the CFS-build task, using a transaction for either of the two runs increases the reintegration server load.
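The comparison can be reproduced directly from the means in Table 9.12. The sketch below assumes that reintegrating two runs separately costs about twice the time of a single run, which is an approximation and not a separately measured value. The dictionary and function names are illustrative.

```python
# Mean server elapsed times in milliseconds, copied from Table 9.12.
TABLE_9_12 = {
    "Andrew Benchmark": {"one_run": 11003.0, "two_runs": 45429.0},
    "CFS-Build": {"one_run": 912.0, "two_runs": 1724.6},
}


def separate_minus_combined(workload: str) -> float:
    """Estimated separate-reintegration time minus the measured combined time.

    Assumption: reintegrating two runs separately costs twice one run.
    """
    row = TABLE_9_12[workload]
    return 2 * row["one_run"] - row["two_runs"]


for name in TABLE_9_12:
    print(f"{name}: {separate_minus_combined(name):+.1f} ms")
```

A negative difference means that separate reintegration is cheaper. Under the stated assumption, the Andrew Benchmark difference is about -23423 ms, and the CFS-Build difference is about +99.4 ms.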

9.3.1.2 Network Traffic for Reintegrating Disconnected Transactions

Methodology If a disconnected operation session does not contain any transaction execution, all disconnected mutations are sent to the servers using one reintegration RPC. When disconnected transactions are involved, the same set of mutations is broken up into several smaller reintegration RPCs. This results in network traffic overhead because transmitting the same amount of data using multiple RPCs consumes more packets than using a single RPC. Unlike their effect on server load, disconnected transactions always increase reintegration network traffic.

We use multiple independent runs of the Andrew Benchmark to measure the network traffic overhead, comparing the traffic of reintegrating the runs with a single RPC against that of using one RPC per run. The experiment was conducted in the same environment as described in section 9.3.1.1.

Results The measured results displayed in Figure 9.6 indicate that there is only a slight overhead in reintegration network traffic for disconnected transactions containing a large number of mutations, such as the Andrew Benchmark. The overhead could be higher when the involved disconnected transactions contain only a few mutation operations.

Run Number    Reintegration Traffic (KB)
1             1313 (7.6)
2             2366 (75.2)
3             3923 (30.3)
4             5222 (44.5)
5             6476 (34.7)
6             7689 (19.3)
7             8925 (40.3)
8             10262 (38.8)
9             11537 (24.8)
10            12858 (37.7)

[Plot: total network traffic (KB, 0 to 15000) against the number of Andrew Benchmark runs (1 to 10), with two curves labeled Combined Reintegration and Separate Reintegration.]

The table in this figure shows the measured network traffic for reintegrating disconnected mutations of multiple independent runs of the Andrew Benchmark. Traffic is measured in KB and the values represent the mean over five runs. The numbers in parentheses are standard deviations. The two curves on the right plot the same data presented in the table and a linear projection based on the reintegration traffic of a single benchmark run.

Figure 9.6: Reintegration Traffic for Multiple Runs of Andrew Benchmark
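The linear projection mentioned in the caption can be recomputed from the tabulated means. The sketch below does so and reports the gap between the projection and each measured value. Reading that gap as the extra traffic of reintegrating each run with its own RPC is an assumption, and the names used are illustrative.

```python
# Mean reintegration traffic in KB for 1..10 runs, copied from Figure 9.6.
MEASURED_KB = [1313, 2366, 3923, 5222, 6476, 7689, 8925, 10262, 11537, 12858]


def linear_projection(runs: int) -> int:
    """Traffic expected if every run cost as much as a single run alone."""
    return MEASURED_KB[0] * runs


def relative_gap(runs: int) -> float:
    """Gap between projection and measurement, as a fraction of the measurement."""
    measured = MEASURED_KB[runs - 1]
    return (linear_projection(runs) - measured) / measured


for runs in range(2, 11):
    print(runs, linear_projection(runs), f"{relative_gap(runs):.1%}")
```

For three or more runs the gap stays below 3 percent of the measured traffic, which is consistent with the slight overhead reported above. The two-run data point shows a larger gap and also has the largest standard deviation in the table.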

9.3.1.3 The Impact of Transaction Validation

Transaction validation, as currently designed, only compares version vectors for the involved objects. We do not measure its effect on either server load or network traffic because it has no long-term effect on these two global system resources. The main reason is that the internal mechanisms for transaction validation are shared with those for cache coherence maintenance, as discussed in section 8.6.1. In essence, the server work and network traffic spent on validating a transaction replace an equal amount of work that would otherwise have been carried out by client cache validation and callback maintenance, and vice versa.
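A version vector comparison of this kind can be sketched as follows. This is a minimal illustration and not the Coda implementation: the `VersionVector` representation, the `validate` function, and the rule that validation succeeds only when every recorded vector equals the server's current vector are all assumptions.

```python
from typing import Dict, Tuple

# Hypothetical representation: one update counter per server replica.
VersionVector = Tuple[int, ...]


def validate(read_versions: Dict[str, VersionVector],
             server_versions: Dict[str, VersionVector]) -> bool:
    """Return True if every object the transaction accessed is unchanged.

    An object missing on the server counts as changed.
    """
    return all(
        server_versions.get(obj) == vv for obj, vv in read_versions.items()
    )


accessed = {"foo.c": (3, 3), "foo.h": (1, 1)}
print(validate(accessed, {"foo.c": (3, 3), "foo.h": (1, 1)}))
print(validate(accessed, {"foo.c": (4, 4), "foo.h": (1, 1)}))
```

The first call passes because both vectors match. The second fails because `foo.c` was updated on the server after the transaction accessed it.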

9.3.1.4 The Impact of Connected Transaction Execution

Connected transaction execution has an impact on both the server load and the network traffic. There are two main factors: the write-back caching effect due to mutation logging and the 2PC protocol for distributed transaction commitment. The 2PC protocol will increase both the server load and the network traffic. However, the write-back caching effect of mutation logging can influence both the server load and the network traffic in either direction, because the current Coda implementation uses a write-through caching policy.

Connected transactional execution of applications containing a large number of mutation operations can reduce both the server load and network traffic compared to connected non-transactional execution. There are two main reasons. First, because mutations get batched at the client, there are opportunities to cancel redundant mutation operations, as discussed in Chapter 4. Second, it consumes less network traffic and server load to transmit and perform a large number of mutations at once than to process them one at a time. On the other hand, both the server load and network traffic can be increased by connected transaction execution if the application contains only a few mutation operations, because the initial overhead of reintegration will dominate the cost.

We decided not to evaluate the effect of connected transaction execution on server load and network traffic for the following reasons. First, the 2PC protocol has not been fully implemented yet. Thus, how it increases the server load and network traffic will not be known until the actual mechanisms are put in place. Second, a fair comparison on the server load and network traffic between connected transactional and non-transactional executions cannot be made until Coda implements a write-back caching policy.