Overall performance comparison for sequential and parallel execution

CHAPTER 6 MEASUREMENTS AND ANALYSIS

6.1 Performance measurements

6.1.4 Overall performance comparison for sequential and parallel execution

We measure the overall performance of a Web Service deployment in both Apache Axis and DHArch. Distributing the handlers introduces an overhead, but the features that distribution offers also bring gains. In this section, we investigate the performance benefits that result from distribution.

Managing the distributed handler execution and transporting the tasks clearly add to the execution time. This cost is inevitable, but its burden can be reduced by rearranging the execution configuration. In short, utilizing DHArch always carries a cost that originates from the distribution.

On the other hand, there are ways to compensate for the overhead and even to achieve a promising overall performance. The first way to improve the performance of a deployment is to execute handlers concurrently in a distributed environment.

The conventional Apache Axis handler deployment does not let the handlers run in parallel. However, many handlers are independent of each other and can therefore process SOAP messages concurrently. For instance, a monitoring handler does not depend on a logging handler, so the two can easily be executed concurrently. The second way to improve performance is to utilize faster machines. The results of the previous section show that the time spent in a handler differs from computer to computer. A faster machine may contribute to the overall performance when an appropriate handler is deployed on it. For instance, distributing the encryption and decryption handlers to faster machines within a secure environment contributes most to the overall system.
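The idea of running independent handlers concurrently can be sketched as follows. This is a minimal Python illustration, not DHArch or Apache Axis code: the handler names (`monitoring_handler`, `logging_handler`) and the sleep-based delays are hypothetical stand-ins for real handler work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def monitoring_handler(message):
    # Hypothetical handler: the sleep stands in for real processing work.
    time.sleep(0.05)
    return "monitored"

def logging_handler(message):
    # Hypothetical handler, independent of monitoring_handler.
    time.sleep(0.05)
    return "logged"

def run_sequentially(handlers, message):
    # Conventional chain: each handler starts after the previous one ends.
    return [handler(message) for handler in handlers]

def run_concurrently(handlers, message):
    # Independent handlers process the same message at the same time.
    with ThreadPoolExecutor(max_workers=len(handlers)) as pool:
        futures = [pool.submit(handler, message) for handler in handlers]
        return [future.result() for future in futures]

def timed(runner, handlers, message):
    # Return the runner's results together with the elapsed wall-clock time.
    start = time.perf_counter()
    results = runner(handlers, message)
    return results, time.perf_counter() - start
```

Calling `timed(run_sequentially, ...)` and `timed(run_concurrently, ...)` on the same handler list should return the same results, with the concurrent run taking roughly the time of the slower handler rather than the sum of both.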

The measurements shown in Figure 6-7 are from the multi-core system described in section 6.1.2. The values are the round-trip times of a service request. Clients record the time at which they initiate a request and calculate the elapsed time when they receive the response. Hence, the measurements include the transportation time, the service time and the execution times of the handlers. Every observation was repeated 100 times.
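The client-side measurement procedure described above can be sketched roughly as follows. The function `send_request` is a hypothetical placeholder for the blocking service call; the actual client code of the study is not shown in this section, so this is only an assumed shape.

```python
import statistics
import time

def measure_round_trips(send_request, repetitions=100):
    """Return the round-trip time in milliseconds of each request."""
    samples = []
    for _ in range(repetitions):
        started = time.perf_counter()   # request initiation
        send_request()                  # assumed to block until the response arrives
        elapsed = time.perf_counter() - started
        samples.append(elapsed * 1000.0)
    return samples

def summarize(samples):
    # Sample standard deviation (n - 1 denominator). Whether the study used
    # the sample or the population form is not stated; this is an assumption.
    return statistics.mean(samples), statistics.stdev(samples)
```

The pair returned by `summarize` corresponds to the "Mean value" and "Standard deviation" rows reported in the tables of this section.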

The best results are clearly observed when the handlers are able to run concurrently. However, processing them concurrently may not always be possible. As discussed earlier, the rules between the handlers have to be obeyed and their dependencies have to be considered. For example, a security handler needs to be processed first; otherwise, the remaining handlers cannot understand the message because it is encrypted.
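One way to picture such a dependency rule is to group the handlers into ordered stages, where the security handler forms the first stage and the independent handlers share a later one. The sketch below is an assumption-laden illustration: `run_stages` and the three handlers are hypothetical, and a reversed string merely stands in for an encrypted payload.

```python
from concurrent.futures import ThreadPoolExecutor

def decrypt_handler(message):
    # Hypothetical security handler that must run first.
    return message[::-1]

def logging_handler(message):
    return f"log:{message}"

def monitoring_handler(message):
    return f"monitor:{message}"

def run_stages(stages, message):
    """Run stages in order; handlers inside one stage run concurrently.

    A one-handler stage transforms the message for the stages after it.
    A multi-handler stage only reads the message; its outputs are collected.
    """
    outputs = []
    for stage in stages:
        if len(stage) == 1:
            message = stage[0](message)
        else:
            with ThreadPoolExecutor(max_workers=len(stage)) as pool:
                outputs.extend(pool.map(lambda handler: handler(message), stage))
    return message, outputs
```

With the stages `[[decrypt_handler], [logging_handler, monitoring_handler]]`, the logging and monitoring handlers only ever see the decrypted message, which is the ordering constraint the paragraph describes.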

The difference between configurations 1 and 2 is the overhead originating from the distribution of 5 handlers. The first configuration utilizes the Apache Axis in-memory handler deployment. The second configuration uses DHArch to deploy the handlers to the individual cores of a single machine. The Apache Axis deployment is the reference point for the comparisons. To make the comparison accurate, configurations 1 and 2, which utilize Apache Axis and DHArch respectively, use the same sequential execution. Because the handlers are distributed to the individual cores, DHArch increases the execution time slightly.

Figure 6-7 : The service execution times of the six handler configurations containing the five handlers in the multi-core system

Parallel handler execution reduces the overall execution time. The gain may be small: in configuration 3 it is around 50-70 milliseconds, owing to the total processing time of Handler C and Handler D. As a result, this configuration provides just enough gain to overcome the overhead. Sometimes the gain may not even compensate for the overhead. On the other hand, the gain can be very appealing. For example, the parallel executions in configurations 4, 5 and 6 provide good results owing to the processing times of Handler A and Handler B, which are long enough to improve the performance considerably. The numerical values of the results are given in Table 6-11.

Table 6-11 : The elapsed time for the service execution and the standard deviation of the performance benchmark in the multi-core system

Configuration number          1         2         3         4         5         6
Mean value (msec)             7192.9    7220.92   7164.98   4324.86   4279.37   4264.78
Standard deviation (msec)     42.97     56.68     57.75     49.66     29.92     36.96
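The overhead and the gains discussed above can be recomputed directly from the mean values of Table 6-11. The short sketch below does only that arithmetic; the function names are illustrative and not part of DHArch.

```python
# Mean round-trip times in milliseconds, copied from Table 6-11.
mean_ms = {1: 7192.9, 2: 7220.92, 3: 7164.98, 4: 4324.86, 5: 4279.37, 6: 4264.78}

def distribution_overhead(means):
    # Configurations 1 and 2 are both sequential, so their difference is
    # the cost of distributing the handlers.
    return means[2] - means[1]

def gain_over_reference(means, configuration):
    # Positive result: the configuration is faster than Apache Axis (config 1).
    return means[1] - means[configuration]

def speedup(means, configuration):
    # Ratio of the Apache Axis reference time to the configuration's time.
    return means[1] / means[configuration]
```

With these values, the distribution overhead is about 28 milliseconds, configuration 3 ends up about 28 milliseconds faster than the Apache Axis reference, and configuration 6 is roughly 1.69 times as fast as the reference.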

There is a limit to the performance gain that concurrency can provide. The total handler processing time cannot be shortened below the longest individual handler execution time. For example, even if all the handlers are processed concurrently, they cannot all finish in less time than Handler A's execution time.
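This bound can be expressed with a very small timing model. The sketch below is an idealization that ignores distribution overhead and scheduling effects, and any handler times passed to it are hypothetical inputs, not measurements from this chapter.

```python
def sequential_time(handler_times):
    # One handler after another: the times add up.
    return sum(handler_times)

def fully_parallel_time(handler_times):
    # Best case with no dependencies and no overhead: everything is done
    # when the slowest handler is done.
    return max(handler_times)

def staged_time(stages):
    # Stages run in order; handlers within a stage run concurrently, so each
    # stage costs as much as its slowest handler.
    return sum(max(stage) for stage in stages)
```

For any grouping of the same handlers into stages, the staged time lies between the fully parallel time and the sequential time, which is why the longest handler sets the floor for every configuration.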

The percentage of the gain depends entirely on the handler configuration. On the one hand, executing all the handlers in parallel can provide a very large performance improvement. On the other hand, a configuration may not even produce enough gain to compensate for the overhead of distributing the handlers.

Figure 6-8 : Standard deviations of the service execution times in the multi-core system (y-axis: milliseconds; x-axis: configuration)

Figure 6-8 depicts the standard deviations of the handler configurations. The deviations are reasonable: the execution times vary between about 4000 and 7000 milliseconds, so deviations of around 50 milliseconds are acceptable.
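One way to make "reasonable" concrete is to express each standard deviation as a percentage of its mean. The sketch below applies this to the values of Table 6-11; the helper name is illustrative, and the relative measure itself is an addition for illustration rather than a metric used in the original analysis.

```python
# Values in milliseconds, copied from Table 6-11 (configurations 1 to 6).
mean_ms = [7192.9, 7220.92, 7164.98, 4324.86, 4279.37, 4264.78]
stdev_ms = [42.97, 56.68, 57.75, 49.66, 29.92, 36.96]

def relative_deviation_percent(means, stdevs):
    # Standard deviation as a percentage of the corresponding mean.
    return [100.0 * s / m for m, s in zip(means, stdevs)]
```

For the multi-core system, every configuration stays below roughly 1.2 percent of its mean.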

Figure 6-9 depicts the results from the multiprocessor system described in section 6.1.2. The pattern is similar to that observed in the multi-core system. The sequential DHArch deployment, which replicates the handler sequence of the Apache Axis deployment, has a higher processing time than Apache Axis because of the overhead of the handler distribution. In general, the gain not only compensates for the overhead but also yields very significant performance improvements.

Figure 6-9: The service execution times of the six handler configurations containing the five handlers in the multiprocessor system

Table 6-12 shows the numerical values of the results for the multiprocessor system. Figure 6-10 depicts the standard deviations of the execution times for the multiprocessor system. The deviations are a little higher than those in the multi-core system. This can happen either because of the system scheduling algorithm or because of the system load at execution time.

Table 6-12: The elapsed time for the service execution and the standard deviation of the performance benchmark in the multiprocessor system

Configuration number          1         2         3         4         5         6
Mean value (msec)             4023.02   4052.07   4025.95   2261.08   2250.96   2171.53
Standard deviation (msec)     83.49     90.52     92.56     86.66     97.11     86.22

Figure 6-10 : Standard deviations of the service execution times in the multiprocessor system

Figure 6-11 illustrates the results from executing the handlers in a cluster whose machines communicate over a Local Area Network. The features of the computers are given in section 6.1.2. The execution times are shorter because the computers are faster. However, this does not change the behavior of the handler configurations, which follow the same patterns as in the previous systems. The sequential DHArch execution is slower than the remaining configurations. The numerical values of the results are shown in Table 6-13.

Figure 6-11 : The service execution times of the six handler configurations containing the five handlers in the cluster utilizing Local Area Network

The standard deviations, shown in Figure 6-12, are reasonable even though the tasks travel between the handlers over the local network. The network is fast and consistent, and message transportation does not take much time. Compared with the results from the previous systems, no side effect of using the LAN is observed.

Table 6-13: The elapsed time for the service execution and the standard deviation of the performance benchmark in the cluster utilizing Local Area Network

Configuration number          1         2         3         4         5         6
Mean value (msec)             1717.08   1741.95   1712.22   1182.06   1150.55   1139.26

Figure 6-12: Standard deviations of the service execution times in the cluster utilizing Local Area Network

The results shown in Figure 6-13 and Table 6-14 are from the single processor system described in section 6.1.2. In contrast to the previous measurements, the single processor system shows a different pattern because thread scheduling becomes an issue. Since two of the handlers are heavily CPU-bound, their individual execution times increase when they are executed concurrently. Moreover, NaradaBrokering and Apache Axis, which runs in the Apache Tomcat container, use the same processor, which makes the thread scheduling worse.
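Why concurrency helps little on a single processor can be seen from an idealized lower bound on the completion time of CPU-bound work. The sketch below is a simplification that ignores context switching, I/O waits and the load from other processes; the function name and any handler times passed to it are hypothetical.

```python
def stage_time_lower_bound(cpu_times, cores):
    """Idealized lower bound on the time to finish CPU-bound handlers.

    The handlers cannot finish before the longest one does, and the
    processor cannot deliver more than `cores` units of work per unit of time.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return max(max(cpu_times), sum(cpu_times) / cores)
```

With one core the bound equals the sum of the handler times, so running two CPU-bound handlers concurrently cannot beat running them one after the other; with two cores the bound drops to the longer of the two.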

Table 6-14: The elapsed time for the service execution and the standard deviation of the performance benchmark in the single processor system

Configuration number          1         2         3         4         5         6
Mean value (msec)             1538.14   1661.73   1638.54   1558.9    1528.21   1488.67
Standard deviation (msec)     56.32     58.29     54.86     73.82     85.90     86.80

Figure 6-13: The service execution times of the six handler configurations containing the five handlers in the single processor system

Figure 6-14 illustrates the standard deviations of the execution times from the single processor system. Depending on the system load, the results fluctuate more than those from the previous systems. However, the deviations, which range from about 55 to 87 milliseconds, are reasonable given that the execution times are around 1500 milliseconds or more.

Figure 6-14: Standard deviations of the service execution times in the single processor system (y-axis: milliseconds; x-axis: configuration)

6.1.5 Summary

We performed experiments to benchmark the different handler configurations in various environments. The handler configurations were built for comparison purposes. Although handler distribution introduces an overhead, we also observed promising gains when the handlers are executed concurrently.
