Testing Software Performance and
3.5 AMDAHL’S LAW
Amdahl’s law originated from explorations four decades ago on how one could enhance the performance of a computer system by using multiple parallel processors.
Here it is adapted for evaluating the performance of a software system that consists of a series of subsystems such as a typical three-tier system composed of a Web server, an application server, and a database server.
Since we prefer to explore multiple factors one at a time, let’s assume that we are dealing with a software system that consists of two subsystems. We’ll use the elapsed time as the performance measure of the system so that it applies to both OLTP and batch jobs. Amdahl’s law helps estimate how much speedup one can have for the entire system if a subsystem is made faster. We often come up with such questions when we try to figure out how we can make a system run faster by improving some parts of it. So Amdahl’s law lets us evaluate quantitatively how much the performance and scalability of a software system can improve when the performance of its subsystems is improved.
Let’s say that we have a system that consists of subsystem 1 and subsystem 2. The time for the system to process a request is $T_1$ on subsystem 1 and $T_2$ on subsystem 2, so the total elapsed time on the overall system is $T_1 + T_2$. Let’s further assume that we can potentially improve the elapsed time on the second subsystem by n times, which is termed the enhancement for convenience in this book. So the performance gain (G) by making the second subsystem n times faster can be derived as follows:
$$G = \frac{T_1 + T_2}{T_1 + T_2/n} = \frac{1}{(1 - f) + f/n} \qquad (3.3)$$
where $f = T_2/(T_1 + T_2)$ is the fraction of the elapsed time spent on subsystem 2 relative to the original total elapsed time of the entire system. For convenience, we call f the impact factor in this book. The lower the impact factor associated with a subsystem, the less the gain would be for the overall system when that subsystem is made faster.
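As a minimal sketch of Equation (3.3), assuming its standard Amdahl form, the gain can be computed either from the two elapsed times or from the impact factor. The function names and the sample times below are illustrative assumptions, not taken from this book.

```python
def gain_from_times(t1: float, t2: float, n: float) -> float:
    """Overall gain when subsystem 2 (elapsed time t2) is made n times faster."""
    if t1 < 0 or t2 < 0 or t1 + t2 <= 0 or n <= 0:
        raise ValueError("times must be non-negative with a positive sum; n must be positive")
    return (t1 + t2) / (t1 + t2 / n)


def gain_from_impact_factor(f: float, n: float) -> float:
    """Same gain expressed with the impact factor f = t2 / (t1 + t2)."""
    if not 0.0 <= f <= 1.0 or n <= 0:
        raise ValueError("f must be in [0, 1] and n must be positive")
    return 1.0 / ((1.0 - f) + f / n)


if __name__ == "__main__":
    # Hypothetical request: 6 s on subsystem 1 and 4 s on subsystem 2, so f = 0.4.
    t1, t2, n = 6.0, 4.0, 10.0
    f = t2 / (t1 + t2)
    print(f"G from times: {gain_from_times(t1, t2, n):.4f}")
    print(f"G from f:     {gain_from_impact_factor(f, n):.4f}")
```

Both forms give G ≈ 1.56 for this hypothetical request: a tenfold enhancement of a subsystem that accounts for 40% of the elapsed time yields well under a twofold overall gain.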
When applying Amdahl’s law, it’s important to keep in mind that the impact factor f is calculated based on the original elapsed times on the subsystem and the overall system, respectively. We’ll present a case study to show how much gain can be expected for the overall system by making a subsystem 10 times faster (n = 10) by varying the impact factor f according to Equation (3.3).
Case Study 3.8: Amdahl’s Law
Instead of showing a single data point with a specific enhancement (n) and a specific impact factor (f), Figure 3.17 shows how much gain (G) can be expected for the overall system if a subsystem can be made 10 times faster (n = 10) with different values for the impact factor (f).
It is not surprising that the larger the impact factor for the subsystem being enhanced, the larger the gain for the overall system. It is seen that with the fixed enhancement of 10 times for the subsystem, on the lower end of f = 10%, the gain for the overall system is about 10% as well (G ≈ 1.1), while on the higher end of f = 90%, the gain for the overall system is about a factor of 5. Of course, for f = 1, namely, the subsystem is the overall system itself, the gain for the overall system is equal to the enhancement for the subsystem.
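The shape of the curve in Figure 3.17 can be approximated with a short sweep over the impact factor. This is only a sketch, assuming Equation (3.3) has the standard Amdahl form G = 1/((1 − f) + f/n); the helper names are illustrative.

```python
def amdahl_gain(f: float, n: float) -> float:
    """Overall gain for impact factor f and enhancement n (assumed form of Eq. 3.3)."""
    return 1.0 / ((1.0 - f) + f / n)


def gain_table(n: float, steps: int = 10) -> list[tuple[float, float]]:
    """Return (f, G) pairs for f = 1/steps, 2/steps, ..., 1."""
    return [(k / steps, amdahl_gain(k / steps, n)) for k in range(1, steps + 1)]


if __name__ == "__main__":
    # Fixed enhancement of 10 times, impact factor swept from 10% to 100%.
    for f, g in gain_table(n=10):
        print(f"f = {f:4.0%}   G = {g:5.2f}")
```

The gain grows slowly at first and steeply only as f approaches 1, which is why enhancing a subsystem with a small impact factor pays off so little.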
Although the implication of Amdahl’s law is obvious, in reality many people simply make bold statements that if they can make some subsystem twice as fast, then the overall system will be twice as fast as well. Some design decisions are actually based on such fantasies.
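To see why the "twice as fast" claim fails, the standard Amdahl form of the gain can be evaluated at n = 2. The value f = 0.5 is only an illustrative assumption:

```latex
\begin{aligned}
G\big|_{n=2} &= \frac{1}{(1 - f) + f/2} = \frac{1}{1 - f/2} \le 2
  \quad \text{(equality only at } f = 1\text{)} \\
f = 0.5:\quad G &= \frac{1}{1 - 0.25} \approx 1.33 \\
\lim_{n \to \infty} G &= \frac{1}{1 - f} = 2 \quad \text{for } f = 0.5
\end{aligned}
```

So a subsystem that accounts for half of the elapsed time gives only about a 33% overall gain when its speed is doubled, and no enhancement of that subsystem alone can more than double the overall speed.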
Figure 3.17 Performance gain (G) for the overall system versus the impact factor (f) for a subsystem with a potential enhancement of 10 times (n = 10), calculated with Equation (3.3).
In the next section, I’ll show you some real-world performance and scalability enhancement examples, which might be hard to quantify with Amdahl’s law, but they are really effective. Hopefully, you can apply some of the examples here to your product and see immediate improvements on the performance and scalability of your product.
3.6 SOFTWARE PERFORMANCE AND SCALABILITY FACTORS
To some extent, the performance and scalability of a software system might be one of the most mysterious aspects of software. That’s because there are so many factors that can affect the performance and scalability of a software system. Some of these factors include:
• Raw performance and scalability of the underlying hardware platform. Since software runs on hardware, hardware certainly is one of the most important factors that determine how fast software can run when some workloads are put on it. There are four categories of hardware factors: CPU, memory, storage, and network. Each category is characterized by different specs:
a. For CPU, those specs include the CPU architecture, CPU clock speed, number of CPUs, amount of cache on the processor, and the memory bus speed.
b. For memory, it’s not that complicated: most of the time, it’s as simple as how much memory is installed on the system.
c. For storage, one needs to know whether it’s internal or external. In addition to the total amount of storage space available, one also needs to know the number of I/O controllers and the number of ports on each controller as well as the number of physical disks. If a certain level of RAID is used, one needs to know how many disks were used to configure that RAID level. The amount of cache at the storage level is critical for helping boost I/O performance as well.
d. For networks, one needs to know the number of network cards and ports installed on the computer systems under test as well as the maximum bandwidth such as 100 Mbps or 1 Gbps. It’s also necessary to know if all servers are on the same subnet of a LAN or if they have to communicate with each other across a firewall or WAN.
• How hardware is configured. All hardware systems are designed for potentially maximizing the performance and scalability of the software applications that run on them. Some examples may include:
a. With Intel-based servers, you may want to check whether hyperthreading is enabled if applicable, because it may help boost the performance of your application by as much as 30%, as was shown in Case Study 1.1 in Chapter 1.
b. With local disks on your Microsoft Windows OS based servers, you may want to check whether Enable disk caching on the disk is checked, as it may speed up your I/O significantly. You can check this out by following Computer Management | Device Manager | Disk drives | Disk device | Properties | Policy. You can also enable disk caching on UNIX systems using the disk format command. But be careful not to erase all your data on the disks.
c. If you are using Microsoft Windows OS based servers and your application is network intensive, you may want to make sure that the network adapter media type is set to full duplex. You can check this out by following Computer Management | Device Manager | Network Adapters | Ethernet | Properties | Advanced | Speed & Duplex | . . . Full.
• Operating system platform. Given the same hardware, application software installed on different operating system platforms may exhibit different performance and scalability characteristics.
• Database system platform. Given the same hardware and operating system, database-dependent enterprise applications may exhibit different performance and scalability characteristics with different database systems.
• How your database is configured. This is a whole category on its own, and I will give some specific examples in detail later.
• Configuration settings of your software itself, for example, whether it’s single-threaded or multithreading capable, the number of threads configured if multithreading capable, caching implementation and enabling, and database connection pool settings.
• How your software product is designed and implemented. With given hardware and the available knobs for configuring and tuning hardware systems, this is the most critical factor for determining the performance of your software. One of the objectives of this book is to help you design and implement performance and scalability into your software product by adopting all well-known performance practices and following an effective performance and scalability testing methodology.
This seems to be a long list for software performance and scalability factors, but we only scratched the surface of it. We are not trying to address all software performance and scalability factors in this section. Instead, we simply attempt to raise the awareness that software performance and scalability are determined by numerous factors instead of just one or two. It’s not uncommon to see orders of magnitude better performance from the start time to when everything gets settled down.
In the following sections, I’ll present a few case studies to help reinforce the notion of software performance and scalability factors. These case studies came from my experience with real products and are representative of many software products developed in many organizations. Let’s begin with the hardware as one of the most important software performance and scalability factors in the next section.
3.6.1 Hardware
Although it’s obvious that the raw performance and scalability of hardware are critical for the performance and scalability of a software system, it’s not uncommon to find that a lot of preproduction testing efforts use undersized hardware.
To demonstrate how important the hardware is for the performance and scalability of a software system, I’d like to share with you one of my real experiences with a customer in resolving critical performance escalations.
To help determine the cause of the customer’s performance problem, I installed the identical software stack on two different hardware setups, one consisting of two identical computer servers both using RISC processors, and the other consisting of two identical computer servers both using Intel Xeon processors. Each RISC box had two 1.28-GHz processors, and each Xeon box had four 3.67-GHz processors. One box was used for the application server, and the other was used for the database server.
As shown in Figure 3.18, I had four test configurations based on all possible pairings of the application server to the database server: RISC-to-RISC, Xeon-to-RISC, RISC-to-Xeon, and Xeon-to-Xeon.
Without going into any more details than necessary, the performance of the same software application tested with each configuration is shown in Figure 3.19. It is seen that:
• With both the application server and the database server installed on the two identical RISC-based systems, the application exhibited an average throughput of 6.5 objects/second.
• With the application server installed on a Xeon-based system and the database server on a RISC-based system, the application exhibited an average throughput of 7 objects/second.
• With the application server installed on a RISC-based system and the database server on a Xeon-based system, the application exhibited an average throughput of 12 objects/second.
• With both the application server and the database server installed on the two identical Xeon-based systems, the application exhibited an average throughput of 35 objects/second.
Figure 3.18 Four different test configurations for showing the dependence of software performance on hardware.
Apparently, in this test case, the faster CPUs delivered better performance. Let’s use the total CPU power of a computer system to quantify this interesting performance enhancement from the slower CPUs to the faster CPUs. The total CPU power of a computer system is defined as the product of the number of CPUs and the CPU clock rate. It is interesting to note from Figure 3.20 that the throughput scaled almost linearly from RISC to Xeon: a CPU power ratio of 5.73 from RISC-to-RISC configuration to Xeon-to-Xeon configuration yielded a 5.38 times performance improvement, which could potentially cut a one-day job into a few hours.
Figure 3.19 Performance of the same application with different hardware pairings for application server and database server. RISC/RISC, Xeon/RISC, RISC/Xeon, and Xeon/Xeon correspond to the four different configurations labeled in Figure 3.18.
Figure 3.20 The performance of an enterprise application scales linearly with the hardware CPU processing power.
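The two ratios quoted above follow directly from the server specs and the measured throughputs. The sketch below reproduces the arithmetic; the `Server` class and the "scaling efficiency" label are illustrative assumptions, not terms from this book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    num_cpus: int
    clock_ghz: float

    @property
    def cpu_power(self) -> float:
        """Total CPU power = number of CPUs x CPU clock rate (GHz)."""
        return self.num_cpus * self.clock_ghz


risc = Server("RISC", num_cpus=2, clock_ghz=1.28)
xeon = Server("Xeon", num_cpus=4, clock_ghz=3.67)

# Average throughputs (objects/second) reported for the two homogeneous setups.
throughput = {"RISC/RISC": 6.5, "Xeon/Xeon": 35.0}

power_ratio = xeon.cpu_power / risc.cpu_power
throughput_ratio = throughput["Xeon/Xeon"] / throughput["RISC/RISC"]

if __name__ == "__main__":
    print(f"CPU power ratio:    {power_ratio:.2f}")
    print(f"Throughput ratio:   {throughput_ratio:.2f}")
    # Hypothetical label: how much of the added CPU power showed up as throughput.
    print(f"Scaling efficiency: {throughput_ratio / power_ratio:.0%}")
```

Note that clock rate multiplied by CPU count is a coarse measure: it ignores architectural differences between the RISC and Xeon processors, so the near-linear scaling here is an observation about this test, not a general rule.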
Convinced by my carefully designed tests and quantitative test results, the customer happily upgraded his hardware and achieved the performance expectation based on his business requirements.
In the next section, I’ll demonstrate how the operating system can be a potential performance and scalability factor.
3.6.2 Operating System
Given the same hardware, the performance of the same application may vary, depending on what operating system is installed to host the application. In this section, an example on Windows versus Linux in terms of application performance is presented. Note that the purpose here is not to stir up a war about which platform performs better. Instead, my purpose is to show how the performance of an enterprise application may depend on the operating system on which it is installed to run. Also, to make it a fair comparison, no OS-specific tunings were applied to either platform: namely, the performance numbers were taken out-of-the-box for each OS platform.
Because there were four identical Intel Xeon based servers with database involved, there were actually four different test configurations, as shown in Figure 3.21. The Windows 2003 Enterprise Server was installed on two systems and a specific flavor of Linux Enterprise Server was installed on the other two systems so that we had two Windows systems and two Linux systems.
Figure 3.21 Windows and Linux test configurations.
These four test configurations were:
1. Win/Win configuration with both the application server and the database server on the two separate Windows systems.
2. Linux/Win configuration with the application server on a Linux system and the database server on a Windows system.
3. Linux/Linux configuration with both the application server and the database server on the two separate Linux systems.
4. Win/Linux configuration with the application server on a Windows system and the database server on a Linux system.
The common factors for this comparison of Windows versus Linux include the following:
• Hardware. Four identical server systems with the following specs for each system: 4 Intel Xeon quad-core processors at 2.4 GHz and 16 GB RAM.
• Enterprise Application. Two different versions of the same application: one compiled for Windows, and the other for Linux.
• Database. Same Oracle 10g except that one was the Windows version and the other was the Linux version.
• Workload. Same workload driven from the same batch job driver inserting objects into the database.
As we stated, the test tool inserts objects into the database through the application server APIs. The same test procedure was repeated on each test configuration with 12 threads concurrently inserting objects into the database. The results are summarized and shown in Figure 3.22.
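The actual test tool is not shown in the text. As a rough sketch of what such a driver does, the code below runs a fixed number of insert calls across a configurable number of threads and reports objects/second. `measure_throughput` and `fake_insert` are hypothetical names, and `fake_insert` merely sleeps in place of a real application server API call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def measure_throughput(insert_object: Callable[[int], None],
                       total_objects: int,
                       num_threads: int) -> float:
    """Insert total_objects using num_threads workers; return objects/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for all inserts and re-raises any worker exception.
        list(pool.map(insert_object, range(total_objects)))
    elapsed = time.perf_counter() - start
    return total_objects / elapsed


def fake_insert(object_id: int) -> None:
    """Placeholder for a real application server API call (assumption)."""
    time.sleep(0.001)  # simulate roughly 1 ms of I/O latency per object


if __name__ == "__main__":
    for threads in (1, 12):
        rate = measure_throughput(fake_insert, total_objects=600, num_threads=threads)
        print(f"{threads:2d} thread(s): {rate:8.1f} objects/second")
```

Keeping the workload and the driver identical while changing only the platform, as in the test described above, is what makes the resulting throughput numbers comparable.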
Based on the test results shown in Figure 3.22, it is seen that:
• With both the application server and the database server on Windows, a throughput of 290 objects/second was achieved, while with both the application server and the database server on Linux, a throughput of 152 objects/second was achieved. This seems to indicate that Linux was about 50% slower than Windows with this specific example.
• With the application server on Linux and the database server on Windows and Linux, respectively, the same throughput of 152 objects/second was achieved, which seems to indicate that it’s the application server on Linux that caused the slowdown.
In order to understand why the application seemed to be slower on Linux than on Windows with this specific example, the system resource utilizations were examined for the test configurations of Win/Win and Linux/Linux. As shown in Figure 3.23, it seems that an unusually high kernel CPU utilization of 31% was observed on the application server installed on Linux. This might indicate that multithreading of this application was implemented more efficiently on Windows than on Linux.
In order to verify whether it’s an issue of multithreading implementation with the application on Linux, the same test was repeated with only a single thread inserting objects into the databases on the two configurations of Win/Win and Linux/Linux, respectively. Interestingly, as shown in Figure 3.24, the same throughput of 97 objects/second was achieved on both the Windows and the Linux setups. Although it’s not 100% conclusive, it does indicate that multithreading implementation with this application might be less efficient on Linux with this specific example.
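One way to read these numbers is to compare how much each platform gained from going from 1 thread to 12 threads. The sketch below uses only the throughputs quoted in this section; the dictionary layout and helper name are illustrative assumptions.

```python
# Throughputs (objects/second) quoted in the text for the two homogeneous setups.
measured = {
    "Win/Win":     {"threads_1": 97.0, "threads_12": 290.0},
    "Linux/Linux": {"threads_1": 97.0, "threads_12": 152.0},
}


def thread_speedup(single: float, multi: float) -> float:
    """Throughput gain from single-threaded to multithreaded runs."""
    return multi / single


speedups = {name: thread_speedup(m["threads_1"], m["threads_12"])
            for name, m in measured.items()}

# Relative shortfall of Linux versus Windows in the 12-thread test.
linux_shortfall = (1.0 - measured["Linux/Linux"]["threads_12"]
                   / measured["Win/Win"]["threads_12"])

if __name__ == "__main__":
    for name, s in speedups.items():
        print(f"{name:12s} 12-thread speedup over 1 thread: {s:.2f}x")
    print(f"Linux 12-thread throughput shortfall: {linux_shortfall:.0%}")
```

With identical single-thread throughput, the whole gap comes from how well the 12-thread run scaled: roughly 3 times on Windows versus about 1.6 times on Linux. That is consistent with, though not proof of, a threading-related cause.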
Figure 3.22 Multi-threading performance comparisons between Windows and Linux.
Figure 3.23 High portion of the kernel CPU utilization on the application server installed on Linux.
Using a similar application but with more complex application logic, a series of tests were run on both Windows and Linux platforms to insert about 190,000 objects into the database by varying the number of threads, respectively. The test results are shown in Figure 3.25. This time, comparable performance of the same application was obtained on Windows and Linux platforms.
As stated earlier in this section, the purpose of this example is not to show which operating system is superior to the other. Instead, I’d like to caution that the operating system is indeed an important performance and scalability factor even given the same hardware and the same application.
Figure 3.24 Same performance between Windows and Linux with the same application running in single thread mode.
Figure 3.25 Performance comparison of the same application between Windows and Linux platforms.
In the next section, I’ll show you another important performance and scalability factor for database-intensive enterprise applications.
3.6.3 Database Statistics
It’s well known that database performance strongly depends on the most up-to-date statistics for the database server query optimizer to decide on the optimal execution plans. This is especially true with Oracle 10g, which is extremely flexible for intervening externally on how the optimizer chooses optimal execution plans for frequently executed SQL queries.
Let’s first explain what optimizer statistics are. Database optimizer statistics basically are computed profiles for database objects such as tables and indexes. If such statistics are up-to-date and known to the query optimizer, the query optimizer is able to compute all possible execution plans for a query and select the least costly one for the execution of that query.
Very often, when a database-intensive application is found to be significantly slower than it used to be, simply updating the optimizer statistics for a schema might immediately solve the performance problems. Figure 3.26 shows the magic effect of database optimizer statistics on the performance of an application. For the same test, it is seen that the throughput was doubled after the Oracle 10g optimizer statistics were updated during the time interval between 22:06:17 and 22:14:28.
Figure 3.26 Throughput of the same test doubled after optimizer statistics were updated with Oracle 10g.
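The text does not say how the statistics were updated in this test. One common way in Oracle 10g is the `DBMS_STATS.GATHER_SCHEMA_STATS` procedure. The sketch below only builds the PL/SQL text for a given schema; running it requires a database connection and suitable privileges, which are not shown. The helper name and the schema name `APPUSER` are hypothetical.

```python
import re

# Plain (unquoted) Oracle identifier: a letter followed by letters, digits, _, $ or #.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def gather_schema_stats_block(schema: str) -> str:
    """Build an anonymous PL/SQL block that refreshes optimizer statistics."""
    # Reject anything that is not a plain identifier to avoid SQL injection.
    if not _IDENTIFIER.match(schema) or len(schema) > 30:
        raise ValueError(f"not a plain Oracle identifier: {schema!r}")
    return (
        "BEGIN\n"
        f"  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => '{schema.upper()}');\n"
        "END;"
    )


if __name__ == "__main__":
    # APPUSER is a hypothetical schema name used only for illustration.
    print(gather_schema_stats_block("appuser"))
```

Gathering statistics for a large schema can itself take a while and load the database, so it is usually scheduled outside the measurement window of a performance test.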
The next example is another powerful demonstration about how critical it is to be