• No results found

Chapter 5: SQL Server 2000 and Windows

5.1 SQL Server 2000 and CPU

5.1.4 Investigating CPU bottlenecks

The tools used to observe CPU bottlenecks are typically the System Monitor and the Task Manager. There are also a number of tools in the Windows 2000 Resource Kit. We will focus on using the System Monitor in this section, although the Processes and Performance tabs in the Task Manager are also quite useful. These are shown later in Figures 5.14 and 5.15 when we investigate memory. Note that Chapter 7 discusses the general use of the System Monitor. The System, Processor, and Process objects are a useful place to start and it's worth a look at some of their counters, as shown in Table 5.1.

Table 5.1: Selected Counters for the System, Processor, and Process Objects

CPU-Related Counters Explanation System: Processor Queue Length

The number of threads that need CPU time but have to wait. This counts only ready threads, not those being handled. This counter belongs to the system object, because there is only one queue even when there are multiple processors on the server.

Processor: %

Processor Time This is the percentage that a processor is busy. There is an instance of this counter for every processor on the server. The_Total instance can be used to display the value of total processor utilization system-wide.

Processor: % User Time

This is the percentage that a processor is busy in user mode. User mode means application code and subsystem code.

Processor: %

Privileged Time This is the percentage that a processor is busy in privileged mode. Privileged mode means operating system services. Process: %

Processor Time This is the percentage of CPU time that a process is busy. In Figure 5.2 the System Monitor is being used to monitor the following counters:

Figure 5.2: A busy processor

• Processor: % Processor Time

The counter Processor: % Processor Time is highlighted (in white). We can see that the processor appears to be 100 percent utilized. This in itself is not necessarily going to cause a bottleneck; however, we can see that the Processor Queue Length is quite high. It averages around six (note the scale factor of ten so it can be seen on the display) and peaks at around ten. To check the average and maximum, this counter was selected instead of the counter Processor: % Processor Time counter. This means that on average, six threads are waiting for the CPU; this is a clear indication that we have a processor bottleneck.

The lows and highs in the Processor Queue Length counter display are caused by the randomness that ready tasks are being generated. This is not uncommon. Queues usually appear when the processor is very busy, but they can appear when CPU utilization not high. This can happen if requests for the processor's time arrive randomly and if threads demand irregular amounts of time from the processor.

So what is causing the bottleneck? Is it one process or many processes? We can monitor the processor use of each process to get a feel for the answer. In Figure 5.3 the System Monitor is being used to monitor the Process: % Processor Time counter.

Figure 5.3: Monitoring processor time for individual processes

We have selected the Histogram display to make it easier to look at the processes using the processor. It is pretty clear that one process is monopolizing the processor. This is the

highlighted process and we can see that it is SQL Server. The only problem is that we do not know which SQL Server! We may have many instances of SQL Server running, and in each case the instance will be named sqlservr in the System Monitor. There are various approaches to finding out which instance is which. One approach I find useful is to create a System Monitor report showing the Process: % Processor Time counter and the Process: ID Process counter. This is shown in Figure 5.4.

Figure 5.4: Checking process ID for the SQL Server instance

Note that we can confirm that the instance sqlservr with process ID 1000 is using up the CPU. Another way (often easier) is to check the Processes tab in the Task Manager. This is shown in Figure 5.5.

Figure 5.5: The Task Manager processes tab

If we click on the CPU column heading, the display will be sorted with the process using most of the CPU displayed first. We can easily read off the process ID from the PID column. Whichever method we use to find the process ID, once we have obtained it we now need to translate it into a SQL Server instance. An easy way to do this is to connect to the SQL Server instance you suspect in the Query Analyzer and execute the following statement.

This will return the process ID. If it is not correct, connect to the next instance and check that. Most servers will not be running more than a few instances.

Once we have established the SQL Server instance that is monopolizing the processor, we need to further investigate why this is so, and, if it is not a database or application design problem, perhaps consider moving the instance of SQL Server 2000 onto its own server. If no process stands out in this display, this might be an indication that the processor is just too slow.

Can we drill down further into SQL Server 2000? We can look at the individual threads. In

Figure 5.6 the System Monitor is being used to monitor the Thread: % Processor Time counter for all the SQLSERVR process's threads. We can clearly see that one thread with thread instance number 26 is using most of the CPU.

Figure 5.6: A single SQL Server thread monopolizing the CPU

Compare this with Figure 5.7. Here we see that many SQL Server threads are running the CPU. So looking at the Thread: % Processor Time counter can be useful to help distinguish between the case of one busy connection versus many busy connections, but I find that at this point I really want to start using the SQL Profiler.

Chapter 7 discusses the SQL Profiler in detail. We wish to check for connections that are using a large proportion of the CPU and which SQL statements on those connections are using the most CPU.

For our requirement we can create a trace with the SQLServerProfiler-Standard template. The default events are sufficient, since they include the events that we need. We can choose to filter out low CPU use events, but we must be careful not to filter out information that might prove useful in our investigation. In Figure 5.8, a graphic SQL Profiler display is shown.

Figure 5.8: The SQL Profiler showing a single thread monopolizing the CPU

The data columns have been grouped by the data column CPU, and we can immediately see that although many queries are using between 10 and 20 milliseconds of CPU, one query is using nearly 62 seconds of CPU. We can see that the duration of this query is about 62 seconds also. In fact, virtually this entire query is CPU. The SQL Profiler identifies the query syntax, application name, and so on so we can easily identify the problem query in our application. We can then, of course, investigate the query plan using the Query Analyzer and hopefully improve it.

We could have saved the trace into a table and then searched the table for events taking, for example, greater than one second of CPU. In practice, I find myself taking this approach most of the time.

In Figure 5.9, many queries are using between 50 and 60 seconds of CPU. No one query stands out. If the queries have a duration, reads, writes, and a CPU use that is expected, then it may be that the queries are efficient. If the processor is constantly busy and there is a

significant queue, it may be the case that the CPU is just not powerful enough.