Reactive Index Design
LRT-LEVEL EXCEPTION MONITORING Averages per Program Are Not Sufficient
3. Sequential read (sequential prefetch)
We can deduce from the spike report which of these three methods was largely used by the promising culprit to read the table pages from disk.
Synchronous reads(type 1) do not overlap CPU time; the application program stops and waits for the arrival of the requested page from the disk server. The spike report shows the number of pages the transaction read synchronously together with the average wait time (the suspension time) for these synchronous reads.
Asynchronous reads (types 2 and 3) are prefetch reads; the I/O time does overlap the CPU time of the program; the DBMS asks for the next few pages from the disk server while the program is still processing pages in the database buffer pool. The program may or may not run out of pages to process. If it does, the wait time is recorded in the Wait for Prefetch counter. If the program never has to wait for prefetched pages, it is CPU bound. The Wait for Prefetch component in the spike report is then zero, and the elapsed time for the sequential scan is reported partly as SQL CPU Time and partly as Other Wait (for any CPU queuing).
Skip-sequential read (type 2) tends to be I/O bound with current hardware. Wait for Prefetch may be the largest component of the local response time if most of the skips are large, meaning that the pages containing the qualifying rows are far from each other.
Sequential read (type 3) is roughly balanced with current hardware; some- times the CPU time per page is a bit longer than the I/O time per page, sometimes a bit shorter. The QUBE assumes that the larger of the I/O and the CPU times is up to 0.01 ms per row. If sequential read is I/O bound, the elapsed time of a sequential scan is equal to the disk I/O time, which in turn is the sum of the CPU Time and the Wait for Prefetch components of the spike report.
Many programs access table pages in more than one of these three ways. Interpreting the spike report is then not so straightforward, but the dominant method is normally not too hard to deduce. The total number of table pages accessed may help. This contains, in addition to the synchronous reads, pages read by prefetch (both types 2 and 3) as well as buffer pool hits (the table pages found in the database buffer pool).
Many DBMSs are nowadays able to perform sequential processing in parallel, for instance, by breaking a cursor into several internal cursors, each using one processor and creating its own prefetch stream. If this option (CPU and I/O parallelism) is enabled, it is normally the optimizer that determines the degree of parallelism. If there are enough processors, disk drives, and buffer pool space, the degree may be perhaps 10. The elapsed time may then, theoretically, be almost as low as 10% of the nonparallel elapsed time. This option is invaluable when scanning massive tables and their indexes, data warehouse fact tables, for instance, but most installations with traditional operational transactions and batch jobs disable parallelism because a program that exploits several processors and drives at the same time may cause unacceptable queuing times for other concurrent users.
Tuning Potential
Before spending a great deal of time analyzing an exception transaction, it is worthwhile estimating by how much the local response time may be reduced as
a result of planned index improvements. The tuning potential is theupper limit for the achievable reduction.
Random Reads
The tuning potential of read only transactions is the product of the number of table pages read synchronously and the average duration of a synchronous read. The local response time will be reduced by this amount ifall table page reads are eliminated by means of fat indexes. Thus, in Figure 7.5, the tuning potential is 50,331×9 ms=453 s if the program is read only; programs with INSERT, UPDATE, or DELETE calls have a smaller tuning potential because the table pagesmust be updated no matter how fat the indexes.
Skip-Sequential Reads
Some optimizers would choose skip-sequential read in the previous example. The DBMS would then first collect the pointers from all qualifying index rows and then sort the pointers by table page number. Now the DBMS has a sorted list of all the table pages that contain at least one qualifying row; only now will the table be accessed. Naturally, the I/O time per page will be shorter, especially if several qualifying rows happen to be on the same track. In addition, the I/O time will overlap the CPU time because the DBMS is able to read ahead. The saving compared to synchronous reads is very variable; it depends on the average distance between the pages to be read and on the striping implementation; the tuning potential is simply the current value for Wait for Prefetch (assuming the program is read only and all prefetch is skip-sequential, e.g., list prefetch in DB2 for z/OS). Thus, the tuning potential in Figure 7.6 is 102 s.
LRT
SQL Non-SQL
CPU time Sync read Lock waits Other waits Synchronous read 50,342×× 9 ms = 453 s Wait for prefetch 455 s 455 s 0 s 2 s 453 s 0 s 0 s 0 s Indexes: Tables: 11 50,331 SQL calls 1003
LRT
SQL Non-SQL
CPU time Sync read
Lock waits
Other waits Pages read with list prefetch 31,836 102 s / 31,836 = 3 ms Wait for prefetch 104 s 104 s 0 s 2 s 0 s 102 s 0 s 0 s SQL calls 1003
Figure 7.6 Promising culprit—skip-sequential reads.
LRT
SQL Non-SQL
CPU time Sync read
Lock waits
Other waits Pages read with sequential prefetch 112,721 16 s / 112,721 = 0.14 ms Wait for prefetch 16 s 16 s 0 s 14 s 0 s 0 s 0 s 2 s SQL calls 1003
Figure 7.7 Promising culprit—sequential reads.
Sequential Reads
The spike report shown in Figure 7.7 shows an application that could be scanning an entire table (112,721 pages read using sequential prefetch). Because there is no I/O wait time, the program is CPU bound. The 2-s Other Waits figure might possibly be due to CPU queuing.
At first sight, the spike in Figure 7.7 may look like an unpromising culprit. It is easy to think of cases in which 1000 SQL calls take 16 s, even with good access paths; for instance, 1000 singleton SELECTs that cause, on average, two random reads. However, the biggest component is now CPU time, not I/O time.
This suggests that processing is mostly sequential. Sequential scans are often CPU bound. Random processing is CPU bound only if all the pages accessed are in memory. In addition, the report shows that about 100,000 pageswere read with sequential prefetch. These pages can be either leaf pages or table pages, or both. It’s possible, for instance, that a very thick index slice and a very thick table slice are read, the index and table rows being in the same order. Nevertheless, most rows are rejected. If we assume that a sequential touch takes roughly 10µs (this will be fully discussed in Chapter 15), the total number of touches must be roughly
14,000,000µs
10µs/touch =1,400,000 touches
assuming that the number of random touches is much lower than the number of sequential touches.
From this, it should be quite easy to find the SELECT(s) that may gener- ate massive sequential scans, more than 100,000 pages and probably more than 1,000,000 rows.
The tuning potential (assuming the 2-sother wait is caused by CPU queuing) is then almost 16 s. It should be possible to find an access path that reads much thinner slices.
If the slow access path contains a full index scan or a full table scan, would that not have been noticed already in the quick EXPLAIN review? Not neces- sarily. The access path may be using an index with one matching column for a range predicate. With the worst input the index slice to be scanned may, in fact, be the whole index. Figure 7.8 shows an example of such a case.
The optimizer may choose the index (IDATE, CNO) if it assumes a small filter factor for the predicate IDATE>:IDATE. As IDATE is theonly matching column (CNO cannot participate in the matching process because the first column
IDATE, CNO
INVOICE