• No results found

Retrieving a range of rows with multiple nonclustered indexes on the table

Chapter 3: Indexing 3.1 Introduction

3.12 Using indexes to retrieve data

3.12.7 Retrieving a range of rows with multiple nonclustered indexes on the table

At first glance, this query does not appear to be covered by the nonclustered index. It is a single column index on account_no. However, we know that the leaf-level pointer is the clustered index key, so the leaf-level index entry contains both the account_no column and the customer_no column. Therefore, the query can indeed be satisfied by the nonclustered index without the data rows being fetched, and the query is, in fact, covered.

The fact that the clustered index key is part of the index entry in a nonclustered index can result in the query optimizer choosing a very efficient strategy.

3.12.7 Retrieving a range of rows with multiple nonclustered indexes on the

table

Suppose we wished to execute the following query: SELECT * FROM accounts

WHERE

balance BETWEEN 100 AND 200 AND

customer_no BETWEEN 1000 AND 1200

If there are no appropriate indexes on the table, SQL Server would perform a table scan. If there is a nonclustered index present on the balance column, then the query optimizer might choose to use that index if the number of rows returned was not too large. If there is a nonclustered index present on the customer_no column, then the query optimizer might choose to use that index if the number of rows returned is not too large.

If one of the indexes is present and is chosen, then SQL Server would process the range retrieval by processing the appropriate range in index key values in the leaf level of the nonclustered index and issuing a data page request for each pointer (we'll assume there is no clustered index on the table, so we are dealing with Row IDs). When each data row is fetched, the remaining criteria would be applied to the data row. We say that it is filtered.

One problem with this technique is that it can be wasteful. Suppose we have a nonclustered index present on the balance column alone and that the query optimizer chooses that index to perform the previous query. The index may have 100 leaf-level index key values satisfying the balance range, and 100 data page requests (logical reads) will be performed. SQL Server will then apply the customer number range filter and could eliminate most of the data rows from the result set. We have used the nonclustered index to fetch a set of rows, most of which are ultimately discarded. Fetching data pages is a relatively expensive operation.

Now suppose we create a second nonclustered index on the customer_no column. The query optimizer can often make use of both of these indexes in the plan. The result of the query is the set intersection of the set of accounts that have a balance between 100 and 200 and the set of accounts that have a customer number between 1,000 and 1,200. This is shown in Figure 3.32.

Figure 3.32: Index intersection

From an indexing perspective we can think of this as the set intersection of the valid set of Row IDs from the nonclustered index on balance and the valid set of Row IDs from the nonclustered index on customer_no. As Figure 3.32 shows, the sets of Row IDs may overlap a little, overlap greatly, or not overlap at all. In the latter case, this means that no rows satisfy both criteria. The query optimizer can perform this set intersection in memory (typically) and so find the set of Row IDs that point to data rows satisfying both query conditions before the data pages have been accessed. This will often avoid having many data page requests

performed needlessly. How does SQL Server perform the set intersection operation on the Row IDs? It uses a hashing algorithm, which we will discuss in Chapter 4. In Chapter 4 we will also discuss a query optimizer plan, which utilizes index intersection.

So, typically how much benefit can this use of multiple indexes provide? This depends on a number of considerations, but the main one concerns the size of the reduction in the data page requests. Remember: If there are too many, the query optimizer will probably decide a table scan is a more efficient means of querying the data.

If we look at Figure 3.32, we can see that the intersection of the two sets of Row IDs in the second case results in a set that contains most of the Row IDs. In this case the number of data page requests will not be reduced greatly by the use of both indexes.

The intersection of the two sets of Row IDs in the first case results in a set that contains few of the Row IDs. In this case the number of data page requests will be reduced by the use of both indexes and this is a win.

In the third case the two sets of Row IDs do not intersect. This results in a set that contains no Row IDs. In this case the number of data page requests will be reduced to zero by the use of both indexes in the query plan, since clearly no rows satisfy the query. This is a big win.

We have just looked at a variety of scenarios using clustered and nonclustered indexes. In

Chapter 4 we will look more closely at the query optimizer itself and how these fundamental scenarios are used.