Skill 1.2: Design and implement indexes
2. Include Actual Execution Plan This represents the plan that is used, including
---4 26637
NULL 70510
Take a look at the plan of the following query that the system might perform, searching for CustomerTransactions rows where the PaymentMethodId = 4:
Click here to view code image SELECT *
FROM Sales.CustomerTransactions WHERE PaymentMethodID = 4;
This returns the expected 26637 rows, and has the actual plan shown in Figure 1-2. The Compute Scalar operator is there because we returned all columns, and there is a computed column in the table named IsFinalized.
FIGURE 1-2 The plan from the query for PaymentMethodId = 4
There are three ways using the GUI to get the query plan. From the Query Menu, select:
1. Display Estimated Plan This shows you the plan that is likely to be used to perform the query. The plan can change when the query is performed, due to many factors such as query load, memory available, etc. All row counts and costs are guesses based on the statistics of the index, and it does not require the query to be performed.
Whether or not parallelism can be used is determined during execution based on system settings (such as the sp_configure setting ‘cost threshold for parallelism’ and
‘max degree of parallelism’) and the load on the system at execution time.
2. Include Actual Execution Plan This represents the plan that is used, including
actual row counts, use of parallelism, etc. You get the actual plan after the query has completed in its entirety.
3. Include Live Query Statistics When you are working with a complex, long-running query, you can see data moving through the actual query plan operators live. It can help you diagnose issues with a large query by letting you see the problem spots in real time.
Note Textual plan Options
Additionally, there are several ways to get a textual plan when you need it.
Two examples are SET SHOWPLAN_TEXT to get the estimated plan, and SET STATISTICS PROFILE to get the actual query plan.
For now, we ignore the Missing Index listed in the plan (and for future cases edit them out until we get to the section on “Included Columns”), but the point here is that the index was not used. However, it is important to note that while an index is not generally useful, there are scenarios where the index actually turns out to be useful:
If the only column that was returned from the query was the PaymentMethodId since all of the data is there in the index, then it is useful.
An index is also useful when you are searching for a value that does not exist in the table. The statistics of the index do not tell the optimizer that no rows are returned from the query, only that very few are returned, so using the index should be fast enough. We review managing statistics in more detail in Chapter 4, but they are
basically structures that help the optimizer to guess how many rows are returned by a given query based on a sampling of the data at a given point in time.
These scenarios are why foreign key indexes are often applied to all foreign key columns, even if the indexes applied are not generally useful.
Need More Review? Deeper dive into indexing
Indexing is a complex topic that we only review in some of the primary scenarios. For more information, a great resource is “Expert Performance Indexes in SQL Server” from Apress by Grant Fritchey and Jason Strate (http://www.apress.com/9781484211199).
Indexing once data is in your tables
Although the indexing you might do to your tables before adding data is essentially part of the structure, and the rest of the indexes are strictly intended to improve performance. In this section, we cover several scenarios to consider when adding indexes to tables. Some of these scenarios crop up during development, even when you have very little data in your tables. Some do not show up until the data grows during performance testing or production
loads. All of Chapter 4 delves more into the ongoing tuning of your system, but for now we look at some common query types to tune, no matter how you discover the need.
Common search paths Joins
Sorting data
Unless you have very simplistic needs, it is hard to know exactly how queries behave in a real scenario, so in most cases it is better to test out your expectations rather than guess about performance.
Exam Tip
While tuning a real database should generally be done with real data, seeing real needs, this is not the case for the exam. The situations more likely follow a very deliberate pattern similar to the ones we discuss in the next sections.
The upcoming examples are not exhaustive as there are many different scenarios that can use an index to improve performance.
Common search paths discovered during development
The process of adding indexes starts during the development phase of the project. Even with smaller amounts of data in a table, there are given access paths that do not correspond exactly to the indexes the uniqueness constraints you have started with added. For example, in the WideWorldImporters database, in the Sales.Orders table, the
CustomerPurchaseOrderNumber is not a key value (there do exist duplicated values, because the purchase order number can be duplicated for different, or even the same customer in this design). During design, it was not a foreign key, nor was it a key in a uniqueness constraint. When the application was created, it included the following query:
Click here to view code image
SELECT CustomerID, OrderID, OrderDate, ExpectedDeliveryDate FROM Sales.Orders
WHERE CustomerPurchaseOrderNumber = '16374';
In the base set of rows here in the WideWorldImporters database (there are methods included for adding more data to give you more data to work with), the query runs very fast, returning just 6 rows. In order to see just how well it performs, you can use two
commands in Transact-SQL to see some very important statistics, that along with the query plan, give you the important information on how the query is operating.
Click here to view code image
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SELECT CustomerID, OrderId, OrderDate, ExpectedDeliveryDate FROM Sales.Orders
WHERE CustomerPurchaseOrderNumber = '16374';
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
The plan returns what is shown in Figure 1-3.
FIGURE 1-3 Query plan that does not use an index
Along with the query results, there are a few additional messages. We are reducing to the pertinent ones here in our output, but you can see the compile and parse times, and the
overall execution time in addition to the following:
Click here to view code image
Table 'Orders'. Scan count 1, logical reads 692, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 20 ms.:
The query only takes around 20 milliseconds (ms) on a VM on a Surface Pro 4 with 8 GB of RAM for the VM (and 8GB for the host), scanning the table, touching all 692 pages of the table which has 73595 rows. All of the pages were in RAM already, so there are no physical reads (very common when testing individual queries and there is no memory pressure. You can clear the cache using DBCC DROPCLEANBUFFERS, but the most important number for indexing is the logical reads. Consistent readings of a large number of physical reads are more indicative of not enough RAM to cache data). However, if this is a table to which data is being actively written, scanning those 692 pages means that every single row is touched, and therefore locked in on-disk tables at some point in time, causing concurrency issues that are covered in more detail in Chapter 3, “Manage
Database Concurrency.”
Next, add an index to the Sales.Orders table on the CustomerPurchaseOrderNumber column, to attempt to speed the query:
Click here to view code image
CREATE INDEX CustomerPurchaseOrderNumber ON Sales.Orders(CustomerPurchaseOrderNumber);
Note Our sample database
The examples use tables from the WideWorldImporters database to review different types of indexing utilization. If you desire to try the queries yourself to make the same changes, make sure that you are working on your own copy of this database before making changes that affect other users.
Now, perform the same query on CustomerPurchaseOrderNumber = ‘16374’, and the following query plan is used, as shown in Figure 1-4.
FIGURE 1-4 Query plan after adding an index on the CustomerPurchaseOrderNumber
The query plan looks more complex. There is a join, even though the query uses a single table. SQL Server now uses the index-seek operation to find the six matching rows, but all it has are the CustomerID and the OrderID from the index keys. So it needs to use a JOIN operator to join to the clustered index to get the rest of the data. While the plan is more complex, the results are a lot better statistically, as you can see:
Click here to view code image
Table 'Orders'. Scan count 1, logical reads 20, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
It took only 20 logical reads, and less than 1 millisecond to perform. The reduction in 672 reads, means 672 less physical resources touched, and locked by default. As a result,
it is very useful to check out all of the queries that are used by your applications, either (ideally) from stored procedures, or as ad-hoc queries performed from your external interfaces.
Note that you can index a computed column as long as it is deterministic. You can tell if a column can be indexed, even if it is computed by using the COLUMNPROPERTYEX() function:
Click here to view code image
SELECT CONCAT(OBJECT_SCHEMA_NAME(object_id), '.', OBJECT_NAME(object_id)) AS TableName,
name AS ColumnName, COLUMNPROPERTYEX(object_id, name, 'IsIndexable') AS Indexable
FROM sys.columns WHERE is_computed = 1;
Search conditions are typically the most obvious to index because the affect people directly. When a user searches on an unindexed column in a large table (relative to
hardware capabilities), you may see locking, blocking, or using some settings (such as the database setting READ COMMITTED SNAPSHOT), high tempdb utilization. The needs are more random that in the following situation we will cover.
Joins
While simple index needs often manifest themselves as table scans, when joining data in two tables, the need for an index instead may show up as a different join operator than a nested-loops join. Nested loops work best when one set is very small, or the cost of seeking for a row in that set is inexpensive. It works by going row by row in one of the inputs, and seeking for a matching value in the other. When the cost of seeking in both sets is too costly, a Hash Match operator is used. This operator makes a pseudo hash index by segmenting values into buckets of values that can be easier to scan using a hash function. It does not need any order to the operation, so it can work to join two really large sets
together.
As an example, drop the foreign key index from the Sales.Orders table named FK_Sales_Orders_ContactPersonID using the following command:
Click here to view code image
DROP INDEX FK_Sales_Orders_ContactPersonID ON Sales.Orders;
Now, search for the Sales.Orders rows for any person with a preferred name of
‘Aakriti:’
Click here to view code image
SELECT OrderId, OrderDate, ExpectedDeliveryDate, People.FullName
FROM Sales.Orders
JOIN Application.People
ON People.PersonID = Orders.ContactPersonID WHERE People.PreferredName = 'Aakriti';
The PreferredName column is not indexed. Figure 1-5 shows the actual query plan, along with the typical query stats output.
FIGURE 1-5 Query plan and statistic output for unindexed foreign key index in join Figure 1-5 has the following output:
Click here to view code image
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 692, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'People'. Scan count 1, logical reads 80, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 15 ms, elapsed time = 53 ms.
Hovering your mouse over the Clustered Index Scan operator for the
PK_Application_People index (the clustered index on the table), you see (as depicted in Figure 1-6) the costs, but also that the predicate of PreferredName = ‘Aakriti’ is handled as part of this scan.
FIGURE 1-6 Operator costs for the Clustered Index Scan operator for the
PK_Application_People index
As you can see, the query optimizer scans the two indexes, and the Hash Match operator builds a hash index structure, and then matches the rows together. Adding back the index on the foreign key columns:
Click here to view code image
CREATE INDEX FK_Sales_Orders_ContactPersonID ON Sales.Orders --Note that USERDATA is a filegroup where the index was
originally
(ContactPersonID ASC ) ON USERDATA;
Executing the query again shows a better result, though not tremendously, as shown in Figure 1-7.
FIGURE 1-7 Query plan after adding back the foreign key index Figure 1-7 has the following output:
Click here to view code image
Table 'Orders'. Scan count 2, logical reads 695, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'People'. Scan count 1, logical reads 80, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 0 ms, elapsed time = 17 ms.
The big cost here is the Key Lookup operator to fetch the rest of the Sales.Orders columns in our query. This cost is what the missing index hint has been suggesting for nearly every query, and is the topic of the next section of this chapter. The query can be
improved upon one more time by indexing the PreferredName column, so the query
processor doesn’t have to test every single row in the Application.People table to see if it matches PreferredName = ‘Aakriti’.
Click here to view code image
CREATE INDEX PreferredName ON Application.People (PreferredName) ON USERDATA;
Finally, perform the query again to see the plan and statistics shown in Figure 1-8.
FIGURE 1-8 Query plan after adding index on Application.People.PreferredName Figure 1-8 has the following output:
Click here to view code image
Table 'Orders'. Scan count 2, logical reads 695, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'People'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob
logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 0 ms, elapsed time = 19 ms.
This is not a tremendous improvement, and is just 74 less accessed pages, and execution times are typically the same. Generally speaking though, the fewer pages read in the
process of executing the query, the better, particularly as the number of queries increase in an active system.
Note the Key Lookup operator that is 97 percent of the cost of this query. In a following section on included columns, we review how to erase that cost, and lower the logical reads to very few.
Sorts
The final query situation we look at is sorts. When you need to sort data, either for an ORDER BY clause or for some operation in the query where sorting data would make query operation quicker (the last join operator that we haven’t mentioned yet, called the Merge Join operator, requires sorted inputs to match rows from one large input set to another large set, in a quicker manner than using the Hash Merge algorithm previously mentioned).
Note Indexing and sorting
The examples in this section use only columns that show up in the operation in order to show how indexing and sorting work together, and it eliminates some of the costs of the bookmark lookup. The next section examines this
phenomenon in more detail.
A part of the CREATE INDEX statement we have not yet looked at is sorting of the index keys, particularly useful with composite index keys. By default, the data in the index is sorted in ascending order, so the indexes created so far have been ascending by default.
The query processor can scan the index in either direction, so for a simple index (one with a single key column), this is generally not a problem. For composite indexes (those with greater than a single key column) it can be an issue.
As an example, consider the following query of the entire Sales.Orders table, sorted in SalespersonPersonID and OrderDate order. Both are explicitly spelled out as ASC,
meaning ascending, which is the default. Note too that we only return the columns that are being sorted on to make the example simpler.
Click here to view code image
SELECT SalespersonPersonId, OrderDate FROM Sales.Orders
ORDER BY SalespersonPersonId ASC, OrderDate ASC;
Figure 1-9 shows the plan, which includes a scan through the data, a sort, and it even shows that the query used parallelism, since we’re running on a VM with 2 CPUs
allocated. In other words, this was not a trivial query.
FIGURE 1-9 Sorting results prior to adding index Figure 1-9 has the following output:
Click here to view code image
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 3, logical reads 758, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 94 ms, elapsed time = 367 ms.
Now, add an index to support this query, as we know that this query is performed very often in our enterprise. Add a composite index, and explicitly show that we are sorting the keys in ascending order, for the query:
Click here to view code image
CREATE INDEX SalespersonPersonID_OrderDate ON Sales.Orders (SalespersonPersonID ASC, OrderDate ASC);
Perform the query just as we did in the first attempt. Figure 1-10, shows that the plan has changed, as now it can get the data in a pre-sorted manner, with the primary cost of the query now embedded in the bookmark lookup.
FIGURE 1-10 Query plan after adding the index to the table Figure 1-10 has the following output:
Click here to view code image
Table 'Orders'. Scan count 1, logical reads 157, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CPU time = 47 ms, elapsed time = 217 ms.
If the order you request is completely opposite of how the index is sorted, you will find
that nothing in the plan will change:
that nothing in the plan will change: