The fastest way to retrieve the information related to missing indexes, as identified by the query optimizer, is to query the missing index DMVs. There are four DMVs associated with the missing index feature in SQL Server:
• sys.dm_db_missing_index_details – stores detailed information regarding indexes the optimizer would have used had they been available, such as columns that could have been used to resolve equality or inequality predicates, and suggested INCLUDE columns for covering a query.
• sys.dm_db_missing_index_columns – accepts an index_handle and returns a list of columns that would comprise the suggested index.
• sys.dm_db_missing_index_group_stats – returns summary information regarding the potential benefit of a "missing" index, based, for example, on the number of seeks and scans that would have benefited.
• sys.dm_db_missing_index_groups – a join view between _group_stats and _index_details.
These views, when joined together, can identify missing indexes and provide the cost reduction, estimated by the optimizer, if the index was created. The sys.dm_db_ missing_index_group_stats and sys.dm_db_missing_index_groups views, despite their names, do not actually contain groups of indexes; the groups (as of SQL Server 2008 R2) relate to only one missing index tracked in the system.
The information stored in these DMVs is certainly useful, but there are a number of limitations that you need to consider when basing your index choices on this data, including the following:
• the information contained in these DMVs is in volatile storage; meaning that it only exists in memory and doesn't exist beyond SQL service restarts, or changes to a database state like restoring the database, detaching the database, taking the database offline, or the database being closed by the AutoClose option.
• statistics are only stored for a maximum of 500 missing index groups.
• index key columns, specified by the equality and inequality column outputs of the sys.dm_db_missing_index_details and sys.dm_db_missing_index_ columns DMVs, are not ordered according to cardinality.
As previously discussed in this chapter, it is often (though not always) best to order the index key columns such that the most selective column is the first column in the index. This reduces the number of database pages that must be read by the database engine while traversing the index, in order to satisfy the query.
However, unlike the DTA recommendations, the missing index recommendations stored in the DMVs do not consider key column cardinality; in other words, they are not based on the data contained in the key columns that it is recommending be created. As such, it is necessary to perform additional manual analysis of the key column cardinality in order to arrive at the optimal index structure.
These limitations mean that this tool is best used to identify gaping holes in an indexing strategy, rather than as a fine-tuning tool. The recommended approach is to identify potentially useful indexes, listing first those that offer the biggest potential performance benefit according to the metrics stored in the sys.dm_missing_index_group_stats DMV.
For example, in the sys.dm_missing_index_group_stats DMV, the user_seeks and user_scans columns provide the number of seek and scan operations that would have benefited from a particular index recommendation. The avg_total_user_cost column provides the average reduction in query cost as a result of creating the index, and the avg_user_impact column provides the percent reduction in query cost, had the index existed.
Together, these columns can be used to generate an overall estimated performance improvement associated with a specific missing index in the database. There are several ways to calculate this estimated performance improvement, but the generally accepted formula, shown in Listing 5.9, was provided by kind permission of Bart Duncan, one of the members of the SQL Server product team at Microsoft, from his MSDN blog post, Are you using SQL's Missing Index DMVs? (http://blogs.msdn.com/b/bartd/ archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx).
SELECT migs.avg_total_user_cost * ( migs.avg_user_impact / 100.0 )
* ( migs.user_seeks + migs.user_scans ) AS improvement_measure ,
'CREATE INDEX [missing_index_'
+ CONVERT (VARCHAR, mig.index_group_handle) + '_' + CONVERT (VARCHAR, mid.index_handle) + '_'
+ LEFT(PARSENAME(mid.statement, 1), 32) + ']' + ' ON '
+ ' (' + ISNULL(mid.equality_columns, '')
+ CASE WHEN mid.equality_columns IS NOT NULL
AND mid.inequality_columns IS NOT NULL THEN ',' ELSE ''
END + ISNULL(mid.inequality_columns, '') + ')'
+ ISNULL(' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement , migs.* , mid.database_id , mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE migs.avg_total_user_cost * ( migs.avg_user_impact / 100.0 )
* ( migs.user_seeks + migs.user_scans ) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * ( migs.user_seeks + migs.user_scans ) DESC
Listing 5.9: Identifying missing indexes based on query cost benefit.
The calculated improvement_measure column provides the estimated improvement value of each index recommendation, based on the average total reduction in query cost that would result, the number of seek and scan operations that could be satisfied by the index, and the percentage benefit the index would provide to the queries being executed. This column makes it easier to focus on those indexers that offer the biggest cost benefit. When analyzing the output of this query, I focus on the indexes with an impact value higher than 50,000. I then analyze the recommendations carefully, since it's likely that there will be a degree of overlap among the recommended indexes, with several indexes differing only subtly in terms of their index key column definitions and column orders, or included column definitions and orders. It's also likely that I can derive similar performance benefit by modifying an existing index rather than creating a new one. As discussed previously in the Index Selection and Design section, every index should be tested, to ensure that it really is useful, before deploying it to production. The goal is to
create as few indexes as possible that will satisfy as many as possible of the most signif- icant queries that comprise the SQL Server workload.