• No results found

Index Combine

In document SQL Performance Explained (Page 182-193)

It's actually the most common question about indexing at all: is it better to have one index with all columns or one individual index for every column? The answer is quite simple in almost all cases: one index with multiple columns is better—that is, a concatenated or compound index. the section called “Concatenated Keys”

explains that in detail.

However, there is one case where a single index can't do a perfect job—no matter how the index definition is changed. That's the case when there are two or more

independent range conditions in the where clause.

Consider the following SQL:

SELECT first_name, last_name, date_of_birth

FROM employees

WHERE UPPER(last_name) < ? AND date_of_birth < ?

It's impossible to define a B-Tree index so that there isn't a filter predicate. This limitation is quite simple to

understand—bearing in mind that an index is essentially an ordered list.

For example, when indexing

UPPER(LAST_NAME) , DATE_OF_BIRTH (in that order), the list will start with A and end with Z. However, a record for the employee Brown will be rather close to the start of the list—regardless of the date of birth. The date of birth influences the list position only marginally—just if the same name appears twice. That means that scanning from the beginning of the list will need to filter for the date of birth. The date of birth doesn't narrow the scanned range.

The same is true when defining the index the other way around. The scan will start with the earlier dates but will need a filter to remove the names that don't match.

After all, the problem is as simple as that: a chain with one axis supports one range condition as an access predicate.

Supporting two range conditions as an access predicate would mean to scan a corner of a chessboard. However, a B-Tree index is a chain—there is no second axis.

There are two workarounds for this problem. The first is to define a concatenated index just as described above—accepting the filter predicate. This is the only case where the most selective first rule applies. Whatever condition is more selective should go first so that the scanned range is as small as possible.

The second approach is to borrow a feature from the data-warehousing world.

Data-warehousing is the mother of all ad-hoc queries. A few clicks allow querying any combination of search filters. It's impossible to tell which columns will be queried together. That makes indexing as described so far rather impossible.

There is, of course, a solution to that problem. That is, the bitmap index—a special purpose index type. However, bitmap indexes have limitations that make them truly special purpose indexes. One problem is that bitmap operations are very CPU bound. But the more limiting problem is their terrible insert/update/delete

scalability. In fact, concurrent write operations on bitmap indexes are almost impossible so that bitmap indexes are

hardly usable in an OLTP environment.

However, the key benefit of bitmap indexes is that merging of multiple indexes is rather easy. That is, multiple indexes can be used and combined for a single table access.

Important

The database can—in principle—use one B-Tree index only per table access.

However, a single SQL statement can access many tables—by means of joins and subqueries—hence, a single SQL statement can use many B-Tree indexes.

Bitmap indexes can be combined on the fly so that a single table access can use multiple indexes.

There are many databases, including the Oracle database, that use a hybrid solution between regular B-Tree indexes and

bitmap indexes. They can, despite any better option, transform the result of multiple B-Tree scans to a bitmaps and merge them on the fly. That means that the concurrency limitation of bitmap indexes is bypassed by making the overall operation even more CPU bound. After all, a B-Tree based bitmap merge is an option of last resort only.

Important

The limitations of bitmap indexes make them hardly usable in a OLTP environment.

Some databases use different techniques to combine multiple B-Tree indexes. The following table gives a short overview for different implementations:

DB2

DB2 supports multiple index access on LUW 9r7 (using a dynamic bitmap) and on zOS v10.

MySQL

MySQL has an index merge optimization starting with release 5.0.

Oracle

The Oracle database uses BITMAP CONVERSIONs to combine multiple indexes on the fly (introduced with 9i).

PostgreSQL

PostgreSQL uses bitmaps to combine multiple indexes since version 8.1.

SQL Server

SQL Server can use multiple indexes ("Index Intersect") starting with V7.0 using a hash algorithm.

It's often better to use a single B-Tree index instead of bitmap indexes or multiple B-Tree indexes—even in case of multiple independent range conditions.

The following execution plan shows the bitmap conversions caused by two individual B-Tree indexes for two independent range conditions.

| BITMAP CONVERSION FROM ROWIDS|

| | | 5 | SORT ORDER BY |

| | |* 6 | INDEX --- Predicate Information

(identified by operation id): --- 6

-access("DATE_OF_BIRTH"<:DOB)

filter("DATE_OF_BIRTH"<:DOB) 9 -access(UPPER("LAST_NAME")<:NAME) filter(UPPER("LAST_NAME")<:NAME)

Please note that the Rows column was removed in favor of the CPU cost

information.

The notable operations are the sorts that follow the index range scans and the bitmap operations. There are two

independent range scans; the result is sorted and converted to an in-memory bitmap index. Those in-memory indexes are then combined with the BITMAP AND operation and the result is converted back to get the ROWID s. The last step is to fetch the records from the table.

Another, very important, detail in the execution plan is the CPU cost—12% in that case.

In comparison to the plan above, a single index on UPPER(last_name) and DATE_OF_BIRTH will perform better and not cause any notable CPU cost:

---

---| Id ---| Operation ---| Name | Cost (%CPU)|

---

---| 0 ---| SELECT STATEMENT ---| | 3 (0)| | 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 3 (0)| |* 2 | INDEX RANGE SCAN | UNAME_DOB | 2 (0)| --- --- Predicate

Information (identified by operation id): 2

-access(UPPER("LAST_NAME")<:NAME AND "DATE_OF_BIRTH"<:DOB) filter("DATE_OF_BIRTH"<:DOB)

Please note that both plans were created with bind parameters. Hence, the

cardinality estimates are ballpark figures and the overall cost might be very different.

In document SQL Performance Explained (Page 182-193)

Related documents