Star Join Schema - DB2 Universal Database for iSeries performance basics

Chapter 2. DB2 Universal Database for iSeries performance basics

2.3 Star Join Schema

A star schema is a database model characterized by a large centralized table, called the fact table, surrounded by other highly normalized tables called dimension tables. A Star Join Schema query is a query over multiple tables of a star schema database with local selection specified on the dimension or dimensions and equi-join predicates between the fact table and the relevant dimension table or tables.

The attributes of a star schema database model usually include:

򐂰 A relatively large fact table that contains millions or billions of rows holding the measurable or additive “facts” such as sales type transactions or events

򐂰 Relatively small and highly normalized dimension tables that contain descriptive data about the “facts” (in the central fact table), such as customer or location information

򐂰 A central fact table that depends on the surrounding dimension tables using a parent/child relationship, with the fact table as the child and the dimension tables as the parent

If the dimension tables are further normalized, the results are dimensions that might have additional tables that support them. This is known as a “snowflake” schema or model.

Figure 2-10 shows the structure of a typical Star Join Schema.

Figure 2-10 Star Join Schema

Starting with i5/OS V5R3, the SQE automatically supports optimizing Star Join Schema queries. This new support does not require the use of QAQQINI options. Recognizing and optimizing a star schema query is now a normal part of handling any query request.

Fact Table

Support is provided with DB2 Universal Database for iSeries CQE in OS/400 V5R2 and with CQE and SQE in i5/OS V5R3. To enable Star Join Schema support via QAQQINI options file, perform one of the following operations prior to running the query:

INSERT INTO library/QAQQINI VALUES('STAR_JOIN', '*COST', NULL);

INSERT INTO library/QAQQINI VALUES('STAR_JOIN', '*FORCE', NULL);

In the first operation, *COST is the recommended and preferred parameter value, since each EVI is considered based on its potential to further minimize the I/O. In the second operation,

*FORCE is more aggressive, using a larger number of bitmaps with the CQE skip sequential access method, since all of the relevant EVIs are used for local selection, regardless of cost.

With SQE, the QAQQINI option STAR_JOIN is ignored. However, because a few queries might still be routed to CQE, users who have specified STAR_JOIN *COST in the past should continue to do so. That way CQE will not suffer and SQE will be unaffected. In addition, in the past, the FORCE_JOIN_ORDER option was specified in conjunction with the STAR_JOIN option. This option is no longer required for CQE and should be removed by customers who use it for star join queries. If this option is left in, the FORCE_ JOIN_ORDER is still honored by both CQE and SQE and might prevent the optimizer from generating optimal plans.

2.3.1 Queries in a Star Join Schema

Queries in a Star Join Schema typically include the following characteristics:

򐂰 Multiple tables participating in the query

򐂰 Local selection predicates on the dimension tables rather than the larger fact table

򐂰 Equi-join predicates between the dimension tables and the fact table used to locate and select the relevant fact table rows and to decode and describe the fact table data

The equi-join predicate between any one dimension table and the fact table might result in a very large number of fact table rows being selecting, while the intersection of the equi-join predicates of multiple dimension tables might result in a relatively small number of fact table rows being selected.

To achieve the optimal performance in joining the tables, you must consider these rules:

򐂰 For CQE, the join order of the query must be Fact_Table as the primary table, followed by the dimension tables. CQE identifies the fact table by determining the largest table in the query and that table is “pinned” in join position 1. Then the order of dimension tables (2 through n) is optimized.

򐂰 SQE also identifies the largest table in the query (that is, the fact table) but optimizes the join order of all the tables based on new strategies and methods. This might result in the fact table being placed somewhere other than the first join position. Since SQE can use new and different methods, it is advantageous to also create single key radix indexes on the foreign key columns of the fact table. This index is optimal if the fact table is being joined from a dimension table in join position 1.

Note: Single key EVIs created over the foreign key columns of the fact table are required for the CQE Optimizer to implement the star join techniques. These same EVIs are optimal for the new SQE Optimizer to implement its Star Join Schema techniques.

2.3.2 Restrictions and considerations

When working with Star Join Schema support, consider these restrictions:

򐂰 For both CQE and SQE, the only sort sequence supported is *HEX. If the query or job specifies a sort sequence other than *HEX, the Star Join Schema support is not used.

򐂰 Single key column EVIs must be created over the join columns of the fact table. When using the star schema support, the CQE Optimizer does not use radix indexes to create the dynamic bitmaps for the skip sequential processing on the fact table. When a large range of key values are selected, using EVIs to create the dynamic bitmaps is more efficient and faster.

򐂰 Specifying the QAQQINI parameter STAR_JOIN within an OS/400 V5R2 database environment causes SQL query requests to use CQE instead of SQE. In V5R2, SQE is not specifically enhanced for Star Join Schema queries.

In V5R3, SQE includes specific enhancements for optimizing and executing Star Join Schema queries, so the STAR_JOIN parameter is ignored by SQE. In V5R3, CQE continues to honor the STAR_JOIN parameter for those query requests that are optimized and executed by CQE. For example, a query that contains the LIKE operator does continue to be optimized and executed by CQE instead of SQE. To gain the benefit of the CQE star join support, the QAQQINI STAR_JOIN parameter is needed.

2.3.3 Lookahead Predicate Generation

The key feature of the new SQE support is referred to as Lookahead Predicate Generation (LPG). While the SQE LPG support looks similar to the older CQE support, it has some additional benefits and uses.

When a query in a Star Join Schema is optimized, a hash join is normally chosen as the method to join the fact table to the dimension tables. To accomplish this, the original query is broken into multiple parts or steps. For each dimension table, an internal query is executed to access the rows matching the local selection predicates, using the best available access method (scan or probe, with or without parallelism). The data required for the query is then used to build a hash table.

In addition to building the hash tables, the join key values of the selected dimension table rows are used to populate a list of distinct keys. This list represents all the join key values (that match the corresponding local selection) and is used to identify the join column values in the fact table.

After the hash table or tables and distinct key list or lists are built, the original query is rewritten. The distinct key list or lists are used to provide local selection on the fact table. In effect, it transfers the local selection from the dimension table or tables to the fact table. This transfer is referred to as Lookahead Predicate Generation.

For more information about the Star Join Schema and LPG, see the white paper Star Schema Join Support within DB2 UDB for iSeries - Version 3 on the Web at:

http://www-1.ibm.com/servers/enable/site/education/abstracts/16fa_abs.html

For more information about the SQE, see the IBM Redbook Preparing for and Tuning the V5R2 SQL Query Engine on DB2 Universal Database for iSeries, SG24-6598.

Part 2

Gathering, analyzing,

In document SQL Performance Diagnosis (Page 39-43)