• No results found

Query Performance. USing the EXPLAIN Feature to Improve DB2~ How does DB2 proc!!5s an SOl query? 1I1TROOOCTIOlf

N/A
N/A
Protected

Academic year: 2021

Share "Query Performance. USing the EXPLAIN Feature to Improve DB2~ How does DB2 proc!!5s an SOl query? 1I1TROOOCTIOlf"

Copied!
6
0
0

Loading.... (view fulltext now)

Full text

(1)

USing the EXPLAIN Feature to Improve

DB2~

Query Performance

Frank Solcntan, Blue

Cross Blue Shield of CT

North Haven, CT 06(73

.lBSTR.lCT

This paper discusses the use of the SOL EXPLAIN

feature of DB2 to"assist in the understanding of bow DB2 btuldles an SOL query request and how to look for tuning opportunities for better query perforaenee. This paper will explain how DB2 processes an SOL request. how to use the SOL

EXPLAIN statement with the SAS8 SOL PASS-~ Facility. obtain the EXP~N results. and

interpret what those results maan. In addition. this paper will present some helpful ideas for optimizing queries to save both time and system overbead.

1I1TROOOCTIOlf

To acquire the skills to enhance SOL query

performance it is important to understand the underlying component structures that make up the 1m2 DBMS. 1m2 is a Database Management system.

comprising of two dimensional tables" and indexes that fora a basic relational structure. SOL

provides tbe means for data access and retrieval froa ~ tables. Internally. ~ contains tables that fora a catalog that store information about all DB2 objects. An intelligent component of DB2.

the optiaizer. deteraines the probable access path of 4 eingle SOL ototoment to tho data doto~inod

by lntoraatlon contain in the DB2 catalog. Tbe optimizer is an inference engine that makes all

~ data access decisions. The optimizer's

decision ean only be indirectly influenced through coding strategies in the SOL code. The SOL EXPLAIN is used to request that ~ explain the

optimizer's access path decision. Access path

per foraonce ae.y be .a. f foctod by:

- Number of rows to be accessed - Distribution of data

- Access strategy - Sequencing of data

- COndition of the databases - Requested approach to the data

Table INDEXES enhance access path performance by:

- Providing pointer controlled access to data - Avoiding scanning of all table data

- Avoiding sorts

- Avoiding access to the table data entirely

How

does

DB2

prOC!!5S

an

SOl

query?

The SOL statement that is presented to DB2.

whether dynamically (SAS. QMF) or through the BIND

process (application programs). must move through many internal DB2 components to locate and retrieve physical table data. The following gives a flow and description of the DB2 components used to satisfy an·SOL query request:

196

SOL Parser

- Checks for proper SOL syntax Optimizer

- Confirms the users authority - Decomposes query into query blocks

a query block represents a section of an SOL statement e.g .• base SELECl". a subquery. a UNICN

- Merges query with any DB2 table views being accessed in the query

- Obtains catalog statistics for DB2 objects accessed

- ChOOa811!11 an accolI!IIlI!II po th: - Sequences query blocks - Computes row filter factors - COmputes access path costs

- Determines whether to use an index - Determines which indexes to use - Chooses least costly path - Creates the access plan Code generator

- Converts access plan to object machine code

- Stores access plan as an Application Plan for statically bound applications

Executor

- Executes stored Applic~tion Plans to access data

(2)

- Execute dynaaic (Ad-Hoc) queries one time only

Data Retrieval Deta Manager-1M

- Stage 1 row fIlter. called sargable - Evaluates predicates (from WHERE

clause) as the data is retrieved from tbe physical storage (resource efficieut)

- Usee the Buffer Manager to access the tables

- Used by the Relational Data Syatem-RIlS - Relational Deta systea-RDS

- Stage 2 row filter. called non-sargable - Interface for queries (creates results

table)

- Makes calls to the Deta Manager - Evaluates predicates (from WHERE

c18use) after the data has been retrieved in the buffer (resource intensive)

- Buffer Manager

- Works like a scratch pad area for 082

- Inter feces between the VSAM Media Manager and the Deta M8nager VSAM Media Manager

- Manages the physical data storage

Bow

t.o

EXPUIH

Before any query can be explained. a pla~tablB

pref1xed by the User ID doing the explaining must be defined in the DB2 subsystem where the tables that are being accessed reside. 082 will insert rows representing the results of issuing an

SOL

EXPLAIN in the userid.PLAN-TABLE. The format of

the PLAlLTABLE is as followa:

CREATE TABLE USXRID. PLAl'LTABLE

( QllI!RYN:l INTEGER NOr NULL.

QBLOCI(NO SMALLINr NOr NULL. APPI.NAME 0i.\R( 8) NOr NULL. noGlIANE 0i.\R(8) NOr NULL. PLANro SM.\I.I. INr NOr NULL.

MEnt:lD SMALLINT NOr NULL. QlEATOR 0i.\R(8) NOr NULL. "INANE 0i.\R(18) NOr NULL.

TABNO SMALLINr NOr NULL. Acx:ESSIY?E 0i.\R(2) NOr NULL. MATOIalLS SMALLINr NOr NULL. Acx:ESSCREATOR 0i.\R(8) NOr NULL.

ACCESSNo\)(I! 0i.\R( 18) NOr NULL.

INIlEXDNLY 0i.\R(1) NOr NULL.

SORnLUNIQ 0IAR(1) NOr NULL. SCIR1N....TOIN 0IAR(l) NOr NULL. SORl>LQRDBIUIV 0IAR(1) NOr NULL. SORnLGllOOPIIY 0IAR(1) NOr NULL.

SClRl'C..UNIQ 0IAR(l) NOr WLL.

SORTC_30IN 0IAR(l) NOr NULL. SClRTC...0RDERB'i OfAR(l) NOr NULL. SClRl'C..GROUPIIY OfAR(l) NOr NULL.

TSLOClOClDE 0IAR(3) NOr NULL.

:TIlIESTAHP 0IAR(16) NOr NULL. RI!HARKS VAROIAR( 25.) NOr NULL.

PREPETOi CHAR(l) NOr NULL WIn! DEPAULT.

~ALOfAR(l) NOr NULL

wrm

DEPAULT. MIXlJPSEQ

VERSION CXlLLID

SMALLINr NOT NULL

wrm

DEPAULT. VAROIAR(6.) NOr NULL WIn! DEPAULT.

0IAR(18) NOr NULL

wrm

DEPAULT) IN DBNAHE. TSNAME

The follOWing is a simple coding semple to illustrate issuing an SOL EXPLAtN for a query SELECT using the SOL PASS-THROUGH Facility:

OPTIONS LINESIZE=250 PAGESIZE=60; PKlC SOL;

<XlNNECT TO DB2 (SSID=yourdb2.,.,.t_); EXl!Cl1l1! (EXPLAIN ALL SET IlUERYID = 1 l'OR

SELECT A .CXlLl .A.ax.. .B.OJLJ .B.CXlL4 PROM TABLEA A .TABLEB B WHERE A.ax.l IN (1)

AND A. OJL2 BETWEEN ? Alm ? AND B.CXlL3 = 1

AND A.COLI = B.COLI AND A.COL2 - B.CCL2 BY DB2;

\\IPIlT &SQLXMSG;

TITLE "002 Query Explain Resul tan: CREATE TABLE 'l'EXPLAIN It:> SELECT •

PROM <DINI!CTION TO DB2 (SELECT • PROM userld.PUl'LTABLE

WHERE OUERYID = 1);

DISCIJtINI!Cr PROM DB2; RUN;

PIlOC PRINr DATA=TEXPLAIN; RUN;

(3)

To explain a query. the ooly eleaent that is required is that the stata.ent EIPLAEH ALL SET OUBRYRJ .. 1 Pal IIlUSt prec:ed.e the SELEct clause. To issue an EXPLAIN. the EXECUTE state.ant

(enables DBMS specific SOL) must be used or an error will occur. The EXECUTE EXPLAIN statement can be eo.mented out when ready to actually execute the query. When executing these

statements. DB2 will insert rows representing the EXPLAIN data in the PLAlLTABLE. The query explain .owa will. be identified by the number that is used .in the SET QUERYlIl = D... so that this particular query can be easily 10ca~ when querying the PLAN...TABtE. Notice that -1- (parameter markers) are used in place of actual data. Using actual data may cause a different access patb than using -1- due to differing filter factors. Use M1- when the data is unknown or will vary to obtain the DB2 default factor. Use actual data if it is known or static to obtain the actual filter factor results;' It. filter factor of a Where predicate is a

percentage that estimates how aeny rows are rejected by and how many rows are not rejected by

~he WHERE predicate. Filter factors greatly aftect

~he estimated costs of the ~ access paths to be chosen by the optiaizer by estimating the

percentage.of the rows returned after evaluating

~be NtERE predicate. Use the &sg'''XI,s;t: aacro variable which contains the return code and error descriptions produced by DB2 to check for

successful execution of tbe SQL statements. Appendix It. shows an exaaple of sc.e explained output frca a PLAN-TABLE.

Vhat

do

the EXPLAIlJ columas aean?

'The PLAN...TABLE columns that relate to GENERAL

inforaation about the explained sgL are as follows:

QlII!R'iNO - OUery nWlber assigned by the explaining user

QB[.OCIQl:l - Number that identifios the SOL query sections for a query with a SELECT. subquery. \JNICIf

APPLNAM! - Plan naae for prograas with eabedded

SOL (C.COBOL.etc)

PRJGNAME - Prograa naae for programs with embedded SOL

PLANNO - identifies the order of operations

within a ~

aEATOR - Creator of a new table (materialized view} accessed

- N

... 01 naw table (1IIIIter1al1zed

View) accessed

TABID - N~ that identifies the ,sequence of references to the seae table in the PROM clause

TSI.OQ(MOI)E - The TABLI!SPACE lock _ e IS -Intent &bare IX -Intent exclusive

SIX -Share with intent exclusive S -Share

U -Update

x

-Exclusive

TIMESTAMP - Date and till8 the EXPI.AIN was processed

REMARKS - Pield in which comaents can be inserted

The PLAtLTA8LE colwms that relate to INDEX usage inforaation about the explained

SOL

are as follows:

ACCESSTYPE - Type of table INDEX usage: R -Pull table scan (uses no

index)

I -Use an index

11 -one-fetch scan (MIN or MAX functiOns)

N -Index scan (predicate uses an IN)

M -Multi-index scan followed by:

MX -Matching index scan on row pointers only MI -consolidate (intersect)

of row pointers from multiple indexes

MU -Collbination (union) of row pointers from aultiple indexes

MATCHCXlLS - NWlber 0 f index keys used in an index scan

AOCESSCREATOR - creator of the index being

used

ACC:BSSNAMB - Name of the index being used INCEXCNLY - Y if all data comes from the

index

N if data aust come from the table

- Sequence of steps in a multi-index scan

The PLANLTABLE columns that relate to SORT usage information about the explained SOL are as follows:

198

- Indicates the table 30IN method

o -First table accessed. first

(4)

SCRnLUNIO SCRnLJOIN

1 -Nested loop join

Z -Merge scan join

3 -Sort required for CJRDBR BY. GROUP B'i. UNION. DIS-riNer .. -Hybrid join

- Not being used (always N) Sort inner table for join processing (Y.N)

- Not being used (always N) Not being used (always N) Sort to remove duplicates

(Y.N)

SORTC-JOIN - Sort outer table for join processing (Y.N)

SORTC-~ - Sort for the ORDER BY clause (Y.N)

~CiRClJPBY Sort for the GFI!1JP BY clause

(Y.N)

PREFEICri Physical data read in advance: S -Sequential prefetch L -List pre fetch -Blank -No pre fetch

~AL - Indicates when a column fUDction is evaluated:

R -At detl!. retrieval tiae (stag8 1)

S -At eart time (stage 2)

Vhat

should

you

be

looking for?

Know what coluans aake up tbe indexes on the tables being accessed. Indexes enbance performance and reduce costs. Look lit the ACC'ESS'lYPE to see if an index is being used.. An Aa:ESS'IYPE of "R" means all tbe date. in the table .ust be scanned. and no

indexes are being used. For larger tables and joins this is very elQ)8nsive. Look for MATO«XlLS to eee bow llany index keye are being ueed. The aore the better. Clteck to see if INDEXt't«..y 1s Ny".

A ·Y· aeens that data viII be retrieved from the index rather than the ~able (very f~st). ~tchlng

on a unique index provides the best filtering of data. Change the WHERE clause to take advantage of indexes. Have indexes created to aatcb WHERE clauses that are comaon and often.

Avoid unnecessary sorts as auch as possible. DISTINCT, UNIW, ORDBR BY. GRlJUP JSi. and some

joins require DB2 to sort data, DB2 is sorting if

MI!'nI::[)

=

-3" and the sc:R'Ill or SORTC columns

=

ny". PREF.ETCH is good if lots of rows are being

accessed. Prefetch is when ~ fetches physical data into the buffer ahead of time before being analyzed. Improves perforaance of some types of quer18s by reducing tbe nuaber of physical 1/08

required to return large sets of data. Very effective when the table data is in clustered sequence. Could be a problem or degrade

performance if expecting accessing a small amount of data .

Table JOINS support multiple table access with matching and frequently make use of prlaary and

foreign keys. JOIN perforaance is contingent on indexes to support matching and WHERE predicates. When joining tables look at the ~ column. A nested loop join is preferred if accessing a small percentage of rows or when index coluans are used in the join. Nested Loop joins are favored by 002 when the appropriate indexes exist and the results table will be small. Merge scan join is preferred when accessing a large percentage of rows or index columns are not used in the join. Merge Scan joins may require sorting prior to joining. Hybrid joins make use of list pre fetch and process duplicate outer table matches aost efficiently. DB2 favors a Hybrid join for medium sized reSult tables. Joins are more efficient than subqueries. Joins make better use of indexes than subquries. Rewrite subqueries as joins when possible. Again. check the METHOD column to deteraine JOIN type. If using subqueries. use correlated subqueries when

possible. A correlated subquery refers to an outer query. DB2 will sometimes hold the results of a correlated subquery to evaluate against each row from the outer query.

Make predicates indexable whenever possible. Indexable predicates are usually evaluated at STAGE 1 filtering througb the DB2 Data Manager. STAGE 1 filtering 1s DOre efficient because data is filtered as the rows are being retrieved. The followina are indexable oredicates:

a::u:.tJMN wlue

0JI..1..I4N > V.dUB (also =). not») COLUMN ( v~lue (also =<, not<) aJLlJMN is NULL

CXJLI..Ir4N LIKE • cbar_'" aJLlJMN IN (list)

OJLUMN BETWEEN wluBl AND value2 the above connected by ~n AND

stage 2 predicates are usually less efficient because data 1s filtered or evaluated after the rows have been physically retrieved. STAGE 2 predicates are evaluated last and are the costliest to process. The following are STAGE 2 predicates :'

(5)

COLUMN = v.alue + 1 (any nu.eric calculation)

DAY(OJLlMiI)

=

1 (any sealar functions)

~lJICCLUMN2 = • val us' (concatenations) Rewrite predicates to take advantage of indexes and STAGE ! processing. Check ACCESSTYPE column.

SolIe

perforaance

tips.

The follOWing are some perfor.ance tips you .ight QODslder when constructing your SQL or to ehange EXPLAIN results to iJIlprove the retrieval of data.

• Make all queries as explicit as possible • The WHERE clause is a filter, use it

effectively

• Select only colu.ns that are needed. Do not use

=-• Use the IN statement instead of multiple ORa • Qualify all join columns in a JOIN. Join with

as .any of the index colUEl.s as possible • Do not use a JOIN without a qualifying WHERE

clause

• Do not specify tables in a JOIN if you don't Deed thea

• Avoid using arithmetic expressions in the WHEItE clause

• Avoid using character concatenations (II) in the WHERE clause

• Avoid using SUBSTRING in the WHERE clauSe • Use <= end => instead of NOr BETWEEN

• Avoid using NOT

=.

tor BEniEEN. tor IN. tDT LIKE • Do not use an ORDER fS{ wi th DISTINCT or UNICE if the order you want is the column order of the SELECT clause

• Avoid using a subquery in an IN list if the subquery returns .ultiple rows. Use a JOIN instead

• Use NOT EXISTS inetead of HOT IN for a eubquery

COITCI.USIOH

The key to a healthy query environment is ~he way in which we access and manipulate the data that is .ade available to us. Using the EXPLAIN feature of ~ can help in giving insight into the way De2 will handle the execution of the

SOL

query. Since the Del optimizer makes that decision for us. it is possible to help influence that decision by applying ve:ious techniques and re-formulating th6 sgL oode~ Understanding ~ and the processes that take place behind the scenes can greatly enhance ones perforaance and pra.ote a more efficient query envir::nment for 2111 to use. M2lka ita commor. pract1ce ~o use the EXPLAIN statement an4 develop

the skills to write perfor.ance _oriented query requests.

REliEREHCES

SAS Technical Report P-221. SAS/ACCESS Software: Changes and Enhaneements. Release 6.07. SAS Insti tute Inc.

IBM Database

2.

Application Programming and

SOL

Guide. version 2. IBM Inc,

IEM Database 2. Administ~ation Guide. Ve~sion 2. IBM Inc:.

(6)

QUERVNO QBLOCKNO APPLNAME PROGNAME PLANNO METHOD CREATOR TNAME TABNO ACCESSTVPE MATCHCOLS

---

---

--- --- ---

--.---

---

---1000 1 DSQ8ESQL 0 PROD

+l+g~gl~:gt::~~:~

M 0 1000 1 DSQ8ESQL 0 PROD MX 1 1000 1 DSQ8ESQL 0 PROD

+:+g~g:~:gt::~=:~

MX 1 1000 1 DSQ8ESQL 0 PROD MU 0

1000 1 OSQ8ESQL 0 PROD TITCP012_CLM_VR_2 MX 1

1000 1 DSQ8ESQL 0 PROD TITCPOI2_CLM_VR_2 MU 0

1000 2 DSQ8ESQL 0 PROD TITCPOI3_CLM_VR_3 M 0

1000 2 DSQ8ESQL 0 PROD TITCPOI3_CLM_VR_3 MX 1

1000 2 DSQ8ESQL 0 PROD TITCPOI3_CLM_VR_3 MX 1

1000 2 DSQ8ESQL 0 PROD TITCP013 CLM VR 3 MU 0

1000 2 DSQ8ESQL 0 PROD TITCPOI3:CLM:VR:3 MX 1

1000 2 DSQ8ESQL 0 PROD TITCPOI3_CLM_VR_3 MU 0

SORTN SORTN SORTN SORTN SORTC SORTC SORTC SORTC

ACCESSCREATOR ACCESSNAME INDEXONLV UNIQ JOIN OROER8V GROUPBV UNIQ JOIN ORDERBV GROUPBV TSLOCKMODE

---

---

---

--- ---

--- ---

---

~ N N N N N N N N N S E-< OBA T1ICPO 12_02 V N N N N N N N N S

....

DBA TIICPO 12_02 V N N N N N N N N N N N N S ~

....

~ N N N N N

~

a

DBA TIl CPO 12_02 V N N N N N N N N N

N N, N N N N N N N

N N' N N N N N N N S

DBA TIl CPO 13_02 V N N N N N N N N S

DBA TIl CPO 13_02 V N N N N N N N N S

N N N N N N N N N DBA T1ICPO 13_02 V N N N N N N N N N N N N N N N N N CI)UIMN FN

TIMESTAMP PREFETCH EVAL MIXOPSEQ COLLID

----._-

...

---

--

...

---

---

---1994031316265939 L 0 Q 1994031315265939 1 Q 1994031315265939 2 Q 1994031315266939 3 Q 199403131&266939 4 Q 199403131&266939 6 Q 1994031316266939 L 0 Q 1994031315266939 1 Q 1994031315286939 2 Q 1994031316266939 3 Q

References

Related documents