Réunion du Guide DB2 pour z/OS France Jeudi 10 octobre 2013
Tour Opus, Paris-La Défense
par Philippe Dubost, CA technologies
DB2 10 Tips
DB2 v10 tips and other performance topics
Agenda:
Save CPU by including extra columns to an unique index Is the length of your inline LOBs optimal ?
Impact of SELECT * versus SELECTing specific columns Improve the response time of Native SQL procedures in DB2 v10
DB2 v10 tips and other performance topics
Save CPU by including extra columns to an unique index
Why do we have indexes ?
Indexes can be used to enforce uniqueness of keys
Indexes are mainly created to improve SQL performance
What’s the cost of indexes ?
Indexes consume a lot of disk space (DASD)
DB2 v10 tips and other performance topics
Save CPU by including extra columns to an unique index
DB2 v10 new feature :
ALTER INDEX IX ADD INCLUDE (COL)
What is it good for ?
The new INCLUDE feature in DB2 v10 allows you to reduce the number of indexes, it means:
Cost savings in DASD Cost savings in CPU.
DB2 v10 tips and other performance topics
Save CPU by including extra columns to an unique index
DB2 v10 tips and other performance topics
DB2 v10 tips and other performance topics
DB2 v10 tips and other performance topics
Save CPU by including extra columns to an unique index
A note when adding several columns to an index
You have to add them one by one …
That’s honestly not a big deal, but maybe an nice DB2 enhancement to add in future versions could be:
ALTER INDEX IX1 ADD INCLUDE (COL3) ; ALTER INDEX IX1 ADD INCLUDE (COL4) ; ALTER INDEX IX1 ADD INCLUDE (COL5) ;
DB2 v10 tips and other performance topics
Save CPU by including extra columns to an unique index
From Theory to Practice
How to locate this scenario in your DB2 environment ? Query the system catalog to locate indexes whose keys overlap with the unique index !
An example next slide, when the unique index is composed of only 1 column.
DB2 v10 tips and other performance topics
Save CPU by including extra columns to an unique index
SELECT IX.CREATOR,IX.NAME, IX.TBCREATOR,IX.TBNAME, IX.UNIQUERULE,IX.COLCOUNT, KEYS.IXCREATOR,KEYS.IXNAME, KEYS.COLNAME,KEYS.COLNO, IX2.CREATOR,IX2.NAME, IX2.TBCREATOR,IX2.TBNAME, IX2.UNIQUERULE,IX2.COLCOUNT, KEYS2.IXCREATOR,KEYS2.IXNAME, KEYS2.COLNAME,KEYS2.COLNO FROM SYSIBM.SYSINDEXES IX, SYSIBM.SYSKEYS KEYS, SYSIBM.SYSINDEXES IX2, SYSIBM.SYSKEYS KEYS2 WHERE IX.UNIQUERULE = 'U'
AND IX.CREATOR = KEYS.IXCREATOR AND IX.NAME = KEYS.IXNAME AND IX.COLCOUNT = 1 AND IX.TBCREATOR = IX2.TBCREATOR AND IX.TBNAME = IX2.TBNAME AND IX2.CREATOR = KEYS2.IXCREATOR AND IX2.NAME = KEYS2.IXNAME AND IX2.UNIQUERULE = 'D' AND IX.TBCREATOR = IX2.TBCREATOR AND IX.TBNAME = IX2.TBNAME AND KEYS.COLNAME = KEYS2.COLNAME ;
SYSIBM.SYSINDEXES
SYSIBM.SYSKEYS
contains information about all indexes in the subsystem, CREATOR, NAME, TBCREATOR, TBNAME, uniqueness, #columns, …
contains information about all keys of all indexes in the subsystem, COLNAME, COLNO, … associated with a particular INDEX
DB2 v10 tips and other performance topics
Save CPU by including extra columns to an unique index
With the result of this SQL statement, you can
prepare your ALTER INDEX statements to INCLUDE
non-key columns to the unique indexes listed, and
get rid of the corresponding now-superfluous
indexes.
Save DASD ! Save CPU !
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
DB2 v10 new feature :
INLINE LOBs allow a portion of a LOB column to be stored in the base TableSpace.
What is it good for ?
improves the performance of applications accessing LOB data, (reduces the need to access the auxiliary LOB TableSpace)
enables the creation of expression-based indexes on the inline portion of a LOB column
(improves performance of searches through a LOB column)
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
COL1 COL2 COL3 LOB
0001 0002 0003 0004 0005 0006 0007 0008 AAAA AAAB AABB BBBB CCCC CCDD EEEE FFFF 2002 2002 2005 2008 2012 2013 2013 2013 startLOB… aa000aaa… bb000bbb… cc111 dd111ddd… ee111ee ff222fff… gg333
Base TableSpace auxiliary LOB TableSpace
Rest of the LOB …endoftheLOB …aaaaaaaa …bbb
…dddddddddd …fffff
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
COL1 COL2 COL3 LOB
0001 0002 0003 0004 0005 0006 0007 0008 AAAA AAAB AABB BBBB CCCC CCDD EEEE FFFF 2002 2002 2005 2008 2012 2013 2013 2013 startLOB… aa000aaa… bb000bbb… cc111 dd111ddd… ee111ee ff222fff… gg333
Base TableSpace auxiliary LOB TableSpace
Rest of the LOB …endoftheLOB …aaaaaaaa …bbb
…dddddddddd …fffff
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
COL1 COL2 COL3 LOB
0001 0002 0003 0004 0005 0006 0007 0008 AAAA AAAB AABB BBBB CCCC CCDD EEEE FFFF 2002 2002 2005 2008 2012 2013 2013 2013 startLOB… aa000aaa… bb000bbb… cc111 dd111ddd… ee111ee ff222fff… gg333
Base TableSpace auxiliary LOB TableSpace
Rest of the LOB …endoftheLOB …aaaaaaaa …bbb
…dddddddddd …fffff
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
COL1 COL2 COL3 LOB
0001 0002 0003 0004 0005 0006 0007 0008 AAAA AAAB AABB BBBB CCCC CCDD EEEE FFFF 2002 2002 2005 2008 2012 2013 2013 2013 startLOB… aa000aaa… bb000bbb… cc111 dd111ddd… ee111ee ff222fff… gg333
Base TableSpace auxiliary LOB TableSpace
Rest of the LOB …endoftheLOB …aaaaaaaa …bbb
…dddddddddd …fffff
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
So what should be the length for my INLINE Lobs ?
IBM DB2 limitations:
Minimum: 0 (i.e. LOB is not INLINE) Maximum: 32680
Within this range, this is up to the DBA to define the proper length.
The length is defined via DDL statements:
Table creation : CREATE TABLE TABLE1 (COL_LOB CLOB(1M) INLINE LENGTH 20000) ;
Table alter : ALTER TABLE TABLE1 ALTER COLUMN COL_LOB INLINE LENGTH 30000 ;
Caution: Reorg needed ! (2)
Tip : Larger page size can be beneficial when turning a LOB into Inline LOB (3)
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
The optimal length for INLINE Lobs depends on the
actual DATA, more precisely, the length of the LOBs
…
The question is almost philosophical, since the answer greatly depends on the LOB Lengths Distribution in the column, in other words, it depends on the type of data that are stored in the LOB. And the DBA creating the table does not necessarily know what it contains, nor what it will contain…
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
Various examples of LOB length distributions:
1st case: almost all LOBs data are small, limited in size,
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
Various examples of LOB length distributions:
2nd case: the LOBs lengths are equally distributed, from
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
Various examples of LOB length distributions:
3rd case: the LOBS lengths is are linearly distributed
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
Various examples of LOB length distributions:
4th case: the LOBs length follow a normal distribution
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
Various examples of LOB length distributions:
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
From Theory to Practice
Identify inline LOBs in your subsystem …
SELECT TBCREATOR,TBNAME,"NAME",LENGTH,LENGTH2,COLTYPE FROM "SYSIBM".SYSCOLUMNS WHERE ( COLTYPE = 'CLOB' OR COLTYPE = 'BLOB' OR COLTYPE = 'DBCLOB' ) AND LENGTH > 4 ;
“LENGTH” > 4 indicates that the LOB column uses the INLINE LOB technique, the actual Inline LOB Length is LENGTH-4
“LENGTH2” reflects the maximum length of the LOB column (inline + stored in Auxiliary)
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
From Theory to Practice
Visualize the length distribution of a LOB column…
The following SQL categorizes the various lengths of the LOBs into LOB length ranges
SELECT RANGE, COUNT(LENGTH) AS ROW_COUNT FROM ( SELECT LENGTH(PARSETREE) AS LENGTH, (CEIL(27670/100)*(1+CEIL( 100 * LENGTH(PARSETREE) / 27670))) AS RANGE FROM SYSIBM.SYSVIEWS) AS TABLE_WITH_RANGE GROUP BY RANGE ORDER BY RANGE ;
This example uses SYSIBM.SYSVIEWS that contains a LOB column
PARSETREE, a 1GB BLOB (1073741824) with Inline LENGTH 27674-4 =
27670
When exported to Excel, one can make a simple graph representation of the LOB length distribution and visualize if the INLINE LENGTH value is properly set (next slide).
DB2 v10 tips and other performance topics
DB2 v10 tips and other performance topics
Is the length of your inline LOBs optimal ?
Advantages & Benefits
The nice thing about this technique is that it is re-usable over time, meaning that the same query can be ran later, when the LOB data values have evolved.
That can help to make sure the INLINE LENGTH value is (still) optimal !
DB2 v10 tips and other performance topics
Impact of SELECT * versus SELECTing specific columns
Performance impact benchmark
a simple and dirty SELECT * … versus …
selecting only the specific columns you need
First note
the performance degradation is not visible if we are dealing with small tables (small number of rows). My tests with a 1000 rows table did not show any difference. But when I performed tests on a more realist size of table (1,000,000 rows), I did notice meaningful differences.
DB2 v10 tips and other performance topics
Impact of SELECT * versus SELECTing specific columns
Test environment
Presenting here the results of a performance test (CPU time, and Elapsed time) for a 1,000,000 rows table.
DB2 v10 New Function Mode (NFM) subsystem.
Using a fairly simple table, with 7 columns only (integer, dates, and timestamps)
DB2 v10 tips and other performance topics
Impact of SELECT * versus SELECTing specific columns Benchmark results
70% overhead of CPU time 17% overhead of Elapsed time
Other tests performed show similar results, overhead of SELECT * varies depending on:
the number of columns specified in the SELECT COL1, COL2, … the size of the table (number of rows).
DB2 v10 tips and other performance topics
Impact of SELECT * versus SELECTing specific columns
From Theory to Practice
How to identify embedded SELECT * in existing applications running against DB2 for z/OS ?
A few years ago, I used a product that examines embedded
SQL statement in Cobol programs: CA Plan Analyzer (it also
works for other programming languages).
Not to enter into too much details about this tool, there is a
rule (sort of trigger) called Expert Rule 0064 that will be
triggered if an embedded SELECT * SQL statement is discovered in the application / plan / package analyzed.
DB2 v10 tips and other performance topics
Impact of SELECT * versus SELECTing specific columns
In addition to the performance impact mentioned detailed above, CA Plan Analyzer also notifies the user with the
following recommendation (which makes a lot of sense, and IMHO another good reason why application developers
should not use SELECT * type of SQL statements in their application) :
This should be avoided because of problems that can be encountered when adding and removing columns from the underlying table(s). Your application program host variables will not correspond to the added/removed columns.
DB2 v10 tips and other performance topics
Impact of SELECT * versus SELECTing specific columns
Recommendations
If you are a DB2 application developer and you care about your applications performance : do not use
SELECT * statements !
If you are a DB2 administrator, you may want to share this presentation with your application developers, and / or look for products that can detect the use of embedded SELECT * statements running in your DB2 environment.
DB2 v10 tips and other performance topics
Improve the response time of Native SQL procedures in DB2 v10
DB2 v9 introduced a new type of stored procedures called Native SQL Procedures.
These procedures execute in DBM1 address space, and provide SQL performance improvements due to less cross-memory calls compared to external stored procedures.
Several improvements were done in this area in DB2 v10.
IBM however mentions that the response time improvement (up
to 20% improvement) can be achieved only if the existing
Native SQL Procedures are dropped and re-created. That is, if you had created Native SQL Procedures under DB2 version 9 and upgraded to DB2 version 10, you might want to
DB2 v10 tips and other performance topics
Improve the response time of Native SQL procedures in DB2 v10
From Theory to Practice
You moved to DB2 v10, and you need to locate the Native Stored Procedures that were created prior the migration. As I did not know when my subsystem moved to DB2 v10, I used a “trick” to discover this information:
As in every new DB2 version, the DB2 catalog contains additional tables that are created during the migration
process. I took one of them, SYSIBM.SYSAUTOALERTS, and queried SYSIBM.SYSTABLES to get the CREATEDTS value.
DB2 v10 tips and other performance topics
Improve the response time of Native SQL procedures in DB2 v10
SQL statement used to discover the date of
migration to DB2 v10 :
-- When was DB2 upgraded to v10 NFM ?
SELECT CREATEDTS
FROM SYSIBM.SYSTABLES WHERE NAME = 'SYSAUTOALERTS'
DB2 v10 tips and other performance topics
Improve the response time of Native SQL procedures in DB2 v10
From Theory to Practice
Listing all Native Stored Procedures created prior the upgrade to version 10 NFM
-- List of Native SQL Procedures to re-create
SELECT CREATEDBY,OWNER,NAME,ORIGIN,CREATEDTS
FROM SYSIBM.SYSROUTINES
WHERE ORIGIN = 'N'
AND CREATEDTS < ( SELECT CREATEDTS
FROM SYSIBM.SYSTABLES WHERE NAME = 'SYSAUTOALERTS'
AND CREATOR = 'SYSIBM' );
Stored procedures are listed in SYSIBM.SYSROUTINES, and the
column ORIGIN = 'N' indicates that we deals with a Native SQL
Procedure
DB2 v10 tips and other performance topics
Improve the response time of Native SQL procedures in DB2 v10
From Theory to Practice
With that, you have the list of Native SQL Procedures that you want to DROP / RECREATE.
Hopefully, you have the DDL for these objects stored in dataset, but nothing is less obvious. If not, you can use tools to generate the DDL statements from the information
in the catalog, in this example I used CA RC/Query for
DB2 for z/OS to locate a particular Native SQL Procedure and generate its DDL (next slide)
DB2 v10 tips and other performance topics
Improve the response time of Native SQL procedures in DB2 v10
Locate the Native Stored Procedure, and execute
the “DDL” command.
DB2 v10 tips and other performance topics
Improve the response time of Native SQL procedures in DB2 v10
DB2 v10 tips and other performance topics
Improve the response time of Native SQL procedures in DB2 v10
From Theory to Practice
Ones you have the DDL, all what’s needed is to update the SQL, add the DROP syntax and a couple of
COMMITs, and the job is done!
Benefits
Appendix
Appendix
(0) Additional indexes --> Higher insert/delete CPU time (approximately 30% for each index)
Ref. presentation: DB2 10 Performance Update (Susan Lawson)
(1) Potential negative impact when using INCLUDE columns
If the majority of SQL is using the columns of the unique index (not the one to include) – after adding an additional column, you now will have less index entries per page, i.e. more GETP requests, i.e. impact on performance.
(2) Reorg needed when implementing Inline LOBs
Add an Inline length or increase length : Advisory Status (AREO) Decrease length : Restrictive state (REORP)
(3) Larger page sizes can be beneficial when turning a LOB into Inline LOB
Some customers default everything to 4K pages. If you’ve got a LOB and most of the data fits into a smaller length than the maximum inline length (32680), which isn’t
uncommon, it might be worth considering making the LOB columns Inline and the Page size larger so you can get more rows on a page. Inlining LOB columns in 4K pages might reduce the number of rows on the page, forcing you to read more pages.
(4) (not tested) - Another way to find Native Stored Procedures create under DB2 v9
DB2 v10 tips and other performance topics
Presenter’s biography:
Contact information:
www.linkedin.com/in/dubost/ www.db2forz.blogspot.com E-mail: [email protected]A 9-year IT professional, Philippe Dubost is Product Manager at CA technologies. In this role, he is responsible for products planning and strategy, presenting and representing the products portfolio to customers and industry analysts, collecting customer requirements and transforming them into actionable Agile/Scrum stories in the engineering backlog.
DB2 10 Tips
and Other Performance Topics
par Philippe Dubost, CA technologies