Example 2 – Performance Issue - What Is A Performance Warehouse And Is It Worth It?

• Lets look at the system I mentioned earlier

• Only two critical things to monitor

• Number of applications executing (every 10 secs)

• Length of backup time

• Other monitors run hourly

• Every thing is running smoothly

• A new job goes in to delete historic data

• Intermittent performance issues

Example 2 – Performance Issue

• What is different today to any other day?

• When are the differences?

• Do they correlate?

• What are my options to look at the information?

• Excel or SQL on a database?

• Which is easier?

• Matching tables in Excel or SQL?

• Getting those Excel functions just right or SQL functions?

• Which is repeatable?

The repeatable issue is a key one. If a problem has happened once it will probably happen again. A performance data warehouse with the right views etc will make the repeat investigations very quick. Load the data and out comes the result.

Example 2 – Performance Issue

• I believe it is SQL

• Select * from <table>

where table_name = ‘<table name>’

And <date / time criteria>

• This gets me the data I want

• I can then look at the differences

• It is also repeatable

Example 2 – Performance Issue

• In this case it was a significant increase in page reorgs

• Did you know deletes can cause inserts to page reorg?

• It causes a problem for mass deletes

• I now know what it is, but is it every table in the database?

• Create the table on the next slide

Example 2 – Performance Issue

SNAP_TIME TIMESTAMP ROWS_WRITTEN BIGINT ROWS_READ BIGINT OVERFLOW_ACCESSES BIGINT TABLE_FILE_ID INTEGER TABLE_TYPE INTEGER PAGE_REORGS BIGINT

TABLE_NAME VARCHAR128 TABLE_SCHEMA VARCHAR128

This table will contain all the information I require to diagnose the issue with the page reorgs. There is also some extra data – but who knows if I will need it?

Example 2 – Performance Issue

• First thing is to get the range of snapshot timestamps I want to use.

• This gets the first one after 0900 and the first after 15:00

with early (starttime) as (select min(snap_time) from tabsnap where time(snap_time) > '09:00'),

late (endtime) as (select min(snap_time) from tabsnap where time(snap_time) > '15:00'),

with early (starttime) as

(select min(snap_time) from tabsnap where time(snap_time) > '09:00'), late (endtime) as

(select min(snap_time) from tabsnap where time(snap_time) > '15:00'),

firstsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap where snap_time = (select starttime from early)), lastsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap where snap_time = (select endtime from late))

select l.tabschema, l.tabname, l.pagereorgs, f.pagereorgs, l.pagereorgs -f.pagereorgs as diff

from lastsnaps l, firstsnaps f where l.tabschema = f.tabschema and l.tabname = f.tabname

;

Example 2 – Performance Issue

• Note that this is a continuation of the same statement.

• Then get the table information that I want.

• At this stage I am just looking at the overall day

• So I am putting the first snapshots into ‘firstsnaps’ and last into

‘lastsnaps’

firstsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap where snap_time = (select starttime from early)), lastsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs from tabsnap where snap_time = (select endtime from late))

with early (starttime) as

(select min(snap_time) from tabsnap where time(snap_time) > '09:00'), late (endtime) as

(select min(snap_time) from tabsnap where time(snap_time) > '15:00'),

firstsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap where snap_time = (select starttime from early)), lastsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap where snap_time = (select endtime from late))

select l.tabschema, l.tabname, l.pagereorgs, f.pagereorgs, l.pagereorgs -f.pagereorgs as diff

from lastsnaps l, firstsnaps f where l.tabschema = f.tabschema and l.tabname = f.tabname

;

Example 2 – Performance Issue

• Then select the actual information

• I could order it by the biggest difference to get an immediate picture of the worst tables

select l.tabschema, l.tabname, l.pagereorgs, f.pagereorgs, l.pagereorgs - f.pagereorgs as diff

from lastsnaps l, firstsnaps f where l.tabschema = f.tabschema and l.tabname = f.tabname

with early (starttime) as

(select min(snap_time) from tabsnap where time(snap_time) > '09:00'), late (endtime) as

(select min(snap_time) from tabsnap where time(snap_time) > '15:00'),

firstsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap where snap_time = (select starttime from early)), lastsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap where snap_time = (select endtime from late))

select l.tabschema, l.tabname, l.pagereorgs, f.pagereorgs, l.pagereorgs -f.pagereorgs as diff

from lastsnaps l, firstsnaps f where l.tabschema = f.tabschema and l.tabname = f.tabname

;

Example 2 – Performance Issue

with early (starttime) as (select min(snap_time) from tabsnap where time(snap_time) > '09:00'),

late (endtime) as (select min(snap_time) from tabsnap where time(snap_time) > '15:00'),

firstsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs

l.pagereorgs - f.pagereorgs as diff from lastsnaps l, firstsnaps f

where l.tabschema = f.tabschema and l.tabname = f.tabname

This is just the query in one go

with early (starttime) as

(select min(snap_time) from tabsnap where time(snap_time) > '09:00'), late (endtime) as

(select min(snap_time) from tabsnap where time(snap_time) > '15:00'),

firstsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap where snap_time = (select starttime from early)), lastsnaps (tabschema, tabname, snap_time, pagereorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap where snap_time = (select endtime from late))

select l.tabschema, l.tabname, l.pagereorgs, f.pagereorgs, l.pagereorgs -f.pagereorgs as diff

from lastsnaps l, firstsnaps f where l.tabschema = f.tabschema and l.tabname = f.tabname

;

Example 2 – Performance Issue

• That’s OK as far as it goes

• What if I want more time increments?

• Or to find the worst hour?

• That becomes more interesting

Example 2 – Performance Issue

• Once again, OLAP SQL to the rescue

• The principle I always use it to build it up slowly

• The more complex the piece of SQL, the more steps I will take

• Debugging a 40 line piece of SQL is tricky

Example 2 – Performance Issue

• Step 1: Get the timestamps

with wantedtimes (timeareas) as ( values (time('09:00'))

union all

select timeareas + 1 hour from wantedtimes where timeareas < time('15:00')

realtimes (approx_time, time_stamps) as (select timeareas, min(snap_time)

from wantedtimes w, tabsnap t

where w.timeareas < time(t.snap_time) group by timeareas)

Wantedtimes stores every hour from 0900 to 1500.

If I added 30 minutes I would get every half an hour.

Do not go more granular than your snapshots. So if you take snapshots every 30 minutes do not do this query for every 15 minutes

Realtimes stores that actual snapshot time for the first snapshot after the time in wantedtimes

Example 2 – Performance Issue

• Step 2: Get the data

snap_data (tabschema, tabname, snap_time, page_reorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap

where snap_time in (select time_stamps from realtimes) )

This gets all the snapshot data for the timestamps in realtimes

Example 2 – Performance Issue

• Step 2: Now get the data with increments

inc_data (tabschema, tabname, snap_time, page_reorgs, diff, totdiff) as

(select tabschema, tabname, snap_time, page_reorgs, page_reorgs - sum(page_reorgs) over

(partition by tabschema, tabname order by tabschema, tabname, snap_time rows between 1 preceding and 1 preceding),

page_reorgs - min(page_reorgs) over (partition by tabschema, tabname) from snap_data )

This now keeps the same data, but adds in the increment of page reorgs.

The partition by and order by make sure it is the same table

The rows between 1 preceding and 1 preceding get the data from the previous row – be careful of going unbounded as it can take a long time especially if it has to get every rows back first

Example 2 – Performance Issue

• That’s the base data

• I can make that a table or a view or just put it into statements

• Everything else comes from that

(tabschema, tabname, snap_time, pagereorgs, diff, tot_diff) as with wantedtimes (timeareas) as

( values (time('09:00')) union all

select timeareas + 1 hour from wantedtimes where timeareas < time('15:00')

realtimes (approx_time, time_stamps) as (select timeareas, min(snap_time)

from wantedtimes w, tabsnap t

where w.timeareas < time(t.snap_time) group by timeareas),

snap_data (tabschema, tabname, snap_time, page_reorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap

where snap_time in (select time_stamps from realtimes) ) select tabschema, tabname, snap_time, page_reorgs,

page_reorgs - sum(page_reorgs) over

(partition by tabschema, tabname order by tabschema, tabname, snap_time rows between 1 preceding and 1 preceding),

page_reorgs - min(page_reorgs) over (partition by tabschema, tabname) from snap_data

Example 2 – Performance Issue

• What’s the worst hour?

select snap_time, sum(diff) from inc_data

group by snap_time

Example 2 – Performance Issue

• What’s the worst table?

select tabschema, tabname, totdiff from inc_data

order by totdiff desc fetch first 1 row only

Example 2 – Performance Issue

• How bad were the tables in the worst hour?

worsthours (worst_snap_time, tdiff) as

(select snap_time, coalesce(sum(diff),0) as totdiff from inc_data

group by snap_time order by totdiff desc fetch first 1 row only )

select tabschema, tabname, worst_snap_time, page_reorgs, diff from inc_data, worsthours

where snap_time = worst_snap_time

This is a combination of the previous two examples.

Worsthours gets me the worsthour and then I get all the information for that hour

Example 2 – Performance Issue

• The worst 10 tables in the worst hour?

worsthours (worst_snap_time, tdiff) as

(select snap_time, coalesce(sum(diff),0) as totdiff from inc_data group by snap_time

order by totdiff desc fetch first 1 row only )

select tabschema, tabname, worst_snap_time, page_reorgs, diff from inc_data, worsthours

where snap_time = worst_snap_time and diff is not null order by diff desc

fetch first 10 rows only

with wantedtimes (timeareas) as ( values (time('09:00'))

union all

select timeareas + 1 hour from wantedtimes where timeareas < time('15:00')

realtimes (approx_time, time_stamps) as (select timeareas, min(snap_time)

from wantedtimes w, tabsnap t

where w.timeareas < time(t.snap_time) group by timeareas),

snap_data (tabschema, tabname, snap_time, page_reorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap

where snap_time in (select time_stamps from realtimes) ),

inc_data (tabschema, tabname, snap_time, page_reorgs, diff, totdiff) as (select tabschema, tabname, snap_time, page_reorgs,

page_reorgs - sum(page_reorgs) over

(partition by tabschema, tabname order by tabschema, tabname, snap_time rows between 1 preceding and 1 preceding),

i ( )

Example 2 – Performance Issue

• Was it important to look at the increments?

• Yes

• There was a batch job that did a lot of work on 1 table

• This had 40K + page reorgs

• 95%+ were before 7am – they weren’t the problem

• Top six tables were quickly highlighted as the key tables

with wantedtimes (timeareas) as ( values (time('09:00'))

union all

select timeareas + intervals from wantedtimes, intervals where timeareas < time('15:00') ), realtimes (approx_time, time_stamps) as (select timeareas, min(snap_time)

from wantedtimes w, tabsnap t

where w.timeareas < time(t.snap_time) group by timeareas),

snap_data (tabschema, tabname, snap_time, page_reorgs) as (select table_schema, table_name, snap_time, page_reorgs

from tabsnap where snap_time in (select time_stamps from realtimes) ), ltddata (tabschema, tabname, snap_time, page_reorgs, diff, fulldiff) as (select tabschema, tabname, snap_time, page_reorgs,

page_reorgs - sum(page_reorgs) over

(partition by tabschema, tabname order by tabschema, tabname, snap_time rows between 1 preceding and 1 preceding),

page_reorgs - min(page_reorgs) over (partition by tabschema, tabname) from snap_data),

worsthours (worst_snap_time, tdiff) as

(select snap_time, coalesce(sum(diff),0) as totdiff from ltddata group by snap_time

order by totdiff desc fetch first 1 row only )

select tabschema, tabname, worst_snap_time, page_reorgs, diff, fulldiff 55

AGENDA

• What is a performance warehouse?

• How do I justify a performance warehouse?

• What are the priorities?

• Examples of information

• What then?

In document What Is A Performance Warehouse And Is It Worth It? (Page 36-57)