Database Performance Report Using PivotalVRP

(1)

Database Performance Report

Using PivotalVRP

Sample Report

Date: July 2013

Created by: Shahar Harap

(2)

Table of Content

1. Executive Summary ... 3

2. Overall resource consumption overview ... 4

2.1. Peak investigation ... 5

3. Skew Analysis ... 7

4. Top consumers using BI model ... 9

4.1. Top SQL by CPU Time ... 9

4.2. Top SQL by Total Writes ... 9

4.3. Top CPU Time usage by Total Reads ... 10

5. Top SQL ... 11

5.1. Top SQLs distributed by runtime ... 11

5.2. Top SQL texts sorted by Reads per Sec. ... 12

5.3. Top SQL texts sorted by Writes Per Sec. ... 13

6. Variance ... 14

7. Conclusions ... 16

(3)

1.

Executive Summary

These are the highlights of the performance investigation using the PivotalVRP product:

• Overall –The main goal of this report was to provide a complete view into the GP environment while sample queries were running.

• Resource usage observations

o A few CPU peaks reaching about 50%

o Recurring IO peaks lasting up to a couple of minutes -‐ mostly reads o We discovered a series of skews.

• Summary

Overall we saw that the queries running were not taking advantage of the full GP environment, since at no point did the DCA reach extreme resource usage peaks, CPU usage was mainly in the 15-‐30% range and I/O consumption was well within the DCA’s capability.

We did discover through the Variance module that although the IO consumption grew, after a few tweaks the runtime performance (which translates into real world QoS) was improved considerably.

We found out that the data distribution within the table is not optimal; this type of visibility can assist DBA’s to tweak environments and/or queries thereby getting more out of the

environment. Within MPP environments the overall performance is dictated by the weakest link.

The inclusion of PivotalVRP can also assist in migration since it can quickly point out where fine tuning is necessary.

• Our recommendation :

1. DBAs and R&D should further investigate the skew issues identified on nodes SDW2 and SDW3 from an application point of view (data distribution).

2. System admin should investigate SDW2 and SDW3 skew from HW/OS/network perspective. 3. More traffic can be executed on GP to allow better utilization. If resource queues are

applied then they can be configured with a higher concurrency limitation.

4. DBAs & R&D should take a closer look at the highlighted queries brought is sections 2.1 & 5 and tune them if possible as well as looking at the queries on section 6 (what’s new) and make sure they are legitimate queries.

5. Once all of the above are applied, the system should be monitored using PivotalVRP for an additional week and then compared using the Variance module to see the performance differences.

(4)

2.

Overall resource consumption overview

The following graph shows the overall CPU and IO consumption during (01/07/2013 – 04/07/2013)

We found some peaks in CPU activity as shown in the graph above

We will examine some of them in order to find trends in the DB

During the investigation of the graph we see that: • The CPU reached up to 54% CPU

• The I/O loads show a few peaks of Writes and one peak of Reads • We saw some stops in the database

The table below highlights resource loads

Load types

Results during work

period

Avg CPU 29%

Max CPU 54%

(5)

2.1.

Peak investigation

We investigated the peak by using the Performance BI and Playback Module

We zoomed in to find where the peak started, with PivotalVRP you have the ability to select the period of time you wish to investigate – based on the events, you can review periods of time ranging from single minutes to days.

We ran the Playback module over the selected time period in order to revisit and review the chain of

events.

Playback results

We discovered that a single select could have caused the peak. We dove in, in order to see its parameters.

(6)

In the window above we can see that the select ran about 58:50 min but had no CPU or IO

activity.

Below is the SQL text:

select computer_id,date(date_time), count(distinct(process_name)) process_count from

events group by computer_id, date(date_time) order by process_count desc limit

1000;

(7)

3.

Skew Analysis

We reviewed the information relating to skews with the Skew Analysis module and we suspect that on

the date selected there was an issue with the data distribution within the table. Our suspicion is due to the fact that a lot of skews where identified on various different segments, therefor the issue is not related to a particular node, see the example and table below:

(8)

host Avg Variance % Sdw3 266

Sdw3 233 Sdw2 100

In the example above and in the table we see that skews were found on different segment.

We see that the average variance reached up to 266% difference (in runtime) between the segments.

(9)

4.

Top consumers using BI model

We investigated CPU Time over a few days range in order to find what the top consumers SQL in the cluster are. Through the Performance BI module we have the ability to build the drill down path in a dynamic way by selecting the sequence of filters.

4.1.

Top SQL by CPU Time

4.2.

Top SQL by Total Writes

(10)

4.3.

Top CPU Time usage by Total Reads

(11)

5.

Top SQL

5.1.

Top SQLs distributed by runtime

Below are three of the top SQLtexts

! select date(date_time), source_port from events

! select distinct date(date_time), source_port from eventslimit 1000;

! select count(v)::float4 from (select ta.file_length as v, count(ta.file_length) as f from only pg_temp_33043.pg_analyze_18882_3 as ta group by ta.file_length) as foo where f > 1

(12)

5.2.

Top SQL texts sorted by Reads per Sec.

Here are the top 3 SQLs:

! select count(*)::float4 from (select ta.source_port from only

pg_temp_33043.pg_analyze_18960_3 as ta group by ta.source_port) as tb

! select count(*) from events_fix_1_prt_12;

! select computer_id, process_name,source_port, destination_port from events where

(13)

5.3.

Top SQL texts sorted by Writes Per Sec.

Below are the top 3 SQLs:

! Vacuum

! select sum(gp_statistics_estimate_reltuples_relpages_oid(c.oid))::float4[] from gp_dist_random('pg_class') c where c.oid=18986

! select sum(gp_statistics_estimate_reltuples_relpages_oid(c.oid))::float4[] from gp_dist_random('pg_class') c where c.oid=18908

(14)

6.

Variance

We know that some alterations were made and the original test run was halted and restarted, we wanted to see if there was a run time improvement between the two runs. The Variance module has the ability to provide numeric and quantitative performance comparative results.

From the screen above we can see that overall there was a 69.8% improvement of the CPU usage.

From the screen above we can see that all queries (100%) which ran during the two periods improved in

(15)

From the screen above we can see almost all queries which ran during the two periods increased their

I/O consumption.

What’s new

Above are queries which ran during the second period and didn’t run during the first one. In order to

get a full picture when comparing environments you need to be able to show what additional queries ran in a particular environment (or time frame), What’s New provides this information.

(16)

7.

Conclusions

The test environment selected for this sample report had multiple starting points due to a variety of reasons, PivotalVRP was used in order to give full visibility into the performance of the queries and the environment itself.

Thanks to its high level of granularity we have been able to analyze the performance data in a matter of minutes.

At any point during the selected period we were able to see how each single query behaved and what performance impact it had.

Overall we saw that the queries running were not taking advantage of the full GP environment, since at no point did the environment reach extreme resource usage peaks, CPU usage was mainly in the 15-‐30% range and I/O consumption was well within the environment’s capability.

We did discover through Variance that although the IO consumption grew, after a few tweaks the runtime performance (which translates into real world QoS) improved considerably.

We found out that the data distribution within the table wasn’t optimal, this type of visibility can assist DBA’s to tweak the environment and/or the queries thereby getting more out of the environment.

The inclusion of PivotalVRP can also assist in migration since it can quickly point out where fine tuning is necessary.