Sampling and UPDATE STATISTICS - HP NonStop SQL/MX Query Guide

Analyzing the Possible Impact of Updating Statistics

Depending on the size of the table, updating statistics can take longer than you would like. Consider updating statistics during the hours when peak performance is not required.

If you want to preserve the existing query execution plan, be aware that updating statistics might cause the optimizer to choose a different plan. Usually, you can

improve performance by updating the statistics on a table to reflect the current status. The update statistics operations, however, might not improve performance, as

discussed next:

•

You can update statistics to perform a sampling of rows to determine the statistical information. Depending on your sample size, this procedure could take a long time. (That is, the larger the sample, the longer it might take.) Because this is a statistical sampling method, the statistics gathered are not exact. Inaccurate statistics (wrong by more than 10 percent) can adversely affect the plan. See Sampling and

UPDATE STATISTICS.

•

Because the UPDATE STATISTICS statement does not automatically recompile programs, the operation does not invalidate the dependent programs. If you want to take advantage of the new statistics, however, you must explicitly recompile the dependent programs.

Sampling and UPDATE STATISTICS

Use sampling to control the amount of time spent calculating statistics. If you do not specify sampling, statistics are collected by scanning the entire table. The optional SAMPLE clause provides several methods of producing a sample set, based on a ratio, to determine the histograms. The SQL/MX Reference Manual describes each sampling method in detail.

You might want to use sampling because of the amount of time required to update statistics on the entire table. Sampling techniques that do not perform full large table scans result in significant performance gain. You can use SAMPLE without specifying the number of rows. By default, NonStop SQL/MX samples two percent of the table up to a maximum of two million rows. You might want to explicitly specify a larger sample size to increase statistics accuracy or a smaller sample size to reduce the running time.

When you use the SAMPLE option with UPDATE STATISTICS, a temporary table is created. You can use the HIST_SCRATCH_VOL default attribute for SQL/MX and SQL/MP tables. For SQL/MX tables, NonStop SQL/MX creates a single partition or hash partitions across multiple volumes as specified in the HIST_SCRATCH_VOL default attribute. NonStop SQL/MX determines how many partitions are needed based on the sample set retrieved by the SAMPLE option. It randomly selects the volume or volumes to use from the list of volumes specified in the HIST_SCRATCH_VOL default attribute. NonStop SQL/MX creates only as many partitions as specified with

Keeping Statistics Current

HP NonStop SQL/MX Query Guide —640323-001 3 -6

Sampling and UPDATE STATISTICS

created in the default volume specified by the _DEFAULTS define. For SQL/MP tables, a single partition is created on the volume specified in the HIST_SCRATCH_VOL default attribute. If the default attribute is not used, the temporary table is created in the same volume as the primary partition of the table. Because the temporary table is used for data gathering and calculation purposes, you must have adequate disk space to accommodate it on your system. For more information about the

HIST_SCRATCH_VOL default attribute, see the SQL/MX Reference Manual. Starting with SQL/MX Release 2.3.2, the temporary table can be created as a partitioned table. When you use the USING SAMPLE TABLE WITH PARTITIONS clause, SQL/MX creates a temporary table that is partitioned the same way as the base table for which the statistics are collected. The partitioned temporary table can increase the speed of the UPDATE STATISTICS command significantly because both write and read operations on the table can occur simultaneously.

Use a partitioned temporary table when the UPDATE STATISTICS command takes several tens of minutes to complete—there might not be any gain when the command takes only a few minutes.

The performance with partitioned temporary tables is heavily dependent on the degree of parallelism in the operations that the UPDATE STATISTICS command uses to write to and read from the temporary table. The following steps are recommended for obtaining the maximum parallelism from SQL/MX:

•

Ensure that the number of partitions is a multiple of the number of CPUs in the system.

•

Distribute the data records evenly across all partitions.

•

Distribute the partitions evenly across all available disks.

•

Distribute the disks evenly across all available CPUs.

An uneven distribution might degrade the performance with partitioned temporary tables. You can resolve this problem, in some cases, by creating a temporary table and specifying it for the command to use. To do this, use the USING SAMPLE TABLE table-name clause. The temporary table must meet these criteria:

•

The column attributes must be the same as the base table—the number of columns, order, and data types must match.

•

There must not be any indexes or triggers.

•

There must be no constraints.

•

The table must be empty.

When partitioning the temporary table, avoid using the disks and CPUs that are heavily loaded and try to distribute the partitions evenly across the remaining disks and CPUs. When the USING SAMPLE TABLE clause is used, the HIST_SCRATCH_VOL default attribute will be ignored.

Keeping Statistics Current

HP NonStop SQL/MX Query Guide —640323-001 3 -7

In document HP NonStop SQL/MX Query Guide (Page 67-69)