Aggregation Filtering - Local Sorting Algorithms

Chapter 4: Applications and Algorithms

4.2.3 Local Sorting Algorithms

4.3.2.2 Aggregation Filtering

There are several steps in a typical aggregation query where Active Disk filtering can be profitably applied. The first step is to remove the columns in a relation that are not relevant to a particular query result, for example, the address field of a customer record is not needed if we are totalling the total amount of money they owe, and the receipt date of particular order is not important if we are trying to determine how many of a given part have left our factory (in database terms, this is performing a projection on R to obtain R’). Figure 4-8 shows a few rows from the input table to Query 1. For this particular query, only 7 of the 16 columns in this table are required. In an Active Disk system, the rest of

the data never needs to leave the drive and consume interconnect bandwidth. Figure 4-9 gives the business question being answered by Query 1 as well as the SQL query text. From looking at the query text, we can determine that only a subset of the columns in the table are needed to answer this query, the comment and address fields, for example, are

l_okey|l_quantity| l_price|l_disc|l_tax|l_rf|l_ls|l_shipdate|l_commitdate|l_receiptdate|l_shipmode|l_comment ---+---+---+---+---+----+----+---+---+---+---+--- 1730| 6|11051.58| 0.02| 0|N |O |09-02-1998| 10-10-1998| 09-13-1998|TRUCK |wSRnnCx2 3713| 32|29600.32| 0.07| 0.03|N |O |09-02-1998| 06-11-1998| 09-28-1998|TRUCK |MOgnCO1 7010| 23|29356.28| 0.09| 0.06|N |O |09-02-1998| 08-01-1998| 09-14-1998|MAIL |jPNQlx3i 19876| 4| 6867.24| 0.09| 0.08|N |O |09-02-1998| 09-06-1998| 09-29-1998|AIR |3nRkNn4 24839| 8|12845.52| 0.05| 0.02|N |O |09-02-1998| 10-14-1998| 09-06-1998|REG AIR |jlw61g3 25217| 10| 18289.1| 0.05| 0.07|N |O |09-02-1998| 08-12-1998| 09-26-1998|TRUCK |SQ7xS5 29348| 29|41688.08| 0.05| 0.02|N |O |09-02-1998| 07-04-1998| 09-18-1998|FOB |C0NxhzM 32742| 8| 9281.92| 0.01| 0.03|N |O |09-02-1998| 07-17-1998| 09-19-1998|FOB |N3MO1C 36070| 31|34167.89| 0.04| 0|N |O |09-02-1998| 07-11-1998| 09-21-1998|REG AIR |k10wyR [...more...]

(600752 rows)

Figure 4-8 Format of the lineitem table, which is the largest in the TPC-D benchmark. The table serves as input to Query 1. Note that a few of the columns have been removed or shortened for presentation purposes. The full schema is provided in Appendix A and in the TPC-D benchmark specification [TPC98].

select l_returnflag, l_linestatus, sum(l_quantity), sum(l_price), sum(l_price*(1-l_disc)), sum(l_price*(1-l_disc)*(1+l_tax)), avg(l_quantity), avg(l_price), avg(l_disc), count(*) from lineitem where l_shipdate <= ’1998-09-02’ group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus

Figure 4-9 Business question and query text for Query 1 from TPC-D. The purpose of this query is to summarize about 95% of the item records in the database across price, discount, and quantity.

“Report the amount of business that was billed, shipped, and returned. Summarize for for all items shipped up to 90 days from the last date in the database and include total price, total discounted price, total price plus tax, average quantity, average price, and average discount, grouped by order status and return status.”

never used by this query. In fact, the subset of columns that remain after the initial scan of the table as it comes off the disk, provide a close to four times reduction in the amount of data moved from the disks to the host. The Scan Savings column of Table 4-4 shows how much data reduction is possible simply by eliminating the unecessary columns from the largest table in a subset of the TPC-D queries.

Note that the negative impact of pre-scan at the drives is that the full records are not returned to the client for later caching. In the case of small relations, this decrease in cache efficiency could well outweigh the benefits of Active Disk processing. This requires the query optimizer to choose plans where small relations (for which caching will likely be benefitial) are returned to the host in their entirety, while large relations (which would “blow out” the cache at any rate) are processed at the disks and marked “uncacheable” at the host. This also avoids the cache coherence problem that would arise if Active Disks were able to process pages that may be dirty in the host’s buffer pool. There are some additional concerns about locking, particularly in the presence of the UF1 and UF2 update functions in the TPC-D benchmark. These are important functions to consider in the design of an Active Disk database system, but there are a number of possible methods to minimize the impact of the updates on the decision support queries [Mohan92, Merchant92] and these issues are mentioned again in Section 7.5.2. The discussion that follows will assume that the relations being processed at the disks are uncacheable at the hosts, and that cursor stability is sufficient for the decision support queries.1

Figure 4-10 shows the entire plan for Query 1 as determined by the PostgreSQL optimizer, along with the amount of data reduction at each step in the query. We see that there is a factor of four reduction in data moved at the initial scan phase, a further 5% reduction in the qualification phase (the where clause) and then the final reduction of 5 orders of magnitude in the aggregation step, when everything is summarized down to only four result rows. The values shown in the figure are the size estimates made by the query

1. Note that this assumption, as well as the fact that our system has been in no way audited or approved by the normal TPC guidelines, means that are TPC-D results should be considered illustrative only of the types of benefits that might be possible in a fully audited and benchmarked system.

Query Input Data (KB) SeqScan Result (KB) Scan Savings (selectivity) Aggregate Result (bytes) Aggregate Savings (selectivity) Q1 126,440 34,687 3.6 240 147,997.9 Q4 29,272 86 340.4 80 1100.8 Q6 126,440 177 714.4 8 22,656.0

Table 4-4 Sizes and selectivities of several TPC-D queries. The table gives the data sizes and selectivities at intermediate stages of several TPC-D queries as executed by PostgreSQL. Note that since aggregation produces only one (or a few) values - a sum of values or a count of tuples for example - the selectivity is essentially infinite as the input size increases, i.e. it reduces an arbitrary-sized input to a fixed (and small) number of bytes.

optimizer, the actual sizes vary somewhat from this, as discussed in more detail in the Optimizer section of Chapter 6, but are the same order of magnitude.

The qualification condition does not provide a significant reduction in data moved for Query 1, but the conditions on Query 4 reduce the data to a tiny fraction of their origi- nal size. The Qualification Savings column of Table 4-4 shows the data reduction by qualification on the largest relation in the TPC-D queries. The Optimizer section in Chapter 6 provides additional details on how an Active Disk system could estimate the costs and benefits for a particular query, that can then be used to determine the appropriate partition- ing across disks and hosts.

Table 4-4 shows several queries from the TPC-D benchmark and the savings in data transfer if filtering is performed at different steps in the query execution. We see that the

SeqScan

where l_shipdate <= ’1998-09-02’ group by l_returnflag, l_linestatus sum(l_quantity), sum(l_price), sum(l_price*(1-l_disc)),

sum(l_price*(1-l_disc)*(1+l_tax)), avg(l_quantity), avg(l_price), avg(l_disc), count(*)

order by l_returnflag, l_linestatus

from lineitem

Sort Group

Aggregate

Sort

select l_returnflag, l_linestatus, Qual 35,189 -> 33,935 Scan 126,440 -> 35,189 Sort 33,935 -> 33,935 Group 33,935 -> 33,935 Aggr 33,935 -> 9 Sort 9 -> 9

Figure 1: Text, execution plan, and result for Query 1 from the TPC-D benchmark. The right column shows the text of the query, the center diagram shows the final plan chosen by the optimizer for this execution, and the left column shows the amount of data reduction at each node in the plan.

126,440 KB (15,805 pages) on disk

l_rf|l_ls|sum_qty|sum_base_price|sum_disc_price| sum_charge|avg_qty| price| disc| count ----+----+---+---+---+---+---+---+---+--- A |F |3773034| 5319329289.67| 5053976845.78| 5256336547.67| 25.509|35964.01|0.049|147907 N |F | 100245| 141459686.10| 134380852.77| 139710306.87| 25.625|36160.45|0.050| 3912 N |O |7464940|10518546073.97| 9992072944.46|10392414192.06| 25.541|35990.12|0.050|292262 R |F |3779140| 5328886172.98| 5062370635.93| 5265431221.82| 25.548|36025.46|0.050|147920 (4 rows) Query Text Query Plan Data Reduction Query Result

Figure 4-10 Text, execution plan, and result for Query 1 from the TPC-D benchmark. The right column shows the text of the query, the center diagram shows the final plan chosen by the optimizer for this execution, and the left column shows the amount of data reduction at each node in the plan. The query result is shown in the table at the bottom. This is the entire result, note the large data reduction, from 125 MB on disk to several hundred bytes in the final result.

benefits of the initial table scan are significant, and the benefits to performing the entire aggregation at the disks are several orders of magnitude.

4.3.3 Join

Highly selective joins will benefit significantly from the reduction in data transfer by operating directly at the Active Disks. Joins will also benefit from the greater amount of CPU power provided by the Active Disks in aggregate vs. that available at the host. A join combines data from two (or more) relations, so it is more complex than the select or project, which operate on only a single relation at a time. Figure 4-10 illustrates the basic computation of a join on two relations.

In document Active Disks - Remote Execution for Network-Attached Storage (Page 86-89)