
Active Disk System

Chapter 6: Software Structure

6.4 Code Specialization

Since users have to rewrite their code to take advantage of Active Disks, this is an excellent opportunity to get them to write the code “the right way” and to reduce some of the aliasing and code analysis problems from which traditional software now suffers. The work of the COMPOSE group at IRISA and the Synthetix work at OGI have shown that code specialization through partial evaluation can be a powerful tool [Massalin89, Pu95, Volanschi96, Consel98, Marlet99]. This type of specialization is particularly beneficial for operating system code and for small code “kernels” such as the functions to be executed on Active Disks. The work of Consel et al. observes that 25% of operating system code is spent verifying arguments and parsing (or “interpreting”) system data structures. This work is often redundant across calls (traversing the same pointer chain in a ready queue, for example) and can easily be specialized away [Consel98]. Furthermore, this type of specialization also allows a particular piece of code to be optimized for the environment in which it runs, taking into account the particular machine architecture available on the drive (cache size, processor type) or the number of memory buffers available. We know that once a particular piece of code is sent to the drive it will be executed many times, amortizing the specialization cost. This leverages the information the programmer provides when creating an Active Disk function (“this is important, this is the core part of my application”), whereas a general-purpose system (running on a host, for example) would first have to “discover” which particular pieces of code to specialize.
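A textbook illustration of partial evaluation (not drawn from the cited systems) is specializing a general function with respect to an argument whose value is known ahead of time. In the minimal sketch below, `power` and `power_3` are hypothetical names; `power_3` is written by hand to show the residual program a specializer would produce from `power` once the exponent is fixed at 3.

```c
#include <assert.h>

/* General version: the exponent n is an ordinary runtime argument, so the
 * loop over n is re-executed on every call. */
long power(long x, int n)
{
    assert(n >= 0);
    long result = 1;
    for (int i = 0; i < n; i++)
        result *= x;
    return result;
}

/* Residual version for the static input n = 3, written by hand here to show
 * what a partial evaluator would produce: the loop has been unrolled and the
 * exponent no longer exists at run time. */
long power_3(long x)
{
    return x * x * x;
}
```

For an Active Disk function, the analogous static inputs would be the values fixed when the function is shipped to the drive, so the cost of producing the residual version is paid once and spread over every subsequent call.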

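typedef struct database_aggr_param_s {
    char *tuple_desc;   /* format of the tuples on disk */
    char *sort_keys;    /* keys used to sort and group the tuples */
    char *aggr_expr;    /* aggregation expression to evaluate */
} database_aggr_param_t;

int  database_setup_aggr(database_aggr_param_t params);      /* prepare for a query */
void database_aggr(char *buffer, unsigned int len,             /* aggregate one input buffer */
                   char *output, unsigned int *out_len, unsigned int max_len);
void database_complete_aggr(void);                             /* finish the aggregation */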

Such selective specialization should also be particularly successful for the core database functions, which often traverse the same basic expression tree repeatedly during the execution of a particular query. Database engines are essentially “interpreters” that translate a query language, SQL, into calls to the functions implementing tuple layout, memory management, and the core database operations.
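As an illustration of this interpretive overhead, here is a minimal sketch (not PostgreSQL's actual executor) of a qualification evaluated by walking an expression tree for every tuple. The node kinds and the `Expr`/`eval_expr` names are hypothetical, and tuples are simplified to arrays of 64-bit integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node kinds for a tiny qualification-expression tree. */
typedef enum { EXPR_VAR, EXPR_CONST, EXPR_LE, EXPR_AND } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int attno;                  /* EXPR_VAR: column index into the tuple */
    int64_t value;              /* EXPR_CONST: literal value */
    const struct Expr *left;    /* EXPR_LE / EXPR_AND: operands */
    const struct Expr *right;
} Expr;

/* Interpret the tree for one tuple. Every call re-walks the same nodes and
 * re-dispatches on the node kind, even though the tree is fixed for the
 * whole query: this is the repeated work that specialization removes. */
int64_t eval_expr(const Expr *e, const int64_t *tuple)
{
    assert(e != NULL);
    switch (e->kind) {
    case EXPR_VAR:   return tuple[e->attno];
    case EXPR_CONST: return e->value;
    case EXPR_LE:    return eval_expr(e->left, tuple) <= eval_expr(e->right, tuple);
    case EXPR_AND:   return eval_expr(e->left, tuple) && eval_expr(e->right, tuple);
    }
    return 0;   /* unreachable for well-formed trees */
}
```

For a qualification such as `col0 <= 10 AND col1 <= 5`, this costs up to seven recursive calls, each with a `switch`, per tuple, where a hand-coded test needs just two comparisons.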

The next two sections examine the possibilities for code specialization in the context of the PostgreSQL system and the TPC-D benchmark running on Active Disks.

6.4.1 Case Study - Database Aggregation

Table 6-4 shows the cost of executing Query 1 from the TPC-D benchmark using C code written specifically to handle that single query for the particular table schema, using a file of test data from the benchmark. Data is read from a binary file of records, and the entire processing of Query 1 is hand-coded into the program. No schema descriptions or query text are processed at runtime. We see that the cost in instructions per byte is very low, giving a very high theoretical throughput on the prototype Active Disk. This should represent close to the fastest possible execution of this particular query, since the exact schema of the records on disk, the datatypes, and the query text are all known at compile time.
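To make concrete what “hand-coded” means here, the sketch below aggregates TPC-D Query 1 over an array of fixed-format records. It is not the code measured in Table 6-4: the `lineitem_t` layout, the YYYYMMDD integer date encoding, and all function and macro names are assumptions for illustration, the cutoff assumes the benchmark's 90-day delta, and the final sort by group is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed binary layout of a lineitem record on disk. */
typedef struct {
    double  quantity;
    double  extendedprice;
    double  discount;
    double  tax;
    int32_t shipdate;       /* assumed encoding: YYYYMMDD as an integer */
    char    returnflag;
    char    linestatus;
} lineitem_t;

typedef struct {
    char   returnflag, linestatus;
    double sum_qty, sum_base_price, sum_disc_price, sum_charge;
    long   count_order;
} q1_group_t;

#define Q1_MAX_GROUPS      8
#define Q1_SHIPDATE_CUTOFF 19980902   /* 1998-12-01 minus 90 days, baked in */

/* Hand-coded Query 1: the schema, predicate constant, grouping keys, and
 * aggregate expressions are all fixed at compile time. Returns the number
 * of groups filled in. */
size_t q1_aggregate(const lineitem_t *recs, size_t n,
                    q1_group_t groups[Q1_MAX_GROUPS])
{
    size_t ngroups = 0;
    for (size_t i = 0; i < n; i++) {
        const lineitem_t *r = &recs[i];
        if (r->shipdate > Q1_SHIPDATE_CUTOFF)
            continue;                           /* qualification */
        size_t g = 0;                           /* find this record's group */
        while (g < ngroups && (groups[g].returnflag != r->returnflag ||
                               groups[g].linestatus != r->linestatus))
            g++;
        if (g == ngroups) {                     /* start a new, zeroed group */
            assert(ngroups < Q1_MAX_GROUPS);
            groups[g] = (q1_group_t){ .returnflag = r->returnflag,
                                      .linestatus = r->linestatus };
            ngroups++;
        }
        double disc_price = r->extendedprice * (1.0 - r->discount);
        groups[g].sum_qty        += r->quantity;
        groups[g].sum_base_price += r->extendedprice;
        groups[g].sum_disc_price += disc_price;
        groups[g].sum_charge     += disc_price * (1.0 + r->tax);
        groups[g].count_order++;
    }
    return ngroups;
}
```

The query's avg_qty and avg_price follow from these sums divided by count_order; avg_disc would need one more running sum. Nothing here inspects a schema or parses an expression at run time.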

query  type         computation    throughput  memory  selectivity  instructions
                    (instr/byte)   (MB/s)      (KB)    (factor)     (KB)
Q1     aggregation  1.82           73.1        488     816          9.1 (4.7)
Q13    hash-join    0.15           886.7       576     967,000      14.3 (10.5)

Table 6-4 Cost of Query 1 and Query 13 in a direct C code implementation. The computation cost and memory requirements of a basic aggregation and hash-join implemented in C code specifically written to access the TPC-D tables from raw disk files. The last column gives the total size of the code executed at the drives (and, in parentheses, the total size of the code that is executed more than once).

Table 6-5 shows the same aggregation query as executed by the full PostgreSQL code. The cost is much higher when the fully general code, which handles arbitrary datatypes and query texts, is used: the scan phase alone costs 28 instructions per byte, compared with 1.82 for the entire hand-coded query. This code is able to handle tables and records of an arbitrary schema, rather than being specific to one particular record format. It also handles arbitrary expressions for the Qualification, Sort, and Aggregate steps, whereas the C code is hard-coded for the particular qualification constants, sort keys, and aggregation expressions used in Query 1 of TPC-D.

operation       computation    throughput  selectivity
                (instr/byte)   (MB/s)      (factor)
scan            28             17.8        4.00
qualification   29             17.2        1.05
sort/group      71             7.0         1.00
sort/aggregate  196            2.5         3,770

Table 6-5 Cost of Query 1 using the full PostgreSQL code. The computation cost of each phase of query processing in a general-purpose database system.

These two sets of code represent the extremes of the spectrum: from the fully general PostgreSQL code, which can handle any SQL query, to the direct C implementation, which can perform only a single hard-coded query. The goal of a code specialization and optimization system is to bridge the two-order-of-magnitude performance gap between them. Since the PostgreSQL code repeatedly processes tuples of the same format and evaluates the same conditions, it should be possible to generate more efficient, specialized code for the execution of any given query.

Figure 6-9 shows the structure of the PostgreSQL code that is executed at the Active Disk. A number of parameters and operators can be statically bound into the code at the time it is prepared for Active Disk execution. This means that the code on the Active Disks can take advantage of its knowledge of the tuple layouts, condition parameters, and operators to specialize for this particular query. By statically binding the tuple descriptions, the qualification expression and constants, and the subset of data type operators needed for a particular schema and query text, the runtime code specialization system can create code that is much closer to the hand-coded version in performance.

[Figure 6-9 diagram: Execute node components SeqScan, ExecQual, Qual, HeapTuple, ExprEval, and FuncMgr; components labeled System Catalogs, Access Method, Heap, Memory Page, TupleDesc, Heap Table Schema File, and Disk Page; and an Active Disk Function containing Statically Bound Data Type Operators, Statically Bound Tuple Descriptions, and Statically Bound Expressions & Constants.]

Figure 6-9 Active Disk processing of the PostgreSQL Execute node. This diagram shows the changes to the previous diagram necessary to support Active Disk processing. Note that the table schemas, expressions, and data type operators are statically bound in as part of the Active Disk function; this enables further optimizations to specialize the code that executes at the drives.
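To illustrate what statically binding a tuple description buys, the sketch below contrasts attribute access that walks a descriptor on every call with access through an offset fixed when the function is prepared. The `tuple_desc_t` layout (fixed-length attributes only), both `get_*` functions, and `SHIPDATE_OFFSET` are hypothetical simplifications, not PostgreSQL's TupleDesc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified tuple descriptor: fixed-length attributes only. */
typedef struct {
    int natts;
    const int *attlen;   /* byte length of each attribute */
} tuple_desc_t;

/* General access: the offset of attribute attno is recomputed from the
 * descriptor on every call, for every tuple. */
int32_t get_int32_attr(const char *tuple, const tuple_desc_t *desc, int attno)
{
    assert(attno >= 0 && attno < desc->natts);
    size_t off = 0;
    for (int i = 0; i < attno; i++)
        off += (size_t)desc->attlen[i];
    int32_t v;
    memcpy(&v, tuple + off, sizeof v);   /* memcpy avoids misaligned loads */
    return v;
}

/* Specialized access for one known schema: with the descriptor bound when
 * the function is prepared, the offset becomes a compile-time constant.
 * Assumed schema: a 4-byte and an 8-byte attribute precede the date. */
#define SHIPDATE_OFFSET 12
static inline int32_t get_shipdate(const char *tuple)
{
    int32_t v;
    memcpy(&v, tuple + SHIPDATE_OFFSET, sizeof v);
    return v;
}
```

Both functions read the same bytes; the specialized one simply has no loop over the descriptor left to execute.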

6.4.2 Case Study - Database Select

This section explores the types of specialization possible in the PostgreSQL code for select described in Chapter 5. Table 6-6 shows the number of instructions executed in the thirteen most frequently executed routines of the database select operation on a column of type date. The largest single fraction of the time is spent copying data, but tuple processing and interpretation account for over 50% of the instructions executed, in just six routines. Much of the processing in these routines is repetitive and can be specialized away when the code is optimized for a particular query. By collapsing the general-purpose data type and expression-parsing routines, which operate on any data type, into a single routine that only knows how to compare a single date column with a constant date value, the total number of instructions necessary would be greatly reduced.

Further study is needed to determine how much of these individual routines can be specialized, and what trade-offs the runtime system must make between the potential savings and the overhead of performing the specialization, but the data presented in the last two sections show that this is a promising direction for an Active Disk runtime system to pursue.

Table 6-6 Most frequently executed routines in PostgreSQL select. These thirteen routines account for close to 95% of the execution time at the drive.

Routine                 File                 Instructions  Percent  Description
memcpy                  libc                    6,385,334    20.81  copy buffer
ExecEvalVar             executor/execQual.c     4,598,000    14.98  evaluate column
ExecMakeFunctionResult  executor/execQual.c     3,572,000    11.64  result
ExecEvalExpr            executor/execQual.c     3,534,000    11.52  expression
ExecEvalFuncArgs        executor/execQual.c     2,774,000     9.04  arguments
fmgr_c                  utils/fmgr/fmgr.c       1,748,000     5.70  function dispatch
bzero                   libc                    1,421,000     4.63  clear buffer
process_rawtuple        executor/rawUtil.c      1,406,563     4.58
ExecQual                executor/execQual.c     1,027,126     3.35  qualification
ExecEvalOper            executor/execQual.c       912,005     2.97  operator
ExecQualClause          executor/execQual.c       684,000     2.23
ExecStoreTuple          executor/rawUtil.c        418,000     1.36