• No results found

Day 18 - CS6302 - selectivity and cost estimate.ppt

N/A
N/A
Protected

Academic year: 2020

Share "Day 18 - CS6302 - selectivity and cost estimate.ppt"

Copied!
14
0
0

Loading.... (view fulltext now)

Full text

(1)

Selectivity and Cost Estimates in Query Optimization

Cost-based query optimization

:

Estimate and compare the costs of executing a

query using different execution strategies and

choose the strategy with the lowest cost estimate.

(Compare to heuristic query optimization)

Issues

Cost function

(2)

Selectivity and Cost Estimates in Query Optimization

Cost Components for Query Execution

1.

Access cost to secondary storage

2.

Storage cost

3.

Computation cost

4.

Memory usage cost

5.

Communication cost

Note: Different database systems may focus on

(3)

1. Access cost to secondary storage.

This is the cost of transferring (reading and writing) data

blocks between secondary disk storage and main memory

buffers.

This is also known as

disk I/O (input/output) cost.

The cost of searching

for records in a disk file depends on

the type of access structures on that file, such as

ordering, hashing, and primary or secondary indexes.

In addition, factors such as whether the file blocks are

allocated contiguously on the same disk cylinder or

scattered on the disk affect the access cost.

2. Disk storage cost.

This is the cost of storing on disk any intermediate files

(4)

3. Computation cost.

• This is the cost of performing in-memory operations on the records

within the data buffers during query execution.

• Such operations include searching for and sorting records, merging

records for a join or a sort operation, and performing computations on field values.

• This is also known as CPU (central processing unit) cost.

4. Memory usage cost.

• This is the cost pertaining to the number of main memory buffers

needed during query execution.

5. Communication cost.

• This is the cost of shipping the query and its results from the

database site to the site or terminal where the query originated.

• In distributed databases, it would also include the cost of transferring

(5)

Using Selectivity and Cost Estimates in Query Optimization

Catalog Information Used in Cost Functions

 Information about the size of a file

 number of records (tuples) (r),

 record size (R),

 number of blocks (b)

 blocking factor (bfr)

 Information about indexes and indexing attributes of a file

 Number of levels (x) of each multilevel index

 Number of first-level index blocks (bI1)

 Number of distinct values (d) of an attribute

 Selectivity (sl) of an attribute

 which is the fraction of records satisfying an equality

condition on the attribute.

 Selection cardinality (s) of an attribute. (s = sl * r)

Note:

For a key attribute, d = r, sl = 1/r and s = 1.

 For a nonkey attribute, by making an assumption that the d distinct values are

(6)

For example,

suppose that a company has 5 departments numbered 1 through 5, and 200 employees who are distributed among the departments as follows:

(1, 5), (2, 25), (3, 70), (4, 40), (5, 60).

In such cases, the optimizer can store a histogram that reflects the

distribution of employee records over different departments in a table with the two attributes (Dno, Selectivity), which would contain the following

values for our example:

(1, 0.025), (2, 0.125), (3, 0.35), (4, 0.2), (5, 0.3).

(7)

Selectivity and Cost Estimates in Query Optimization

Examples of Cost Functions for SELECT

S1. Linear search (brute force) approach

 C

S1a = b;

 For an equality condition on a key, C

S1a = (b/2) if the record

is found; otherwise CS1a = b.

S2. Binary search:

 C

S2 = log2b + ⎡(s/bfr) –1

 For an equality condition on a unique (key) attribute, C

S2

=log2b

S3. Using a primary index (S3a) or hash key (S3b) to

retrieve a single record

For a primary index, retrieve one disk block at each index level, plus one

disk block from the data file. Hence, the cost is one more disk block than the number of index levels:

 C

S3a = x + 1; CS3b = 1 for static or linear hashing;

 C

(8)

Using Selectivity and Cost Estimates in Query Optimization

Examples of Cost Functions for SELECT (contd.)

S4. Using an ordering index to retrieve multiple records:

 For the comparison condition on a key field with an ordering

index, CS4 = x + (b/2)

S5. Using a clustering index to retrieve multiple records:

 C

S5 = x + ┌ (s/bfr) ┐

S6. Using a secondary (B+-tree) index:

 For an equality comparison, C

S6a = x + s;

 For an comparison condition such as >, <, >=, or <=,

 C

(9)

Using Selectivity and Cost Estimates in Query Optimization

Examples of Cost Functions for SELECT (contd.)

S7. Conjunctive selection:

 Use either S1 or one of the methods S2 to S6 to solve.

 For the latter case, use one condition to retrieve the records

and then check in the memory buffer whether each retrieved record satisfies the remaining conditions in the conjunction.

S8. Conjunctive selection using a composite index:

(10)

Example of Using the Cost Functions

Suppose that the EMPLOYEE file has rE = 10,000 records stored in bE = 2000 disk blocks with blocking factor bfrE = 5 records/block

the following access paths:

1. A clustering index on Salary, with levels xSalary = 3 and average selection cardinality

sSalary = 20.

(This corresponds to a selectivity of slSalary = 0.002).

2. A secondary index on the key attribute Ssn, with xSsn = 4 (sSsn = 1, slSsn = 0.0001).

3. A secondary index on the nonkey attribute Dno, with xDno = 2 and

first-level index blocks bI1Dno = 4. There are dDno = 125 distinct values for Dno, so the selectivity of Dno is slDno = (1/dDno) = 0.008,

and the selection cardinality is sDno = (rE * slDno) = (rE/dDno) = 80.

4. A secondary index on Gender, with xGender = 1. There are dGender = 2 values for the gender attribute,

so the average selection cardinality is sSex = (rE/dGender) = 5000.

(11)

Using Selectivity and Cost Estimates in

Query Optimization (7)

Examples of Cost Functions for JOIN

Join selectivity (js)

js = | (R

C

S) | / | R x S | = | (R

C

S) | / (|R| * |S |)

If condition C does not exist, js = 1;

If no tuples from the relations satisfy condition C, js =

0;

Usually, 0 <= js <= 1;

Size of the result file after join operation

| (R

(12)

Using Selectivity and Cost Estimates in

Query Optimization (8)

Examples of Cost Functions for JOIN (contd.)

J1. Nested-loop join:

 C

J1 = bR + (bR*bS) + ((js* |R|* |S|)/bfrRS)

 (Use R for outer loop)

J2. Single-loop join (using an access structure to retrieve

the matching record(s))

 If an index exists for the join attribute B of S with index

levels xB, we can retrieve each record s in R and then use the index to retrieve all the matching records t from S that satisfy t[B] = s[A].

(13)

Using Selectivity and Cost Estimates in

Query Optimization (9)

Examples of Cost Functions for JOIN (contd.)

J2. Single-loop join (contd.)

 For a secondary index,

 C

J2a = bR + (|R| * (xB + sB)) + ((js* |R|* |S|)/bfrRS);

 For a clustering index,

 C

J2b = bR + (|R| * (xB + (sB/bfrB))) + ((js* |R|* |S|)/bfrRS);

 For a primary index,

 C

J2c = bR + (|R| * (xB + 1)) + ((js* |R|* |S|)/bfrRS);

 If a hash key exists for one of the two join attributes — B of S

 C

J2d = bR + (|R| * h) + ((js* |R|* |S|)/bfrRS);

J3. Sort-merge join:

 C

(14)

Using Selectivity and Cost Estimates in Query Optimization

Multiple Relation Queries and Join Ordering

 A query joining n relations will have n-1 join operations, and hence can have a large number of different join orders when we apply the algebraic transformation rules.

References

Related documents

This book presents different classes of designs, and demonstrates how SystemVerilog Assertions are used in the design process from requirements document, verification plan, design

Please refer to the “What’s New” section of the Online Application (www.sceonlineapp.com) for additions, deletions and other changes to solutions and eligibility requirements

• If a standard “2x”    or “Null”    attack modifier card was drawn from a specific modifier deck during the round, shuffle all the discards of that deck back into

In this work, mixtures of non-volatile ionic liquids were screened by COSMO-RS aiming to find mixtures with positive excess volumes that could present an increased CO 2 capture

Our goal is not simply to sell your house, but rather to help realize the best price obtainable for your property in the shortest period of time.. To help achieve this goal we

W orking in E mbedded S ystem The project of this paper is to make a wireless GSM connection [1] in between user and embedded system of home - made devices (Television,

In both perspectives, public policy analysis and health content, the regulation of SSB is urgently needed. An advocacy effort is needed to encourage the government to

Capital Enhanced Yield Fund 0% 75% 30%. Prime Value