• No results found

Whither Physical Design and Query Compilation

N/A
N/A
Protected

Academic year: 2021

Share "Whither Physical Design and Query Compilation"

Copied!
23
0
0

Loading.... (view fulltext now)

Full text

(1)

Whither Physical Design and Query Compilation

Grant Weddell

Cheriton School of CS

Spring 2012

(2)

The ACME

PAYROLL

System

A Case Study

ACME Corporation wishes to develop thePAYROLLsystem to more carefully

manageinformationabout its personnel, their salaries, etc.

Infrastructure forPAYROLL:

A computer with mass storage.

Department APS (short for applications).

Department DBA (short for database admininstration).

APS is responsible forPAYROLL.

DBA is responsible for the computing resources.

ACME will use a relational DBMS to implementPAYROLL.

(3)

ACME

PAYROLL

System: Information

There is a rough dichotomy of information in database systems into what are commonly termed data and metadata.

ExamplePAYROLLdata important to APS.

1 Mary is an employee.

2 Mary’s employee number is 3412.

3 Mary’s salary is 72000.

PAYROLLmetadata specified by APS.

4 There is a kind of entity called an employee.

5 There are attributes called employee-number, name and salary.

6 Each employee entity has attributes employee-number, name and

salary.

7 Employees are identified by their employee-number.

(4)

ACME

PAYROLL

System: Motivation

Reduces time needed by APS to createPAYROLL.

Reduces time needed for new APS personnel to learn about payroll.

(5)

ACME

PAYROLL

System: Architecture

Metadata is expressed in a data definition language (DDL).

Code/specifications that access and update data is expressed in a data manipulation language (DML).

(6)

ACME

PAYROLL

System: Physical Design

A physical design forPAYROLLselected by DBA.

8 There is a file of records called emp-file.

9 There are record fields emp-num, emp-name and emp-salary.

10 Each emp-file record has the fields emp-num, emp-name and

emp-salary.

11 File emp-file is organized as a B-tree data structure that supports an

emp-lookupoperation, given a value for attribute employee-number.

12 Records in file emp-file correspond one-to-one with employee

entities.

13 Record fields in file emp-file encode the corresponding attribute values

for employee entities, for example, emp-num encodes an employee-number.

(7)

ACME

PAYROLL

System: Queries and Query Plans

APAYROLLuser query specified by APS.

14 Find the salary for any employee whose employee-number is given

by a parameter p.

A query plan selected by a query compiler.

15 Invoke operation emp-lookup(p), just once, on file emp-file. If an

emp-filerecord is found, then extract and return the value of field

emp-salary.

The query is physically data independent. The plan is physically data dependent. The plan is executable.

(8)

Physical Design and Query Compilation

Overview

Σ SLoo Q Σ = (Σ[ Σ0[ Σ00) Σ00 (query compilation)  Σ0 SPoo Q0

4 There is a kind of entity called an employee.

5 There are attributes called employee-number, name and salary.

(9)

Physical Design and Query Compilation

Overview

Σ SLoo Q Σ = (Σ[ Σ0[ Σ00) Σ00 (query compilation)  Σ0 SPoo Q0

6 Each employee entity has attributes employee-number, name and

salary.

7 Employees are identified by their employee-number.

(10)

Physical Design and Query Compilation

Overview

Σ SLoo Q Σ = (Σ[ Σ0[ Σ00) Σ00 (query compilation)  Σ0 SPoo Q0

14 Find the salary for any employee whose employee-number is given

by a parameter p.

(11)

Physical Design and Query Compilation

Overview

Σ SLoo Q Σ = (Σ[ Σ0[ Σ00) Σ00 (query compilation)  Σ0 SPoo Q0

8 There is a file of records called emp-file.

9 There are record fields emp-num, emp-name and emp-salary.

(12)

Physical Design and Query Compilation

Overview

Σ SLoo Q Σ = (Σ[ Σ0[ Σ00) Σ00 (query compilation)  Σ0 SPoo Q0

10 Each emp-file record has the fields emp-num, emp-name and

emp-salary.

11 File emp-file is organized as a B-tree data structure that supports an

emp-lookupoperation, given a value for attribute employee-number.

(13)

Physical Design and Query Compilation

Overview

Σ SLoo Q Σ = (Σ[ Σ0[ Σ00) Σ00 (query compilation)  Σ0 SPoo Q0

15 Invoke operation emp-lookup(p), just once, on file emp-file. If an

emp-filerecord is found, then extract and return the value of field

emp-salary.

(14)

Physical Design and Query Compilation

Overview

Σ SLoo Q Σ = (Σ[ Σ0[ Σ00) Σ00 (query compilation)  Σ0 SPoo Q0

12 Records in file emp-file correspond one-to-one with employee

entities.

13 Record fields in file emp-file encode the corresponding attribute values

for employee entities, for example, emp-num encodes an employee-number.

(15)

Logical and Physical Design

A logical design is given by a pairhSL; Σi:

1 a logical vocabulary S

L, and

2 a set of logical constraints Σ over S

L.

A physical design is also given by a pair

hSL[ SP; Σ = (Σ [ Σ0[ Σ00)i:

This refines a logical design.

Adds a physical vocabulary SPto a logical vocabulary SL.

Adds a set of constraints Σ0on SPto Σ.

Adds a set of mapping constraints Σ00to relate SLand SP.

(16)

Query Compilation

Given

a physical designhS [ P; Σi, and

a query Q over the logical vocabulary S,

find a query plan Q0over the physical vocabulary P.

Q0must also satisfy two conditions:

1 Q0correctly implements Q for any database satisfying Σ.

2 Q0is the best such query plan.

Issues:

The choice of language(s)L for expressing Q, Q0and the constraints and

interfaces in Σ.

The meaning of “correctly implements”. The meaning of “best”.

(17)

Resolving Issues

We use first order logic (FOL) as the main component forL.

Advantages:

Ideal for capturing logical designs.

User queries correspond naturally to well-formed formulae in FOL.1

Ideal for capturing physical designs, after adding a notion of binding

patterns.2

——————–

1Domain independent well-formed formulae in FOL constitute a standard

metric of query expressiveness called relational completeness.

2Remains true for very sophisticated designs entailing: multi-level store,

pointers, hashing, capabilities for runtime typing, user defined functions, etc.

(18)

Further Advantages of FOL

Additional syntactic conditions on well-formed formulae in FOL lead to a

plan language with a small set of primitive operations.1

Query compilation corresponds naturally to reasoning in FOL.2

We consider: chase-based reasoning and more general interpolation in FOL. Reasoning based on interpolation in FOL can be used to synthesize plans for user queries that express both requests for information and for data update. ——————–

1The operations suffice to formulate query plans over sophisticated physical

designs that are often “built-in” in existing relational database technology.

2Significant factor in resolving issue of how a query plan correctly implements

a user query.

(19)

On the Existence of Query Plans

It is possible that no query plan exists for a given user query and physical design.

Happens when physical design has insufficient material capabilities for computing answers to a user query.

Note:Does not happen with (extant) relational technology since creating a new (logical) table

implies

creating at least one new (physical) index.

Implies in turn: At least one query plan must exist (and is easily found) for any possible user query.

Assuming a particular (initial) physical design is a big problem in information integration where it becomes necessary to integrate access to existing legacy data sources: the two are almost certain to be incompatible.

(20)

On the Best Query Plan

A query plan is an algorithm for solving the problem specified by a user query. There are two established ways of measuring the performance of an algorithm in terms of time or space costs.

1 By a complexity metric.

2 By statistical estimation.

A complexity metric provides a “big picture”, e.g., distinguishing between linear and quadratic algorithms.

Commonly distinguish between data complexity and combined complexity for database query plans. Note: data complexity is always AC0 for relationally complete query languages (e.g., the relational algebra).

Statistical estimation can often arbitrate between query plans at a finer level of granularity (e.g., a query plan running in linear time in the size of data may still be preferable to a query plan running in logarithmic time).

Caveat:The topic of how one measures performance is not part of the “official” course syllabus.

(21)

Discussion Topics: Compilation in the Large

Consider query compilation and physical design versus programming language compilation (e.g., Java compilation).

What role might be served by the various symbols in the following diagram when compiling Java source code?

Σ SLoo Q

Σ = (Σ[ Σ0[ Σ00) Σ00 (query compilation)



Σ0 SPoo Q0

How would the problem of automated physical design apply in this framework? Does a Java compiler also do automated physical design? If so, how do the “automated physical design” problems for Java and SQL compare?

(22)

Discussion Topics: Logical Data Independence

Does it make sense for a physical design to refine two separate logical designs?

Does it make sense for two logical designs to refine a common logical design? Does it make sense to have two separate physical designs refine subsets of a set of logical designs?

What are the advantages of organizing metadata as modules of logical and physical designs organized in a refinement hierarchy?

(23)

Discussion Topics: Information Retrieval

ACME’s APS department introduces attribute description.

ExamplePAYROLLdata for the description attribute.1

1 Mary’s description is “Mary is one of our most productive

employees in the more technical areas of the payroll

system. Mary gets along extremely well with her

colleagues, and is always punctual.”

ACME needs to express and evaluate the following user query.

2 Find the name of each employee entity that would be a good choice to

lead a project to incorporate recent changes to government regulations

on income tax deductions in thePAYROLLsystem.

1descriptionvalues correspond to (possibly long) sequences of characters

at one level, and to (unconstrained) English text at another.

How are the problems of physical design, query compilation, and the choice of user query language affected by this scenario?

References

Related documents