• No results found

Data integration: A theoretical perspective

N/A
N/A
Protected

Academic year: 2021

Share "Data integration: A theoretical perspective"

Copied!
156
0
0

Loading.... (view fulltext now)

Full text

(1)

Data integration: A theoretical perspective

Maurizio Lenzerini

Dipartimento di Informatica e Sistemistica “Antonio Ruberti” Universit `a di Roma “La Sapienza”

Tutorial at PODS 2002

(2)

Data integration

Source 1 Source 2 Global schema Mapping Query R1 C1 D1 T1 c1 d1 t1 c2 d2 t2
(3)

Outline

Formal framework for data integration

Approaches to data integration

Query answering in different approaches

Dealing with inconsistency

Reasoning on queries in data integration

Conclusions
(4)

Formal framework

A data integration system

I

is a triple

hG

,

S

,

Mi

, where

G

is the global schema (over an alphabet

A

G)

S

is the source schema (over an alphabet

A

S)

M

is the mapping between

G

and

S

Semantics of

I

: which are the databases that satisfy

I

(models of

I

)?

We refer only to databases over a fixed infinite domain

Γ

, and we start with a source database

C

, (data available at the sources, also called source model) over

Γ

. The set of databases that satisfy

I

relative to

C

is:

sem

C

(

I

)

=

{ B | B

is legal wrt

G

(5)

Semantics of queries to

I

A query

q

of arity

n

is a FOL formula with

n

free variables.

If

D

is a database, then

q

D denotes the extension of

q

in

D

(i.e., the set of valuations in

Γ

for the free variables of

q

that make

q

true in

D

).

If

q

is a query of arity

n

posed to a data integration system

I

(i.e., a query over

A

G), then the set of certain answers to

q

wrt

I

and

C

is
(6)

Databases with incomplete information

Traditional database: one model of a first-order theory

Query answering means evaluating a formula in the model.

Database with incomplete information: set of models (specified, for example, as a restricted first-order theory)

Query answering means computing the tuples that satisfy the query in all the models in the set.

There is a strong connection between query answering in data integration and query answering in database with incomplete information under constraints.

(7)

Outline

Formal framework for data integration

Approaches to data integration

Query answering in different approaches

Dealing with inconsistency

Reasoning on queries in data integration

Conclusions
(8)

The mapping

How is the mapping

M

between

G

and

S

specified?

Are the sources defined in terms of the global schema?

Approach called source-centric, or local-as-view, or LAV.

Is the global schema defined in terms of the sources?

Approach called global-schema-centric, or global-as-view, or GAV.

A mixed approach?

Approach called GLAV.

Mapping between sources, without global schema? Approach called P2P.
(9)

GAV vs LAV – example

Global schema: movie

(

Title

,

Year

,

Director

)

european

(

Director

)

review

(

Title

,

Critique

)

Source 1: r1

(

Title

,

Year

,

Director

)

since 1960, european directors Source 2: r2

(

T itle, Critique

)

since 1990

Query: Title and critique of movies in 1998

D.

movie

(

T

,

1998

, D

)

review

(

T

,

R

)

,

written
(10)

Formalization of LAV

In LAV, the mapping

M

is constituted by a set of assertions:

s

φ

G (sound source)

~

x

(

s

(

~

x

)

φ

G

(

~

x

))

s

φ

G (exact source)

~

x

(

s

(

~

x

)

φ

G

(

~

x

))

one for each source element

s

in

A

S, where

φ

G is a query over

G

. Given source database

C

, a database

B

for

G

satisfies

M

wrt

C

if for each

s

∈ S

:

s

C

φ

GB (sound source)

s

C

=

φ

GB (exact source)

The mapping

M

and the source database

C

do not provide direct information about which data satisfy the global schema. Sources are views, and we have to answer
(11)

LAV – example

Global schema: movie

(

Title

,

Year

,

Director

)

european

(

Director

)

review

(

Title

,

Critique

)

LAV: associated to source relations we have views over the global schema

r1

(

T, Y, D

)

{

(

T, Y, D

)

|

movie

(

T, Y, D

)

european

(

D

)

Y

1960

}

r2

(

T, R

)

{

(

T, R

)

|

movie

(

T, Y, D

)

review

(

T, R

)

Y

1990

}

The query

{

(T, R)

|

movie

(T,

1998, D)

review

(T, R)

}

is processed by

means of an inference mechanism that aims at re-expressing the atoms of the global schema in terms of atoms at the sources. In this case:

(12)

Formalization of GAV

In GAV, the mapping

M

is constituted by a set of assertions:

g

φ

S (sound source)

~

x

(

φ

S

(

~

x

)

g

(

~

x

))

g

φ

S (exact source)

~

x

(

φ

S

(

~

x

)

g

(

~

x

))

one for each element

g

in

A

G, where

φ

S is a query over

S

. Given source database

C

, a database

B

for

G

satisfies

M

wrt

C

if for each

g

∈ G

:

g

B

φ

SC (sound source)

g

B

=

φ

SC (exact source)

Given a source database,

M

provides direct information about which data satisfy the elements of the global schema. Relations in

G

are views, and queries are expressed over the views. Thus, it seems that we can simply evaluate the query over the data
(13)

GAV – example

Global schema: movie

(

Title

,

Year

,

Director

)

european

(

Director

)

review

(

Title

,

Critique

)

GAV: associated to relations in the global schema we have views over the sources

movie

(

T, Y, D

)

{

(

T, Y, D

)

|

r1

(

T, Y, D

)

}

european

(

D

)

{

(

D

)

|

r1

(

T, Y, D

)

}

(14)

GAV – example of query processing

The query

{

(T, R)

|

movie

(T,

1998, D)

review

(T, R)

}

is processed by

means of unfolding, i.e., by expanding the atoms according to their definitions, so as to come up with source relations. In this case:

movie(T,1998,D)

review(T,R)

unfolding

(15)

GAV and LAV – comparison

LAV: (Information Manifold, DWQ, Picsel)

Quality depends on how well we have characterized the sources

High modularity and extensibility (if the global schema is well designed, when a source changes, only its definition is affected)

Query processing needs reasoning (query reformulation complex) GAV: (Carnot, SIMS, Tsimmis, IBIS, Picsel, . . . )

Quality depends on how well we have compiled the sources into the global schema through the mapping

Whenever a source changes or a new one is added, the global schema needs to be reconsidered

Query processing can be based on some sort of unfolding (query reformulation looks easier)
(16)

Beyond GAV and LAV: GLAV

In GLAV, the mapping

M

is constituted by a set of assertions:

φ

S

φ

G (sound source)

~

x

(

φ

S

(

~

x

)

φ

G

(

~

x

))

φ

S

φ

G (exact source)

~

x

(

φ

S

(

~

x

)

φ

G

(

~

x

))

where

φ

S is a query over

S

, and

φ

G is a query over

G

. Given source database

C

, a database

B

that is legal wrt

G

satisfies

M

wrt

C

if for each assertion in

M

:

φ

SC

φ

GB (sound source)

φ

SC

=

φ

GB (exact source)

The mapping

M

does not provide direct information about which data satisfy the global schema: to answer a query

q

over

G

, we have to infer how to use

M

in order
(17)

Example of GLAV

Global schema:

W ork

(

P erson, P roject

)

,

Area

(

P roject, F ield

)

Source 1:

HasJob

(

P erson, F ield

)

Source 2:

T each

(

P rof essor, Course

)

, In

(

Course, F ield

)

Source 3:

Get

(

Researcher, Grant

)

, F or

(

Grant, P roject

)

GLAV mapping:

{

(

r, f

)

|

HasJob

(

r, f

)

}

{

(

r, f

)

|

W ork

(

r, p

)

Area

(

p, f

)

}

{

(

r, f

)

|

T each

(

r, c

)

In

(

c, f

)

} ⊆

{

(

r, f

)

|

W ork

(

r, p

)

Area

(

p, f

)

}

(18)

Beyond GLAV: P2P data integration

In P2P, the global schema does not exist. Constraints (that we can still call

G

) are defined over

A

G

=

A

S1

∪ · · · ∪ A

Sn

and the mapping

M

is constituted by a set of assertions (

φ

S1i

, φ

S2j are queries over the alphabets

A

Si and

A

Sj, respectively):

φ

S1i

φ

S2j.

A

S is a distinguished subset of predicates in

A

G, called “base predicates” (where

data are). A source database is a database for the base predicates. Given source database

C

, a database

W

that satisfies

I

relative to

C

is a database for

S

such that, for each assertion

φ

1

φ

2 in

M

,

φ

W1

φ

W2 .

Queries are now expressed over alphabet

A

Si, and the notion of certain answers is the usual one.
(19)

A unified view

Alphabet:

A

=

A

G

∪ A

S

Integrity constraints: constraints

G

, and mapping

M

Partial database: source database

Database: data for all symbols in

A

that are both coherent with the partial database and satisfy the integrity constraints

Query answering: computing the tuples that satisfies the query in every database

Under this view, the difference between LAV, GAV, GLAV, P2P is reflected in the

(20)

Query answering with incomplete information

[Reiter ’84]: relational setting, databases with incomplete information modeled as a first order theory

[Vardi ’86]: relational setting, complexity of reasoning in closed world databases with unknown values

Several approaches both from the DB and the KR community
(21)

Connection to query containment

Query containment (under constraints

T

) is the problem of checking whether

q

1B is contained in

q

2B for every database

B

(satisfying

T

), where

q

1

, q

2 are queries with the same arity.

A source database

C

can be represented as a conjunction

q

C of ground literals over

A

S (e.g., if

~

x

is in

s

C, then the corresponding literal is

s

(

~

x

)

)

If

q

is a query, and

~

t

is a tuple, then we denote by

q

~t the query obtained by substituting the free variables of

q

with

~

t

The problem of checking whether

~

t

q

I,C can be reduced to the problem of checking whether

q

C is contained in

q

~t under the constraints

G ∪ M

The combined complexity of checking certain answers is identical to the complexity of query containment under constraints, and the data complexity is at most the

(22)

Outline

Formal framework for data integration

Approaches to data integration

Query answering in different approaches

Dealing with inconsistency

Reasoning on queries in data integration

Conclusions
(23)

Dealing with incompleteness and inconsistency

We analyze the problem of query answering in different cases, depending on two parameters:

Global schema: - without constraints, - with constraints

Mapping: - GAV or LAV, - sound or complete

Given a source database

C

, we call retrieved global database any database for

G

that satisfies the mapping wrt

C

.
(24)

Incompleteness and inconsistency

Constraints Type of Incomple-

Inconsi-in

G

mapping teness stency

no GAV/exact no no

no GAV/sound yes/no no

no LAV/sound yes no

no LAV/exact yes yes

yes GAV/exact no yes

yes GAV/sound yes yes

yes LAV/sound yes yes

(25)

Incompleteness and inconsistency

Constraints Type of Incomple-

Inconsi-in

G

mapping teness stency

no GAV/exact no no

no GAV/sound yes/no no

no LAV/sound yes no

no LAV/exact yes yes

yes GAV/exact no yes

yes GAV/sound yes yes

yes LAV/sound yes yes

(26)

INT[noconstr, GAV/exact]: example

Consider

I

=

hG

,

S

,

Mi

, with Global schema

G

:

student

(

Scode

,

Sname

,

Scity

)

university

(

Ucode

,

Uname

)

enrolled

(

Scode

,

Ucode

)

Source schema

S

: database relations s1

,

s2

,

s3

Mapping

M

:

student

(

X, Y, Z

)

{

(

X, Y, Z

)

|

s1

(

X, Y, Z, W

)

}

university

(

X, Y

)

{

(

X, Y

)

|

s2

(

X, Y

)

}

(27)

INT[noconstr, GAV/exact]: example

Student oslo bill 15 florence anne 12 city name code University ucla BN bocconi AF name code Enrolled AF 12 BN 16 Ucode Scode

s

C1

12

anne florence

21

15

bill oslo

24

s

C2

AF

bocconi

BN

ucla

s

C3

12

AF

16

BN

(28)

INT[noconstr, GAV/exact]

Sources

Mapping

Global schema

Retrieved GDB

Source model

Model of I

=

(29)

INT[noconstr, GAV/exact]: query answering

Use

M

for computing from

C

the retrieved global database, where each element

g

of

G

satisfies exactly the tuples of

C

satisfying the

φ

S that

M

associates to

g

Since

G

does not have constraints, the retrieved global database is legal wrt

G

Actually, it is the only database that is legal wrt

G

, and that satisfies

M

wrt

C

Thus, we can simply evaluate the query

q

over the retrieved global database,

which is equivalent to unfolding the query according to

M

, in order to obtain a query on

A

S to be evaluated over

C

(30)

INT[noconstr, GAV/exact]: example of query answering

Mapping

M

: student

(

X, Y, Z

)

{

(

X, Y, Z

)

|

s1

(

X, Y, Z, W

)

}

university

(

X, Y

)

{

(

X, Y

)

|

s2

(

X, Y

)

}

enrolled

(

X, W

)

{

(

X, W

)

|

s3

(

X, W

)

}

s

C1

12

anne florence

21

15

bill oslo

24

s

C2

AF

bocconi

BN

ucla

s

C3

12

AF

16

BN

Query:

{

(

X

)

|

student

(

X, Y, Z

)

,

enrolled

(

X, W

)

}

Unfolding wrt

M

:

{

(

X

)

|

s

1

(

X, Y, Z, V

)

, s

3

(

X, W

)

}

(31)

Incompleteness and inconsistency

Constraints Type of Incomple-

Inconsi-in

G

mapping teness stency

no GAV/exact no no

no GAV/sound yes/no no

no LAV/sound yes no

no LAV/exact yes yes

yes GAV/exact no yes

yes GAV/sound yes yes

yes LAV/sound yes yes

(32)

INT[noconstr, GAV/sound]: example

Student

oslo bill 15 florence anne 12 city name code

University

uniroma UR ucla BN bocconi AF name code

Enrolled

AF 12 BN 16 Ucode Scode

s

C1

12

anne florence

21

15

bill oslo

24

s

C2

AF

bocconi

BN

ucla

s

C3

12

AF

16

BN

(33)

INT[noconstr, GAV/sound]

The GAV mapping assertions have the logical form:

~

x

φ

s

(

~

x

)

g

(

~

x

)

The intersection of all retrieved global databases (which can be computed by letting each element

g

of

G

satisfy exactly the tuples of

C

satisfiying the

φ

S that

M

associates to

g

) still satisfies

M

wrt

C

, and therefore, is the only “minimal”

model of

I

.

Incompleteness is of special form. For queries without negation, unfolding is sufficient.

(34)

INT[noconstr, GAV/sound]

Sources

Mapping

Global schema

Intersection

of retrieved GDBs

Source model

Minimal

Model of I

=

(35)

Incompleteness and inconsistency

Constraints Type of Incomple-

Inconsi-in

G

mapping teness stency

no GAV/exact no no

no GAV/sound yes/no no

no LAV/sound yes no

no LAV/exact yes yes

yes GAV/exact no yes

yes GAV/sound yes yes

yes LAV/sound yes yes

(36)

INT[noconstr, LAV/sound]: incompleteness

The LAV mapping assertions have the logical form:

~

x

s

(

~

x

)

φ

G

(

~

x

)

In general, given a source database

C

there are several solutions of the above assertions (i.e., different databases that are legal wrt

G

that satisfies

M

wrt

C

). Incompleteness comes from the mapping. This holds even for the case of simple queries

φ

G:

s

1

(

x

)

{

(

x

)

| ∃

y g

(

x, y

)

}

(37)

INT[noconstr, LAV/sound]

Sources

Mapping

Global schema

Retrieved GDBs

Source model

Models of I

=

=

(38)

INT[noconstr, LAV/sound]: dealing with incompleteness

View-based query processing: Answer a query based on a set of materialized views, rather than on the raw data in the database.

Relevant problem in

Data warehousing

Query optimization
(39)

INT[noconstr, LAV/sound]: dealing with incompleteness

In LAV/sound data integration, the views are the sources. Two approaches to view-based query processing:

View-based query rewriting: query processing is divided in two steps

1. re-express the query in terms of a given query language over the alphabet of

A

S

2. evaluate the rewriting over the source database

C

View-based query answering: no limitation is posed on how queries are

processed, and the only goal is to exploit all possible information, in particular the source database, to compute the certain answers to the query

(40)

INT[noconstr, LAV/sound]: connection to query containment

If queries in

M

are conjunctive queries, then we can substitute the query that

M

associates to

s

for every

s

-literal in

q

C, and therefore, checking certain answers can be reduced to checking pure containment (without

constraints) of two queries in the alphabet

A

G
(41)

INT[noconstr, LAV/sound]: some results for query answering

Conjunctive queries using conjunctive views [Levy&al. PODS’95]

Recursive queries (datalog programs) using conjunctive views

[Duschka&Genesereth PODS’97], [Afrati&al. ICDT’99]

Complexity analysis [Abiteboul&Duschka PODS’98] [Grahne&Mendelzon ICDT’99]

Variants of Regular Path Queries [Calvanese&al. ICDE’00, PODS’00] [Deutsch&Tannen DBPL’01], [Calvanese&al. DBPL’01]
(42)

INT[noconstr, LAV/sound]: data complexity

From [Abiteboul&Duschka PODS’98]:

Sound sources CQ CQ6= PQ datalog FOL

CQ PTIME coNP PTIME PTIME undec.

CQ6= PTIME coNP PTIME PTIME undec.

PQ coNP coNP coNP coNP undec.

datalog coNP undec. coNP undec. undec.

(43)

INT[noconstr, LAV/sound]: basic technique

Consider conjunctive queries and conjunctive views.

r1

(

T

)

{

(

T

)

|

movie

(

T, Y, D

)

european

(

D

)

}

r2

(

T, V

)

{

(

T, V

)

|

movie

(

T, Y, D

)

review

(

T, V

)

}

T

r1

(

T

)

→ ∃

Y

D

movie

(

T, Y, D

)

european

(

D

)

T

V

r2

(

T, V

)

→ ∃

Y

D

movie

(

T, Y, D

)

review

(

T, V

)

movie

(

T, f

1

(

T

)

, f

2

(

T

))

r1

(

T

)

european

(

f

2

(

T

))

r1

(

T

)

movie

(

T, f

4

(

T, V

)

, f

5

(

T, V

))

r2

(

T, V

)

review

(

T, V

))

r2

(

T, V

)

Answering a query means evaluating a goal wrt to this nonrecursive logic program

(44)

INT[noconstr, LAV/sound]: polynomial intractability

Given a graph

G

= (

N, E

)

, we define

I

=

hG

,

S

,

Mi

, and source database

C

:

V

b

R

b

V

f

R

f

V

t

R

rg

R

gr

R

rb

R

br

R

gb

R

bg

V

bC

=

{

(

c, a

)

|

a

N, c

6∈

N

}

V

fC

=

{

(

a, d

)

|

a

N, d

6∈

N

}

V

tC

=

{

(

a, b

)

,

(

b, a

)

|

(

a, b

)

E

}

Q

R

b

·

M

·

R

f

where

M

describes all mismatched edge pairs (e.g.,

R

rg

·

R

rb). If

G

is 3-colorable, then

∃B

where

M

(and

Q

) is empty, i.e.

(

c, d

)

6∈

Q

I,C. If

G

is not 3-colorable, then

M

is nonempty

∀B

, i.e.

(

c, d

)

Q

I,C.
(45)

INT[noconstr, LAV/sound]: in coNP

Consider the case of Datalog queries and positive views.

~

t

is not a certain answer to

Q

wrt

I

and

C

, if and only if there is a database

B

for

I

such that

~

t

6∈

Q

B, and

B

satisfies

M

wrt

C

Because of the form of

M

~

x

(s

(

~

x

)

→ ∃

y

~

1

α

1

(

~

x

, ~

y

1

)

. . .

∨ ∃

y

~

h

α

h

(

~

x

, ~

y

h

)

)

each tuple in

C

forces the existence of

k

tuples in any database that satisfies

M

wrt

C

, where

k

is the maximal length of conjuncts in

M

If

C

has

n

tuples, then there is a database

B

0

⊆ B

for

I

that satisfies

M

wrt

C

with at most

n

·

k

tuples. Since

Q

is monotone,

~

t

6∈

Q

B0.

Checking whether

B

0 satisfies

M

wrt

C

can be done in PTIME wrt the size of

B

0.

=

coNP data complexity for Datalog queries and positive views.
(46)

INT[noconstr, LAV/sound]: the case of RPQ

We deal with the problem of answering queries to data integration systems of the form

hG

,

S

,

Mi

, where

G

simply fixes the labels (alphabet

Σ

) of a semi-structured database

the sources in

S

are relational

the mapping

M

is of type LAV
(47)

Global semi-structured database

sub sub

sub var sub sub sub

sub sub sub

calls calls

calls

(48)

Global semi-structured databases and queries

sub sub

sub var sub sub sub

sub sub sub

calls calls

calls

var var var var

a

b

(49)

Global semi-structured databases and queries

sub sub

sub var sub sub sub

sub sub sub

calls calls

calls

var var var var

a

b

(50)

INT[noconstr, LAV/sound]: the case of RPQ

Given

I

=

hG

,

S

,

Mi

, where

G

simply fixes the labels (alphabet

Σ

) of a semi-structured database

the sources in

S

are binary relations

the mapping

M

is of type LAV, and associates to each source

s

a 2RPQ

w

over

Σ

x, y

s

(

x, y

)

x

w

y

a source database

C

a 2RPQ

Q

over

Σ

a pair of objects

~t

~t

Q

I,C
(51)

Query answering: Technique

We search for a counterexample to

~t

Q

I,C, i.e., a database

B

legal for

I

wrt

C

such that

~t

6∈

Q

B

Crucial point: it is sufficient to restrict our attention to canonical databases, i.e., databases

B

that can be represented by a word

w

B

$

d

1

w

1

d

2

$

d

3

w

2

d

4

$

· · ·

$

d

2m1

w

m

d

2m

$

where

d

1

, . . . , d

2m are constants in

C

,

w

i

Σ

+, and $ acts as a separator
(52)

We need techniques for ...

checking whether a pair of objects satisfies a 2RPQ query in the case of

a word representing a path

a word representing semipath
(53)

Finite-state automata and RPQs

a b c

r

d

p

q

….

….

Q

=

r

·

(p

q

)

·

q

·

q

Automaton for Q

s

1

δ

(

s

0

, r

)

, s

2

δ

(

s

1

, p

)

, s

2

δ

(

s

1

, q

)

,

s

3

δ

(

s

2

, q

)

, s

3

δ

(

s

3

, q

)

(54)

2way Regular Path Queries

2way Regular Path Queries (2RPQ) are expressed by means of finite-state automata over

Σ

0

∪ {

p

|

p

Σ

0

}

.

r

·

(

p

q

)

·

(

p

·

p

)

·

q

·

q

r p q p_ p q q
(55)

Finite-state automata and 2RPQs

a b c

r

d

p

q

….

….

Word:

r

p q

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

Automaton for Q

s

1

δ

(

s

0

, r

)

, s

2

δ

(

s

1

, p

)

, s

2

δ

(

s

1

, q

)

,

s

3

δ

(

s

2

, p

)

, s

4

δ

(

s

3

, p

)

, s

5

δ

(

s

4

, q

)

, s

5

δ

(

s

5

, q

)

State:

s

0 Transition:

s

1

δ

(

s

0

, r

)

(56)

Finite-state automata and 2RPQs

a b c

r

d

p

q

….

….

Word:

r

p

q

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

Automaton for Q

s

1

δ

(

s

0

, r

)

, s

2

δ

(

s

1

, p

)

, s

2

δ

(

s

1

, q

)

,

s

3

δ

(

s

2

, p

)

, s

4

δ

(

s

3

, p

)

, s

5

δ

(

s

4

, q

)

, s

5

δ

(

s

5

, q

)

State:

s

1 Transition:

s

δ

(

s

, p

)

(57)

Finite-state automata and 2RPQs

a b c

r

d

p

q

….

….

Word:

r p

q

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

Automaton for Q

s

1

δ

(

s

0

, r

)

, s

2

δ

(

s

1

, p

)

, s

2

δ

(

s

1

, q

)

,

s

3

δ

(

s

2

, p

)

, s

4

δ

(

s

3

, p

)

, s

5

δ

(

s

4

, q

)

, s

5

δ

(

s

5

, q

)

State:

s

2 Transition: none
(58)

Finite-state automata and 2RPQs

a b c

r

d

p

q

….

….

Word:

r p

q

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

State:

s

2 Transition: none

(

a, d

)

satisfies query

Q

, but the path from

a

to

d

is not accepted by the 1NFA

corresponding to

Q

: the computation for 2RPQs is not captured by finite-state automata.
(59)

2way automata (2NFA)

A 2way automaton

A

= (Γ

, S, S

0

, ρ, F

)

consists of an alphabet

Γ

, a finite set of states

S

, a set of initial states

S

0

S

, a transition function

ρ

:

S

×

Σ

2

S × {−1,0,1} and a set of accepting states

F

S

.

Given a 2way automaton

A

with

n

states, one can construct a one-way automaton

B

1 with

O

(2

nlogn

)

states such that

L

(

B

1

) =

L

(

A

)

, and a one-way automaton

B

2 with

O

(2

n

)

states such that

L

(

B

2

) = Γ

L

(

A

)

.
(60)

2way automata and 2RPQs

Given a 2RPQ

E

= (Σ

, S, I, δ, F

)

over the alphabet

Σ

, the corresponding 2way automaton

A

E is:

A

= Σ

∪ {

$

}

, S

A

=

S

∪ {

s

f

} ∪ {

s

|

s

S

}

, I, δ

A

,

{

s

f

}

)

where

δ

A is defined as follows:

(

s

2

,

1)

δ

A

(

s

1

, r

)

, for each transition

s

2

δ

(

s

1

, r

)

of

E

enter backward mode:

(

s

,

1)

δ

A

(

s, `

)

, for each

s

S

and

`

Σ

A

exit backward mode:

(

s

2

,

0)

δ

A

(

s

1

, r

)

, for each

s

2

δ

(

s

1

, r

)

of

E

(

s

f

,

1)

δ

A

(

s,

$)

, for each

s

F

.
(61)

2way automata and 2RPQs

a b c

r

d

p

q

….

….

Q

=

r

·

(p

q

)

·

p

·

p

·

q

·

q

Automaton for Q

s

1

δ

(

s

0

, r

)

, s

2

δ

(

s

1

, p

)

, s

2

δ

(

s

1

, q

)

,

s

3

δ

(

s

2

,

p

)

, s

4

δ

(

s

3

, p

)

, s

5

δ

(

s

4

, q

)

, s

5

δ

(

s

5

, q

)

2way automaton

(

s

1

,

1)

δ

A

(

s

0

, r

)

,

(

s

2

,

1)

δ

A

(

s

1

, p

)

,

(

s

2

,

1)

δ

A

(

s

1

, q

)

,

(

s

2

,

1

)

δ

A

(

s

2

,

q

)

,

(

s

3

,

0

)

δ

A

(

s

2

,

p

)

,

(

s

4

,

1)

δ

A

(

s

3

, p

)

,

(

s

5

,

1)

δ

A

(

s

4

, q

)

,

(

s

f

,

1)

δ

A

(

s

5

,

$)

(62)

2NFA and 2RPQs

a b c

r

d

p

q

….

….

Word:

r

p q

$

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

Automaton for

Q

(

s

1

,

1)

δ

A

(

s

0

, r

)

,

(

s

2

,

1)

δ

A

(

s

1

, p

)

,

(

s

2

,

1)

δ

A

(

s

2

, q

)

,

(

s

3

,

0)

δ

A

(

s

2

, p

)

,

(

s

4

,

1)

δ

A

(

s

3

, p

)

,

(

s

5

,

1)

δ

A

(

s

4

, q

)

,

(

s

f

,

1)

δ

A

(

s

5

,

$)

State:

s

0

(

s

,

1)

δ

(

s

, r

)

(63)

2NFA and 2RPQs

a b c

r

d

p

q

….

….

Word:

r

p

q

$

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

Automaton for

Q

(

s

1

,

1)

δ

A

(

s

0

, r

)

,

(

s

2

,

1)

δ

A

(

s

1

, p

)

,

(

s

2

,

1)

δ

A

(

s

2

, q

)

,

(

s

3

,

0)

δ

A

(

s

2

, p

)

,

(

s

4

,

1)

δ

A

(

s

3

, p

)

,

(

s

5

,

1)

δ

A

(

s

4

, q

)

,

(

s

f

,

1)

δ

A

(

s

5

,

$)

State:

s

1 Transition:

(

s

2

,

1)

δ

A

(

s

1

, p

)

(64)

2NFA and 2RPQs

a b c

r

d

p

q

….

….

Word:

r p

q

$

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

Automaton for

Q

(

s

1

,

1)

δ

A

(

s

0

, r

)

,

(

s

2

,

1)

δ

A

(

s

1

, p

)

,

(

s

2

,

1)

δ

A

(

s

2

, q

)

,

(

s

3

,

0)

δ

A

(

s

2

, p

)

,

(

s

4

,

1)

δ

A

(

s

3

, p

)

,

(

s

5

,

1)

δ

A

(

s

4

, q

)

,

(

s

f

,

1)

δ

A

(

s

5

,

$)

State:

s

2

(

s

,

1)

δ

(

s

, q

)

(65)

2NFA and 2RPQs

a b c

r

d

p

q

….

….

Word:

r

p

q

$

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

Automaton for

Q

(

s

1

,

1)

δ

A

(

s

0

, r

)

,

(

s

2

,

1)

δ

A

(

s

1

, p

)

,

(

s

2

,

1)

δ

A

(

s

2

, q

)

,

(

s

3

,

0)

δ

A

(

s

2

, p

)

,

(

s

4

,

1)

δ

A

(

s

3

, p

)

,

(

s

5

,

1)

δ

A

(

s

4

, q

)

,

(

s

f

,

1)

δ

A

(

s

5

,

$)

State:

s

2 Transition:

(

s

3

,

0)

δ

A

(

s

2

, p

)

(66)

2NFA and 2RPQs

a b c

r

d

p

q

….

….

Word:

r

p

q

$

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

Automaton for

Q

(

s

1

,

1)

δ

A

(

s

0

, r

)

,

(

s

2

,

1)

δ

A

(

s

1

, p

)

,

(

s

2

,

1)

δ

A

(

s

2

, q

)

,

(

s

3

,

0)

δ

A

(

s

2

, p

)

,

(

s

4

,

1)

δ

A

(

s

3

, p

)

,

(

s

5

,

1)

δ

A

(

s

4

, q

)

,

(

s

f

,

1)

δ

A

(

s

5

,

$)

State:

s

3

(

s

,

1)

δ

(

s

, p

)

(67)

2NFA and 2RPQs

a b c

r

d

p

q

….

….

Word:

r p

q

$

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

Automaton for

Q

(

s

1

,

1)

δ

A

(

s

0

, r

)

,

(

s

2

,

1)

δ

A

(

s

1

, p

)

,

(

s

2

,

1)

δ

A

(

s

2

, q

)

,

(

s

3

,

0)

δ

A

(

s

2

, p

)

,

(

s

4

,

1)

δ

A

(

s

3

, p

)

,

(

s

5

,

1)

δ

A

(

s

4

, q

)

,

(

s

f

,

1)

δ

A

(

s

5

,

$)

State:

s

4 Transition:

(

s

5

,

1)

δ

A

(

s

4

, q

)

(68)

2NFA and 2RPQs

a b c

r

d

p

q

….

….

Word:

r p q

$

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

Automaton for

Q

(

s

1

,

1)

δ

A

(

s

0

, r

)

,

(

s

2

,

1)

δ

A

(

s

1

, p

)

,

(

s

2

,

1)

δ

A

(

s

2

, q

)

,

(

s

3

,

0)

δ

A

(

s

2

, p

)

,

(

s

4

,

1)

δ

A

(

s

3

, p

)

,

(

s

5

,

1)

δ

A

(

s

4

, q

)

,

(

s

f

,

1)

δ

A

(

s

5

,

$)

State:

s

5

(

s

,

1)

δ

(

s

,

$)

(69)

2NFA and 2RPQs

a b c

r

d

p

q

….

….

Word:

r p q

$

Query:

Q

=

r

·

(

p

q

)

·

p

·

p

·

q

·

q

State:

s

f

(

a, d

)

satisfies query

Q

, and the path from

a

to

d

is accepted by the 2NFA
(70)

2NFA and view extensions

Global schema G:

(r

p

q

r

p

q

)

r

q

(p

r)

°

p

(p

q

)

r

°

r

r

°

(q

p)

(d1,d2) (d4,d5) (d4,d2) (d3,d3) (d2,d3)

Sources:

Database for

G

: d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

(71)

2NFA and view extensions

d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Database

B

as a word:

$d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

To verify that

(

d

1

, d

3

)

satisfies

Q

in the above database

B

, we build

A

(Q,d1,d3), by exploiting not only the ability of 2way automata to move on the word both forward and backward, but also the ability to jump from one position in the word representing a node to any other position (either preceding or succeeding) representing the same node.
(72)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

0
(73)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

0
(74)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p

p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

0
(75)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p

p

d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

0
(76)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p

d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

0
(77)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

0
(78)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

0
(79)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

1 Transition:

(

s

1

,

1)

δ

A

(

s

1

, d

1

)

(80)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r

d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

1
(81)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r

d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

2
(82)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

(

s

2

, d

2

)

(83)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

(

s

2

, d

2

)

(84)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

(

s

2

, d

2

)

(85)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

(

s

2

, d

2

)

(86)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

2
(87)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

2
(88)

A run of

A

(Q,d1,d3) d1 d2

r

d3

r

r

q

r

d4

p

p

d5

p

Word:

$

d

4

p p d

5

$

d

1

r d

2

$

d

4

p

d

2

$

d

3

r r d

3

$

d

2

r q d

3

$

Q

=

r

·

(

p

q

)

·

(

p

·

r

)

·

q

·

q

State:

s

3
(89)

A run of

A

(Q,d1,d3)

References

Related documents

author concludes that “the actual waste level of the fi rm does not directly depend upon the size of the fi ne or the probability of discovery of the violation.” That is, increases

Feeding with breast milk was the most significant factor associated with the microbiome structure. It associated with higher levels of Bifidobacterium sp. Bacteroides associated

horizontal opening angle of 90° at 4 kHz means that a single loudspeaker can provide natural sound.. reproduction over an extensive

Any person who, knowingly and with intent to defraud any insurance company or other person, files an Application for insurance containing any materially false information, or

In- terestingly, both the removal of haem from haemopexin and HasA addi- tionally use steric hindrance to displace the haem group from its high- af finity binding site, while no

Overall CPFM and institutional characteristics are positively associated with child labor and in three models (total child labor using CPFM sub-indices) statistically

LDIF Application Layer Data Access, Integration and Storage Layer Web of Data Publication Layer Integrated Web Data Web Data Access Module Vocabulary Mapping Module

Insurers and reinsurers along with trade allies and other members of their community (actuaries, 31 brokers, agents, modellers, risk managers, asset managers and regulators)