
In this section we establish some results concerning how much the number of FDs in two non-redundant covers can differ, and how many non-redundant covers a set of FDs can have.

Theorem 3.55. Let Σ, G be equivalent non-redundant sets of FDs over R. Then |G| ≤ |Σ| · |R|.

Proof. Since Σ and G are equivalent, every FD $X \to A$ in Σ is implied by G. When computing the closure of $X$ using G, we only use FDs which contribute at least one new attribute. Thus G includes a subset $G_X \subseteq G$ of cardinality at most $|R \setminus X|$ which implies $X \to A$. Since the union $\bigcup_{X \to A \in \Sigma} G_X \subseteq G$ is already a cover of Σ and G is non-redundant, we get

$$\bigcup_{X \to A \in \Sigma} G_X = G.$$

Since each $G_X$ contains at most $|R \setminus X| \le |R|$ FDs, the cardinality of this union, and hence of G, is bounded by $|\Sigma| \cdot |R|$.
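The counting argument in the proof can be replayed on a small instance. The following Python sketch is our own illustration, not part of the original: the helper `witness_subset` (a hypothetical name) computes the closure of X under G, records only FDs that add at least one new attribute, and stops as soon as A is reached. The sets `sigma` and `G` are a made-up pair of equivalent non-redundant sets over R = {A, B, C}, chosen only to exercise the argument.

```python
from typing import FrozenSet, List, Optional, Tuple

# An FD is a pair (LHS, RHS) of attribute sets.
FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings of one-letter attribute names, e.g. fd("AB", "C")."""
    return frozenset(lhs), frozenset(rhs)

def witness_subset(x: FrozenSet[str], a: str, g: List[FD]) -> Optional[List[FD]]:
    """Hypothetical helper: compute the closure of x under g until a appears.

    Only FDs that add at least one new attribute are recorded, so the
    returned list G_X has at most |R - X| elements. Returns None if g
    does not imply x -> a.
    """
    closure = set(x)
    used: List[FD] = []
    changed = True
    while a not in closure and changed:
        changed = False
        for lhs, rhs in g:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs              # this FD contributes something new
                used.append((lhs, rhs))
                changed = True
                if a in closure:
                    break
    return used if a in closure else None

# Illustrative pair of equivalent non-redundant sets (not from the source).
R = frozenset("ABC")
sigma = [fd("A", "B"), fd("B", "A"), fd("A", "C")]
G = [fd("A", "B"), fd("B", "A"), fd("B", "C")]

union = set()
for lhs, rhs in sigma:
    (a,) = rhs                              # Sigma has singleton RHSs here
    g_x = witness_subset(lhs, a, G)
    assert g_x is not None and len(g_x) <= len(R - lhs)
    union.update(g_x)

print(union == set(G), len(G) <= len(sigma) * len(R))  # True True
```

As the proof predicts, the witnesses for the three FDs of `sigma` together already make up all of `G`.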

In [19] Gottlob shows the bound |G| ≤ |Σ| ·(|R| −1) for FDs with non-empty LHSs, and gives the following example to show that the bound is tight.

Example 3.9. Let $R = \{A, B_1, \dots, B_n\}$ and $\Sigma = \{A \to B_1 \dots B_n\}$. Then

$$G = \{A \to B_1, \dots, A \to B_n\}$$

is a non-redundant cover of Σ, and $|G| = n = |\Sigma| \cdot (|R| - 1)$.
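Example 3.9 can be checked mechanically for small n. The sketch below is our own, in Python; the helper names `closure`, `implies`, `equivalent`, `non_redundant` and `example_3_9` are hypothetical, and attribute closure is computed by naive fixpoint iteration. It confirms that G is an equivalent non-redundant set and that it attains Gottlob's bound exactly.

```python
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS, RHS)

def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure of attrs under fds, by naive fixpoint iteration."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def implies(fds: List[FD], f: FD) -> bool:
    lhs, rhs = f
    return rhs <= closure(lhs, fds)

def equivalent(f1: List[FD], f2: List[FD]) -> bool:
    return all(implies(f1, f) for f in f2) and all(implies(f2, f) for f in f1)

def non_redundant(fds: List[FD]) -> bool:
    """No FD is implied by the remaining ones."""
    return all(not implies(fds[:i] + fds[i + 1:], f) for i, f in enumerate(fds))

def example_3_9(n: int):
    """Hypothetical builder for R, Sigma and G of Example 3.9."""
    bs = [f"B{i}" for i in range(1, n + 1)]
    R = frozenset(["A", *bs])
    sigma = [(frozenset({"A"}), frozenset(bs))]             # A -> B1...Bn
    G = [(frozenset({"A"}), frozenset({b})) for b in bs]    # A -> B1, ..., A -> Bn
    return R, sigma, G

for n in range(1, 6):
    R, sigma, G = example_3_9(n)
    assert equivalent(sigma, G) and non_redundant(G)
    assert len(G) == n == len(sigma) * (len(R) - 1)
```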

While the example above was based on splitting non-singular FDs into singular ones, the next example shows that the bound of Theorem 3.55 cannot be improved significantly even if we restrict ourselves to canonical covers. Note, though, that |Σ| = |G| if we restrict ourselves to non-redundant covers with FDs of the form X → X∗, since these are in fact minimal covers [40].

Example 3.10. Consider the relation schema $R = \{A_1, \dots, A_n, B_1, \dots, B_n, C\}$ with $2n + 1$ attributes. Associate with R the canonical set of FDs

$$\Sigma = \{A_1 \to C, \dots, A_n \to C,\; C \to B_1, \dots, C \to B_n,\; B_1 \dots B_n \to C\}$$

of size $2n + 1$. The set

$$G = \left\{\begin{array}{l} A_1 \to B_1, \dots, A_n \to B_1, \\ \quad\vdots \\ A_1 \to B_n, \dots, A_n \to B_n, \\ C \to B_1, \dots, C \to B_n, \\ B_1 \dots B_n \to C \end{array}\right\}$$

is a canonical cover of Σ and contains $n^2 + n + 1$ FDs. Since $|\Sigma| \cdot |R| = (2n+1)^2$ and $\frac{1}{4}(2n+1)^2 = n^2 + n + \frac{1}{4}$, we have $|G| > \frac{1}{4} \cdot |\Sigma| \cdot |R|$.
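The claims of Example 3.10 can be verified for small n with a short Python sketch of our own. We assume here that a canonical set means singleton RHSs, LHS-reduced FDs and non-redundancy; LHS-reducedness is tested by dropping one attribute at a time, which suffices because closure is monotone. All helper names (`lhs_reduced`, `canonical`, `example_3_10`, ...) are hypothetical.

```python
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS, RHS)

def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure of attrs under fds, by naive fixpoint iteration."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def implies(fds: List[FD], f: FD) -> bool:
    return f[1] <= closure(f[0], fds)

def equivalent(f1: List[FD], f2: List[FD]) -> bool:
    return all(implies(f1, f) for f in f2) and all(implies(f2, f) for f in f1)

def non_redundant(fds: List[FD]) -> bool:
    return all(not implies(fds[:i] + fds[i + 1:], f) for i, f in enumerate(fds))

def lhs_reduced(fds: List[FD]) -> bool:
    """No single LHS attribute can be dropped while still implying the RHS."""
    return all(not rhs <= closure(lhs - {a}, fds) for lhs, rhs in fds for a in lhs)

def canonical(fds: List[FD]) -> bool:
    """Assumed definition: singleton RHSs, LHS-reduced, non-redundant."""
    return all(len(rhs) == 1 for _, rhs in fds) and lhs_reduced(fds) and non_redundant(fds)

def example_3_10(n: int):
    """Hypothetical builder for R, Sigma and G of Example 3.10."""
    A = [f"A{i}" for i in range(1, n + 1)]
    B = [f"B{i}" for i in range(1, n + 1)]
    R = frozenset(A + B + ["C"])
    def f(lhs, rhs):
        return frozenset(lhs), frozenset(rhs)
    sigma = [f([a], ["C"]) for a in A] + [f(["C"], [b]) for b in B] + [f(B, ["C"])]
    G = [f([a], [b]) for b in B for a in A] + [f(["C"], [b]) for b in B] + [f(B, ["C"])]
    return R, sigma, G

for n in range(1, 6):
    R, sigma, G = example_3_10(n)
    assert canonical(sigma) and canonical(G) and equivalent(sigma, G)
    assert len(sigma) == 2 * n + 1 and len(G) == n * n + n + 1
    assert 4 * len(G) > len(sigma) * len(R)   # |G| > 1/4 * |Sigma| * |R|
```

For n = 3, for instance, |G| = 13 while |Σ| · |R| = 49.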

Using the bound established in Theorem 3.55, we will argue why it can make sense to try and compute all canonical covers.

We first note that the number of arbitrary covers for a set Σ of FDs over a schema R can be (and usually is) hyper-exponential in the number of attributes. This is the case since the number of FDs in Σ can be exponential in the number of attributes, and the number of covers can be exponential in the number of FDs in Σ. Even by restricting ourselves to the “most powerful” FDs of the form X → X∗ (with minimal LHS X) and requiring covers to be non-redundant (which, together with the restriction on the form of the FDs, makes them minimal), we cannot avoid this. For brevity, we shall call non-redundant sets of LHS-reduced FDs of the form X → X∗ full.
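The three conditions in the definition of a full set can be tested directly. The following Python sketch is our own illustration; the function `is_full` is a hypothetical name. One assumption deserves emphasis: we treat X → Y as having the form X → X∗ whenever X ∪ Y = X∗, since X → Y and X → XY are equivalent; this is the reading under which sets containing A → B and B → A, as in Example 3.11, count as full.

```python
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (LHS, RHS)

def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure of attrs under fds, by naive fixpoint iteration."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_full(fds: List[FD]) -> bool:
    """Hypothetical check: non-redundant, LHS-reduced FDs of the form X -> X*.

    Assumption: X -> Y has the form X -> X* iff X | Y equals X*.
    """
    for i, (lhs, rhs) in enumerate(fds):
        star = closure(lhs, fds)
        if lhs | rhs != star:
            return False  # RHS is not the full closure of the LHS
        if any(closure(lhs - {a}, fds) == star for a in lhs):
            return False  # a smaller LHS has the same closure
        if rhs <= closure(lhs, fds[:i] + fds[i + 1:]):
            return False  # implied by the remaining FDs, hence redundant
    return True

def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs), frozenset(rhs)

print(is_full([fd("A", "BC"), fd("B", "AC")]))  # True
print(is_full([fd("A", "B"), fd("B", "C")]))    # False: A* = ABC, not AB
print(is_full([fd("AB", "C"), fd("A", "BC")]))  # False: LHS AB is not reduced
```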

Example 3.11. Consider $R = \{A_1, \dots, A_{2n}\}$ and let Σ consist of all FDs $X \to R$ with $X \subseteq R$ and $|X| = n$. Clearly Σ is non-redundant and contains $\binom{2n}{n}$ FDs. However, Σ has no full covers (other than itself). To create a large number of full covers, we add two extra attributes A and B to R, which gives us $R' = R \cup \{A, B\}$, and change Σ to

$$\Sigma' = \{AX \to R' \mid X \to R \in \Sigma\} \cup \{A \to B,\ B \to A\}.$$

It is easy to check that Σ′ is full. However, each FD $AX \to R'$ can be replaced by $BX \to R'$, and these replacements can be done independently from one another. Thus Σ′ has at least $2^{\binom{2n}{n}}$ full covers.
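For n = 1 and n = 2 the construction of Example 3.11 is small enough to enumerate exhaustively. The Python sketch below is our own; `example_3_11` and `replaced_covers` are hypothetical helpers, and `is_full` uses the same assumed reading of "X → X∗" as X ∪ Y = X∗. It builds Σ′, performs every combination of independent replacements AX → R′ ⇒ BX → R′, and checks that each result is a full set equivalent to Σ′, giving 4 and 64 distinct full covers respectively.

```python
from itertools import combinations, product
from math import comb

def closure(attrs, fds):
    """Attribute closure of attrs under fds, by naive fixpoint iteration."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def equivalent(f1, f2):
    """f1 and f2 imply each other, checked FD by FD via attribute closure."""
    return (all(r <= closure(l, f1) for l, r in f2)
            and all(r <= closure(l, f2) for l, r in f1))

def is_full(fds):
    """Non-redundant, LHS-reduced, and X | Y equals X* for every X -> Y (assumed reading)."""
    for i, (lhs, rhs) in enumerate(fds):
        star = closure(lhs, fds)
        if lhs | rhs != star or any(closure(lhs - {a}, fds) == star for a in lhs):
            return False
        if rhs <= closure(lhs, fds[:i] + fds[i + 1:]):
            return False
    return True

def example_3_11(n):
    """Sigma' of Example 3.11, plus the LHSs X of the FDs X -> R in Sigma."""
    R = [f"A{i}" for i in range(1, 2 * n + 1)]
    r_prime = frozenset(R) | {"A", "B"}
    xs = [frozenset(x) for x in combinations(R, n)]
    ab = [(frozenset({"A"}), frozenset({"B"})), (frozenset({"B"}), frozenset({"A"}))]
    sigma_prime = [(x | {"A"}, r_prime) for x in xs] + ab
    return xs, r_prime, ab, sigma_prime

def replaced_covers(n):
    """All sets obtained by independently replacing AX -> R' with BX -> R'."""
    xs, r_prime, ab, _ = example_3_11(n)
    return [[(x | {c}, r_prime) for x, c in zip(xs, choice)] + ab
            for choice in product("AB", repeat=len(xs))]

for n in (1, 2):
    *_, sigma_prime = example_3_11(n)
    covers = replaced_covers(n)
    assert is_full(sigma_prime)
    assert all(equivalent(c, sigma_prime) and is_full(c) for c in covers)
    assert len({frozenset(c) for c in covers}) == 2 ** comb(2 * n, n)  # 4, then 64
```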

We are thus trying to compute a potentially hyper-exponential number of covers, which at first glance seems rather infeasible even for small cases. However, when constructing the example above, we used a set of FDs Σ whose size is exponential in the number of attributes. This is rather unusual, and for small sets Σ we can establish better bounds for the number of non-redundant covers.

Theorem 3.56. Let Σ be a set of FDs over R with cardinalities |Σ| = n and |R| = k. The number of non-redundant covers of Σ is at most $\binom{2^{2k}}{nk} \le 2^{2nk^2}$.

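Proof. The number of FDs on R is $2^{2k}$ (one for each pair of subsets of R), and any non-redundant cover of Σ contains at most nk FDs by Theorem 3.55. The second inequality follows from $\binom{2^{2k}}{nk} \le (2^{2k})^{nk} = 2^{2nk^2}$.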
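The arithmetic behind Theorem 3.56 is easy to spot-check with Python's exact integers. The sketch is our own; `cover_bound` is a hypothetical name for the binomial bound. It confirms $\binom{2^{2k}}{nk} \le (2^{2k})^{nk} = 2^{2nk^2}$ on a small grid, and that the inequality is strict once nk ≥ 2 (because $\binom{N}{m} \le N^m/m!$), whereas for n = k = 1 both sides equal 4.

```python
from math import comb

def cover_bound(n: int, k: int) -> int:
    """Binomial bound of Theorem 3.56: nk FDs out of the 2^(2k) FDs over k attributes."""
    return comb(2 ** (2 * k), n * k)

for n in range(1, 5):
    for k in range(1, 5):
        b = cover_bound(n, k)
        # (2^(2k))^(nk) and 2^(2nk^2) are the same number.
        assert b <= (2 ** (2 * k)) ** (n * k) == 2 ** (2 * n * k * k)
        if n * k >= 2:
            assert b < 2 ** (2 * n * k * k)   # strict once nk >= 2

print(cover_bound(1, 1), 2 ** 2)  # 4 4
```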

While the bound given can be improved, the relevant fact is that the number of covers is “only” exponential in the size of the input, rather than hyper-exponential. When considering arbitrary non-redundant covers, we can obtain different non-redundant covers by changing FDs slightly, e.g., by adding LHS attributes to the RHS. As such changes can be done independently from one another, the number of non-redundant covers is practically always exponential. Many of those variations are avoided by restricting ourselves to canonical covers. Using partial covers to represent them efficiently, we can hope to reduce the size of our representation to a reasonably small number. Experimental results can be found in the appendix.
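The remark about adding LHS attributes to the RHS can be made concrete. In this Python sketch of our own, the hypothetical helper `rhs_variants` turns each FD X → Y into X → Y ∪ Z for every Z ⊆ X \ Y, choosing Z independently per FD, so the number of variants is the product of the $2^{|X \setminus Y|}$ per-FD choices. The example set F = {AB → C, C → A} is made up for illustration; every variant is again an equivalent non-redundant cover, yet none of the variants other than F itself has singleton RHSs, which is why restricting to canonical covers removes them.

```python
from itertools import chain, combinations, product

def closure(attrs, fds):
    """Attribute closure of attrs under fds, by naive fixpoint iteration."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def implies(fds, f):
    return f[1] <= closure(f[0], fds)

def equivalent(f1, f2):
    return all(implies(f1, f) for f in f2) and all(implies(f2, f) for f in f1)

def non_redundant(fds):
    return all(not implies(fds[:i] + fds[i + 1:], f) for i, f in enumerate(fds))

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

def rhs_variants(fds):
    """Every way of adding LHS attributes to each RHS, chosen independently per FD."""
    options = [[(lhs, rhs | z) for z in subsets(lhs - rhs)] for lhs, rhs in fds]
    return [list(v) for v in product(*options)]

F = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("A"))]
variants = rhs_variants(F)
assert len(variants) == 2 ** 2 * 2 ** 1   # 2^|X \ Y| choices per FD
assert all(equivalent(v, F) and non_redundant(v) for v in variants)
```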