Reasoning about Fuzzy Functional Dependencies
∗Pablo Cordero, Angel Mora, Manuel Enciso, Gabriel Aguilera, I.P. de Guzmán. E.T.S.I. Informática, Universidad de Málaga.
{pcordero,enciso}@uma.es, {amora,gabri,guzman}@ctima.uma.es
Abstract
In the literature, there exist several logics to specify and manipulate functional dependencies (FDs) and fuzzy FDs (FFDs). Although their inference systems are suitable for illustrating FD and FFD semantics, they are not well suited as a formal basis for developing automated deduction methods. In this work, we use the correct and complete axiomatic system Se, with a new FFD inference rule, named the Substitution Rule. This rule allows us to introduce the Deduction Theorem for FFDs and is the key to solving the FFD implication problem.
Keywords: fuzzy functional dependencies, logic, implication problem.
1 Introduction
Constraints are often used to guide the design of noteworthy relational schemas for the sake of database consistency, and therefore to avoid problems such as redundancy, anomalies, etc. This statement is valid for any extension of the classical relational model.
Over three consecutive decades, Raju and Majumdar [19], Cubero and Vila [6], Tyagi et al. [23], etc. have studied fuzzy models and which constraints are most appropriate for extending the well-studied relational database theory to fuzzy databases.
Several approaches to the definition of fuzzy functional dependency (FFD) have been proposed in the literature [6, 18, 19, 22, 23]. In the same way that the concept of functional dependency (FD) corresponds to the notion of partial function, it is desirable that the concept of FFD correspond to the notion of fuzzy partial function. The definitions proposed in [19, 22, 23] fit this idea.
∗Partially supported by Spanish DGI project TIN2007-65819 and Junta de Andalucia project TIC115.
There exists a wide range of dependencies. Some of them were investigated in the past (FDs [4, 9], multivalued dependencies [10], etc.) and others are still being studied today (XML FDs [13], FFDs [22], etc.).
Moreover, each dependency definition is usually followed by its corresponding logic. In [19, 23] the authors propose Armstrong's axioms as a useful tool for reasoning with FFDs, but these inference rules have not been used successfully in automated deduction. The reason is that this inference system was created to explain dependency semantics rather than to design an automated deduction system. In fact, in [23] the authors propose the classical closure algorithm to solve the implication problem and do not directly use Armstrong's axioms.
In [5] a novel logic (SL) equivalent to the classical Armstrong's axioms was presented. The core of SL is the Substitution Rule¹. The definition of SL introduced, for the first time, solutions to database problems that are obtained using logic-based automated deduction methods [1, 15, 16].
In this work, we illustrate how the Substitution Rule can be used for reasoning with FFDs. First, we extend the language of FFD logic to allow formulae with an empty left-hand side, and we use these new well-formed formulae as goals to be satisfied in a given FFD theory. The Substitution Rule is then used to build a novel algorithm directly based on the inference system. We emphasize that we solve the implication problem for FFDs by reducing the set of FFDs with the Substitution Rule.
This work opens the door to the management of FFD constraints in an efficient and intelligent way. It is organized as follows: Section 2 introduces some definitions of FFDs. In Section 3 we show how previous FD logics reason about FFDs and the limitations of these logics for automatically manipulating a set of FFDs; we also introduce the problem that we solve in this paper, the FFD implication problem. In Section 4 we propose a new automated deduction method to solve the FFD implication problem. Finally, Section 5 presents several conclusions and future work.
¹ The primitive inference system does not contain the transitivity rule, which is a derived rule in SL (see [5]).
2 Fuzzy Functional Dependencies
Since the 1980s, some authors have extended the Relational Model to include imprecise/fuzzy information [6, 18, 19, 23]. A fuzzy extension of Codd's Relational Model also requires an extension of the integrity constraints, which may involve fuzzy concepts. In classical relational databases, the study of integrity constraints has been developed successfully in the past. The FD is considered the first step of these studies, since the definition of FD plays an outstanding role in the Relational Model. The FD is very close to the notion of key and is used in other areas such as knowledge discovery, query optimization, database design, etc.
Classical FDs specify an association between the values of the attributes of a relation, which makes it possible to state, in a formal way, constraints like the following: if two employees have the same qualification then they have the same salary. As J. Paredaens et al. say in [17]: “they are constraint in the description of a database in order to ensure that the instances we might obtain are meaningful”. Thus, if the functional dependency $X \mapsto Y$ holds in a relation $R$, where $X$ and $Y$ are subsets of attributes, then for any instance of $R$, if two tuples agree on $X$, they also agree on $Y$.
When a fuzzy extension of the notion of FD is introduced, some authors consider that the data become fuzzy while others consider that the dependency may be imprecise. In [19] the authors introduce a definition of FFD to model fuzzy information such as employees with similar experience must have similar salaries, which may be considered a precise dependency, while in [22] the authors propose an FFD that allows modeling expressions such as the intelligence level of a person more or less determines the degree of success.
Each definition of FFD treats a particular aspect of fuzziness. We emphasize the definition of [19] because the introduction of an operator (a fuzzy resemblance measure EQ for comparing domain values) made possible a generalization of FDs, leading to their definition of FFDs: an FFD $X \mapsto Y$ holds in a fuzzy relation $r$ if, for all tuples $t_1$ and $t_2$ of $r$, we have that
$\mu_{EQ}(t_1[X], t_2[X]) \le \mu_{EQ}(t_1[Y], t_2[Y])$
where $\mu$ is the membership function of the fuzzy relation².
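As an illustration, the condition above can be checked mechanically on a small relation. The following is a minimal sketch, not the construction of [19]: the linear resemblance measure `closeness`, the use of the minimum to combine per-attribute resemblances, and the sample data are all assumptions made for the example.

```python
from itertools import combinations


def closeness(scale):
    """Hypothetical resemblance measure: 1 for equal values, decreasing linearly to 0."""
    return lambda u, v: max(0.0, 1.0 - abs(u - v) / scale)


def resemblance(t1, t2, attrs, eq):
    """Resemblance of two tuples on a set of attributes.

    Taking the minimum over the attributes is an assumption of this sketch.
    """
    return min((eq[a](t1[a], t2[a]) for a in attrs), default=1.0)


def ffd_holds(relation, lhs, rhs, eq):
    """Check mu_EQ(t1[X], t2[X]) <= mu_EQ(t1[Y], t2[Y]) for every pair of tuples."""
    return all(
        resemblance(t1, t2, lhs, eq) <= resemblance(t1, t2, rhs, eq)
        for t1, t2 in combinations(relation, 2)
    )


# Illustrative data: similar experience should imply similar salary.
eq = {"experience": closeness(10), "salary": closeness(1000)}
employees = [
    {"experience": 5, "salary": 3000},
    {"experience": 6, "salary": 3050},
    {"experience": 9, "salary": 3300},
]
print(ffd_holds(employees, ["experience"], ["salary"], eq))
```

Adding an employee with the same experience as an existing one but a very different salary would make the check fail, since the resemblance on the left-hand side would then exceed the resemblance on the right-hand side.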
A great variety of FFD definitions can be found in the literature. In [23] the authors summarize some of them and study the degree of fuzziness and imprecision, both in the data and in the dependency, of the FFD definitions. This study highlights a definition [22] that has a high degree of fuzziness and is a natural extension of the classical FD concept:
$X \xrightarrow{\Theta}_{F} Y$ holds in a fuzzy relation if, for every pair of tuples $t_1$ and $t_2$, $\mathrm{Conf}((t_1, t_2)[Y]) \ge \min(\Theta, \mathrm{Conf}((t_1, t_2)[X]))$, with $\Theta \in [0, 1]$.
If $\Theta = 0$, there is no valid dependency, and if $\Theta = 1$, the imprecise FFD becomes a precise FFD; otherwise, imprecise FFDs exist. Here $\mathrm{Conf}((t_1, t_2)[X])$ is the degree of closeness between two tuples $t_1$ and $t_2$ projected over the attributes $X$ in a fuzzy relation instance, and it is called the conformance.
The definition of [22] generalizes the one presented in [19], which corresponds to the case $\Theta = 1$. In this work, we deal with the definition of [19]. An extension of our study to more general definitions will be addressed in future work.
3 Reasoning about FFDs
Several axiomatic systems equivalent to Armstrong's axiomatic system have been published [9, 17]. These logics may be considered formal tools to explain how to deduce a dependency from a given set of dependencies. However, in [11, 12], the authors state the limitations of managing dependencies using logic, and they call for efficient automatic computation methods. On the other hand, in [12, 14, 20], indirect methods for the manipulation of dependencies are used as an alternative.
Specifically, we are interested in solving the implication problem by directly using the logic. Unfortunately, none of the classical axiomatic systems is a suitable tool for developing automated deduction techniques, because all of them are based on the transitivity rule. Instead, the literature offers several algorithms that solve the implication problem in polynomial time (see [3, 7] for further details). Nevertheless, these efficient methods have a very important disadvantage: they cannot give an explanation of the answer. When we use an indirect method, we are not able to translate the final solution into an inference chain that explains the answer in terms of the inference system. This limits the use of these indirect methods in artificial intelligence environments.
² $\mu$ denotes the more or less equal relation.
In [5] we introduced a new logic, SL, equivalent to the classical logics (its axiomatic system is equivalent to Armstrong's system), but with an important difference: the transitivity rule has been replaced by a novel rule. This is the only axiomatic system in the literature that does not have the transitivity rule as a primitive rule. The novel rule, named the substitution rule, was introduced with the idea of removing redundancy in sets of dependencies. This characteristic makes the logic suitable for direct reasoning. Another interesting characteristic of this rule is that it can be considered an equivalence transformation [15].
In [15] we proved that the SL axiomatic system is equivalent to other well-known axiomatic systems [3, 9, 12, 17]; thus, all of Paredaens' derived rules are derived rules in SL.
To solve the implication problem we present an extension of the SL logic, which we name the SLe logic. It basically consists of extending the language of FFD logic to allow formulae with an empty left-hand side:
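Definition 3.1 The SLe logic is the pair $(L_e, S)$ where $L_e = \{X \mapsto Y \mid X, Y \in 2^{\Omega}\}$ and $\Omega$ is a set of attributes. The axiomatic system $S$ consists of:
Axiom scheme: if $Y \subseteq X \neq \emptyset$ then $\vdash_S X \mapsto Y$
Fragmentation rule: if $Y' \subseteq Y$ then $X \mapsto Y \vdash_S X \mapsto Y'$
Composition rule: $X \mapsto Y,\ U \mapsto V \vdash_S XU \mapsto YV$
Substitution rule: if $X \subseteq U$ and $X \cap Y = \emptyset$ then $X \mapsto Y,\ U \mapsto V \vdash_S (U - Y) \mapsto (V - Y)$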
From now on, following the standard notation of the database community, $\top$ will denote the empty set of attributes and $XY$ the union of the attribute sets $X$ and $Y$.
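To make the side conditions of the rules concrete, the inference rules can be written as small functions over pairs of attribute sets. This is only a sketch under our own conventions: the helper `fd`, the function names and the single-letter attribute encoding are illustrative choices, not part of the logic.

```python
def fd(lhs, rhs):
    """Build a dependency as a pair of frozensets of single-letter attributes."""
    return (frozenset(lhs), frozenset(rhs))


def axiom(x, y):
    """Axiom scheme: X -> Y whenever Y is a subset of a nonempty X."""
    x, y = frozenset(x), frozenset(y)
    if not x or not y <= x:
        raise ValueError("axiom scheme requires Y to be a subset of a nonempty X")
    return (x, y)


def fragmentation(dep, y_part):
    """Fragmentation rule: from X -> Y infer X -> Y' for any Y' contained in Y."""
    x, y = dep
    y_part = frozenset(y_part)
    if not y_part <= y:
        raise ValueError("fragmentation requires Y' to be a subset of Y")
    return (x, y_part)


def composition(dep1, dep2):
    """Composition rule: from X -> Y and U -> V infer XU -> YV."""
    (x, y), (u, v) = dep1, dep2
    return (x | u, y | v)


def substitution(dep1, dep2):
    """Substitution rule: from X -> Y and U -> V infer (U - Y) -> (V - Y)."""
    (x, y), (u, v) = dep1, dep2
    if not x <= u or x & y:
        raise ValueError("substitution requires X within U and X disjoint from Y")
    return (u - y, v - y)


# a -> b simplifies abc -> bd into ac -> d.
print(substitution(fd("a", "b"), fd("abc", "bd")))
```

The empty string encodes $\top$, so `substitution(fd("", "a"), fd("a", "b"))` yields the pair encoding $\top \mapsto b$.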
The following theorem ensures the correctness and completeness of the method presented in the next section.
Theorem 3.2 (Deduction Theorem for FFD)³ Given $\Gamma \subseteq L$, we have the following equivalences:
1. For all $X, U, V \in 2^{\Omega}$,
$\Gamma \cup \{\top \mapsto X\} \vdash_{S_e} U \mapsto V$ iff $\Gamma \vdash_S UX \mapsto V$
2. In particular, for all $X, Y \in 2^{\Omega}$,
$\Gamma \cup \{\top \mapsto X\} \vdash_{S_e} \top \mapsto Y$ iff $\Gamma \vdash_S X \mapsto Y$
³ See the proof of this theorem in [8].
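A small instance may help to see how a goal with an empty left-hand side interacts with the Substitution rule. The set $\Gamma = \{a \mapsto b\}$ and the attributes $a$, $b$ are chosen for illustration only. Instantiating the Substitution rule with $\top \mapsto a$ as its first premise and $a \mapsto b$ as its second, the side conditions $\top \subseteq a$ and $\top \cap a = \emptyset$ hold trivially, and we obtain:

```latex
\begin{align*}
\top \mapsto a,\; a \mapsto b \;&\vdash_{S_e}\; (a - a) \mapsto (b - a) \\
                                &=\; \top \mapsto b
\end{align*}
```

Hence $\Gamma \cup \{\top \mapsto a\} \vdash_{S_e} \top \mapsto b$, in agreement with the second item of the theorem, since $\Gamma \vdash_S a \mapsto b$ holds trivially.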
4 A new automated deduction method
We now turn to the benefits of the extension that we have just considered.
In this section, a novel technique for applying the system SLe in a systematic way is introduced. To this end, three simplification rewriting rules are defined using the symbol $\rightsquigarrow$, where $\Gamma \rightsquigarrow \Gamma'$ means that all the elements in $\Gamma$ must be replaced by all the elements in $\Gamma'$.
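Definition 4.1 Given $X, U, V \in 2^{\Omega}$:
SC Simplification: If $U \subseteq X$ then $\{\top \mapsto X,\ U \mapsto V\} \rightsquigarrow \{\top \mapsto XV\}$
SA Simplification: If $V \subseteq X$ then $\{\top \mapsto X,\ U \mapsto V\} \rightsquigarrow \{\top \mapsto X\}$
S Simplification: $\{\top \mapsto X,\ U \mapsto V\} \rightsquigarrow \{\top \mapsto X,\ U - X \mapsto V - X\}$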
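A single application of these rules to a goal and one dependency can be sketched as follows. The function name `rewrite` and its return convention are hypothetical. The sketch tests the SA condition before the SC condition, as the algorithm of Figure 2 does, so that a dependency satisfying both conditions is simply dropped, and it reports no rule when S would leave the dependency unchanged.

```python
def rewrite(x, dep):
    """Apply one simplification rule to the goal T -> x and the dependency u -> v.

    Returns (rule name or None, new goal set, residual dependency or None).
    """
    u, v = dep
    if v <= x:
        # SA: the right-hand side is already in the goal, drop the dependency.
        return "SA", x, None
    if u <= x:
        # SC: the left-hand side is covered, absorb the right-hand side.
        return "SC", x | v, None
    reduced = (u - x, v - x)
    if reduced == dep:
        # S would change nothing, so no rule is applicable.
        return None, x, dep
    # S: remove the goal's attributes from both sides.
    return "S", x, reduced


goal = frozenset("bd")
print(rewrite(goal, (frozenset("ad"), frozenset("c"))))
print(rewrite(goal, (frozenset("b"), frozenset("e"))))
```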
Lemma 4.2 Let $\Gamma$ and $\Gamma'$ be two sets of FFDs. If $\Gamma'$ is obtained from $\Gamma$ by applying the simplification rewriting rules introduced in Definition 4.1, then
$\Gamma \equiv_{S_e} \Gamma'$
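Proof.
SC Simplification:
$\{\top \mapsto X,\ U \mapsto V\} \overset{(1)}{\equiv}_{S_e} \{\top \mapsto X,\ U - X \mapsto V - X\} \overset{(2)}{\equiv}_{S_e} \{\top \mapsto XV\}$
The Substitution rule is applied in (1) and, since $U \subseteq X$, the Composition rule is applied in (2).
SA Simplification:
$\{\top \mapsto X,\ U \mapsto V\} \overset{(3)}{\equiv}_{S_e} \{\top \mapsto X,\ U - X \mapsto V - X\} \overset{(4)}{\equiv}_{S_e} \{\top \mapsto X\}$
The Substitution rule is applied in (3) and, since $V \subseteq X$, the Axiom is applied in (4).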
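$\top \mapsto X$ | Simp.Rule | FFDs
 | | $ad \mapsto c,\ b \mapsto e,\ be \mapsto cg,\ bc \mapsto g,\ c \mapsto a,\ cd \mapsto b,\ cf \mapsto bh,\ cg \mapsto af$
$\top \mapsto bd$ | S | $\underline{ad \mapsto c},\ b \mapsto e,\ be \mapsto cg,\ bc \mapsto g,\ c \mapsto a,\ cd \mapsto b,\ cf \mapsto bh,\ cg \mapsto af$
$\top \mapsto bd$ | SC | $a \mapsto c,\ \underline{b \mapsto e},\ be \mapsto cg,\ bc \mapsto g,\ c \mapsto a,\ cd \mapsto b,\ cf \mapsto bh,\ cg \mapsto af$
$\top \mapsto bde$ | SC | $a \mapsto c,\ \underline{be \mapsto cg},\ bc \mapsto g,\ c \mapsto a,\ cd \mapsto b,\ cf \mapsto bh,\ cg \mapsto af$
$\top \mapsto bcdeg$ | SA | $a \mapsto c,\ \underline{bc \mapsto g},\ c \mapsto a,\ cd \mapsto b,\ cf \mapsto bh,\ cg \mapsto af$
$\top \mapsto bcdeg$ | SC | $a \mapsto c,\ \underline{c \mapsto a},\ cd \mapsto b,\ cf \mapsto bh,\ cg \mapsto af$
$\top \mapsto abcdeg$ | SA | $a \mapsto c,\ \underline{cd \mapsto b},\ cf \mapsto bh,\ cg \mapsto af$
$\top \mapsto abcdeg$ | S | $a \mapsto c,\ \underline{cf \mapsto bh},\ cg \mapsto af$
$\top \mapsto abcdeg$ | SC | $a \mapsto c,\ f \mapsto h,\ \underline{cg \mapsto af}$
$\top \mapsto abcdefg$ | SA | $\underline{a \mapsto c},\ f \mapsto h$
$\top \mapsto abcdefg$ | SC | $\underline{f \mapsto h}$
$\top \mapsto abcdefgh$ | |
Figure 1: Table of Example 4.1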
S Simplification:
$\{\top \mapsto X,\ U \mapsto V\} \overset{(5)}{\equiv}_{S_e} \{\top \mapsto X,\ U - X \mapsto V - X\}$
The Substitution rule is applied in (5).
Theorem 4.3 Given $\Gamma \subseteq L$ and $X \mapsto Y \in L$. If $\Gamma'$ is obtained from $\Gamma \cup \{\top \mapsto X\}$ by applying the simplification rewriting rules introduced in Definition 4.1 for as long as they can be applied, then there exists a unique $\top \mapsto Z \in \Gamma'$ with $X \subseteq Z$ and
$\Gamma \vdash_S X \mapsto Y$ if and only if $Y \subseteq Z$
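Proof. Firstly, there exists $\top \mapsto Z \in \Gamma'$ with $X \subseteq Z$ because we apply the rewriting rules to $\Gamma \cup \{\top \mapsto X\}$ and, whenever these rules modify $X$, $X$ increases.
The uniqueness of $\top \mapsto Z$ is ensured by the SC rule.
That $Y \subseteq Z$ implies $\Gamma \vdash_S X \mapsto Y$ is obtained using Theorem 3.2, Lemma 4.2 and the Fragmentation rule.
Conversely, the following steps prove that $\Gamma \vdash_S X \mapsto Y$ implies $Y \subseteq Z$. Let $\Gamma''$ be $\Gamma' - \{\top \mapsto Z\}$:
1. Since $\top \mapsto Z$ is unique, $\Gamma'' \subseteq L$.
2. If $\Gamma \vdash_S X \mapsto Y$ then Theorem 3.2, together with Lemma 4.2, ensures that $\Gamma' \vdash_{S_e} \top \mapsto Y$ and, from $\Gamma' = \{\top \mapsto Z\} \cup \Gamma''$, (1) and Theorem 3.2, $\Gamma'' \vdash_S Z \mapsto Y$ is obtained.
3. If $U \mapsto V \in \Gamma''$ then $U \cap Z = \emptyset$ and $V \cap Z = \emptyset$, since otherwise the S simplification rule could be applied.
4. If $\Gamma'' \vdash_S Z \mapsto Y$ then $Y \subseteq Z$ because, due to (3), $Z \mapsto Y$ must be an axiom.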
The above theorem provides the method to determine whether $\Gamma \vdash_S X \mapsto Y$. The goal $\top \mapsto X$ is added to $\Gamma$, rendering an initial $\Gamma'$. Then, the simplification rewriting rules are applied to $\Gamma'$, obtaining $\{\top \mapsto Z\} \cup \Gamma''$. Finally, $\Gamma \vdash_S X \mapsto Y$ if and only if $Y \subseteq Z$.
Below, we solve the implication problem using our new methodology:
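Example 4.1 Let $\Gamma = \{ad \mapsto c,\ b \mapsto eh,\ be \mapsto c,\ bc \mapsto d,\ c \mapsto a,\ cd \mapsto b,\ ce \mapsto af,\ cf \mapsto bdh\}$ be a set of FFDs.
In order to know whether $\Gamma \vdash bd \mapsto ah$, we first initialize $\Gamma' = \Gamma \cup \{\top \mapsto bd\}$, rendering: $\Gamma' = \{\top \mapsto bd,\ ad \mapsto c,\ b \mapsto eh,\ be \mapsto c,\ bc \mapsto d,\ c \mapsto a,\ cd \mapsto b,\ ce \mapsto af,\ cf \mapsto bdh\}$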
The table in Figure 1 shows step by step how the simplification rewriting rules are applied. Note that the underline marks the FFD that is being reduced. The second column shows the applied rule.
Since $ah \subseteq abcdefgh$, by Theorem 4.3 the following deduction is obtained:
$\Gamma \models bd \mapsto ah$
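The trace of Figure 1 can be replayed with a short program. This is a sketch under stated assumptions: the function `explain_closure` and the list `FIGURE_1` are hypothetical names, the FFDs are those listed in the first row of Figure 1, dependencies are visited in their listed order, and the SA condition is tested before the SC condition.

```python
def explain_closure(x, deps):
    """Exhaustively apply SA, SC and S to the goal T -> x, recording rule names."""
    x = frozenset(x)
    deps = [(frozenset(u), frozenset(v)) for u, v in deps]
    trace = []
    progress = True
    while progress:
        progress = False
        kept = []
        for u, v in deps:
            if v <= x:                  # SA: nothing new, drop the dependency
                trace.append("SA")
                progress = True
            elif u <= x:                # SC: absorb the right-hand side
                x = x | v
                trace.append("SC")
                progress = True
            elif (u & x) or (v & x):    # S: strip attributes already in the goal
                kept.append((u - x, v - x))
                trace.append("S")
                progress = True
            else:                       # no rule changes this dependency
                kept.append((u, v))
        deps = kept
    return x, trace


# FFDs as listed in the first row of Figure 1.
FIGURE_1 = [("ad", "c"), ("b", "e"), ("be", "cg"), ("bc", "g"),
            ("c", "a"), ("cd", "b"), ("cf", "bh"), ("cg", "af")]

closure, trace = explain_closure("bd", FIGURE_1)
print("".join(sorted(closure)), trace)
print(frozenset("ah") <= closure)
```

Under these assumptions, the recorded sequence of rule names coincides with the Simp.Rule column of Figure 1, which is what allows the answer to be explained in terms of the inference system.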
A novel algorithm for solving the implication problem using the simplification rules defined above is shown in Figure 2. The algorithm simply adds $\top \mapsto X$ and exhaustively applies the simplification rules, based on the theoretical study (Theorem 3.2).
Implies?(Γ, X → Y) =
    Yes, if Y ⊆ Closure(X, nil, Γ, 1);
    No, otherwise.
Closure(X, Γ1, Γ2, b) =
    X, if Γ2 = nil or b = 0;
    Closure(Simplify(X, Γ1, Γ2, 0)), otherwise.
Simplify(X, Γ1, nil, b) = (X, nil, Γ1, b)
Simplify(X, Γ1, U → V :: Γ2, b) =
    Simplify(X, Γ1, Γ2, b), if V ⊆ X;
    Simplify(XV, Γ1, Γ2, 1), if U ⊆ X and V ⊈ X;
    Simplify(X, U−X → V−X :: Γ1, Γ2, b), otherwise.
Figure 2: Algorithm to solve implication problem
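The functional definition of Figure 2 can be transcribed almost literally into executable form. The sketch below is one possible reading, with assumptions stated: Python lists play the role of the lists Γ1 and Γ2, frozensets of single-letter attributes encode attribute sets, and the names `implies`, `closure`, `simplify` and `GAMMA` are ours. The recursion mirrors the figure, so its depth grows with the number of dependencies.

```python
def simplify(x, g1, g2, b):
    """One pass over g2, as Simplify in Figure 2; b records whether x has grown."""
    if not g2:
        return (x, [], g1, b)
    (u, v), rest = g2[0], g2[1:]
    if v <= x:                                   # V within X: drop the dependency
        return simplify(x, g1, rest, b)
    if u <= x:                                   # U within X: add V to X
        return simplify(x | v, g1, rest, 1)
    return simplify(x, [(u - x, v - x)] + g1, rest, b)


def closure(x, g1, g2, b):
    """Repeat simplification passes until no dependency is left or x stops growing."""
    if not g2 or b == 0:
        return x
    return closure(*simplify(x, g1, g2, 0))


def implies(gamma, x, y):
    """Decide whether gamma entails x -> y by comparing y with the computed closure."""
    deps = [(frozenset(u), frozenset(v)) for u, v in gamma]
    return frozenset(y) <= closure(frozenset(x), [], deps, 1)


# The set of FFDs of Example 4.1.
GAMMA = [("ad", "c"), ("b", "eh"), ("be", "c"), ("bc", "d"),
         ("c", "a"), ("cd", "b"), ("ce", "af"), ("cf", "bdh")]

print(implies(GAMMA, "bd", "ah"))
```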
Since every step adds at least one attribute, in the worst case the “Closure” loop is repeated at most $|A|$ times. The “Simplify” loop is repeated at most $|\Gamma|$ times. Consequently, the complexity of the algorithm is $O(|A|\,|\Gamma|)$.
We emphasize the following characteristics of the algorithm:
• The algorithm has the same complexity as the previous algorithms [7, 17] cited in the literature, namely linear with regard to the input.
• Contrary to these previous algorithms, our algorithm has a solid basis, since it uses the SL logic. Consequently, proofs and explanations are given automatically by the algorithm by directly applying the SL logic. Namely, the trace shown in the column Simp.Rule reflects the simplification rules based on the SL logic that must be applied in order to prove the implication, and the order in which they need to be applied.
5 Conclusions and future work
We have illustrated the difficulties of directly using previous logics for dependencies to tackle the implication problem. We have shown how the Substitution rule allows the development of automated deduction methods.
None of the classical logics for dependencies can solve the implication problem efficiently without using indirect methods. The algorithm we have proposed in this paper has the same complexity as typical indirect methods while directly using the SL logic for FFDs. Thus, we can reason and build explanations, which makes this new algorithm more appropriate for use in an artificial intelligence environment.
The most efficient algorithms that solve the implication problem were based on complex indirect techniques instead of using theoretical advantages. The use of these indirect techniques obscures the nature of the method. In contrast, the structure of our algorithm is notably simple.
In future work, we will use SL to solve some interesting problems in fuzzy databases, such as removing redundancy, normalization, canonical closure, etc.
We will also study the extension of the technique we have shown in this work to more general definitions of FFDs, for example considering fuzziness not only in the data but in the dependency itself [22].
References
[1] Gabriel Aguilera, Pablo Cordero, Manuel Enciso, Angel Mora, and Inmaculada P. de Guzmán. A non-explosive treatment of functional dependencies using rewriting logic. Lecture Notes in Artificial Intelligence, 3171:31–40, 2004.
[2] William W. Armstrong. Dependency structures of data base relationships. Proc. IFIP Congress. North Holland, Amsterdam, pages 580–583, 1974.
[3] Paolo Atzeni and Valeria De Antonellis. Relational Database Theory. The Benjamin/Cummings Publishing Company Inc., 1993.
[4] Edgar F. Codd. Recent investigations into relational data base systems. IFIP Congress, Stockholm, Sweden, 1974.
[5] Pablo Cordero, Manuel Enciso, Inmaculada P. de Guzmán, and Angel Mora. SLFD logic: Elimination of data redundancy in knowledge representation. Lecture Notes in Artificial Intelligence 2527, Springer-Verlag, pages 141–150, 2002.
[6] J.C. Cubero, M.A. Vila. A new definition of fuzzy functional dependency in fuzzy relational databases. Internat. J. Intell. Systems, 9 (5):441–448, 1994.
[7] Jim Diederich and Jack Milton. New methods and fast algorithms for database normalization. ACM Transactions on Database Systems, 13 (3):339–365, 1988.
[8] Manuel Enciso, Gabriel Aguilera, Pablo Cordero, Angel Mora, and Inmaculada P. de Guzmán. Logic-based Functional Dependencies Programming. Technical Reports. Departamento de Matemática Aplica. Universidad de Málaga. http://enciso.lcc.uma.es/TechnicalReports/ip.pdf, 2008.
[9] Ronald Fagin. Functional dependencies in a relational database and propositional logic. IBM Journal of Research and Development, 21 (6):534–544, 1977.
[10] Ronald Fagin. Multivalued dependencies and a new normal form for relational databases. ACM TODS 2, 1977.
[11] J. W. Guan and D. A. Bell. Rough computational methods for information systems. Artificial Intelligence, 105 (1-2):77–103, 1998.
[12] Toshihide Ibaraki, Alexander Kogan, and Kazuhisa Makino. Functional dependencies in Horn theories. Artificial Intelligence, 108 (1-2):1–30, 1999.
[13] M. L. Lee, T. W. Ling, and W. L. Low. Designing functional dependencies for XML. Lecture Notes in Computer Science, Springer-Verlag, 2287:124–141, 2002.
[14] Heikki Mannila and Kari-Jouko Raiha. Algorithms for inferring functional dependencies from relations. Data and Knowledge Engineering, 12 (1):83–99, 1994.
[15] Angel Mora, Manuel Enciso, Pablo Cordero, and Inmaculada P. de Guzmán. The functional dependence implication problem: optimality and minimality. An efficient preprocessing transformation based on the substitution paradigm. Lecture Notes in Artificial Intelligence, Springer-Verlag, 3040, 2004.
[16] Angel Mora, Gabriel Aguilera, Manuel Enciso, Pablo Cordero, and Inmaculada P. de Guzmán. A new closure algorithm based in logic: SLFD-Closure versus classical closures. Inteligencia Artificial, Revista Iberoamericana de IA, 31 (10):31–40, 2006.
[17] Jan Paredaens, Paul De Bra, Marc Gyssens, and Dirk Van Gucht. The structure of the relational database model. EATCS Monographs on Theoretical Computer Science, 1989.
[18] H. Prade, C. Testemale. Generalizing database relational algebra for the treatment of incomplete or uncertain information and vague queries. Information Sciences, 34:115–143, 1984.
[19] H.V.S.V.N. Raju, A.K. Majumdar. Fuzzy dependencies and lossless join decomposition of fuzzy relational database systems. ACM Transactions on Database Systems, 13 (2):129–166, 1988.
[20] Iztok Savnik and Peter A. Flach. Bottom-up induction of functional dependencies from relations. Proc. of AAAI-93 Workshop: Knowledge Discovery in Databases, pages 174–185, 1993.
[21] E. Sciore. A complete axiomatization of full join dependencies. J. ACM, 29 (2):373–393, 1982.
[22] Mustafa Ilker Sozat and Adnan Yazici. A complete axiomatization for fuzzy functional and multivalued dependencies in fuzzy database relations. Fuzzy Sets and Systems, 117 (2):161–181, 2001.
[23] B.K. Tyagi, A. Sharfuddin, R.N. Dutta, Devendra K. Tayal. A complete axiomatization of fuzzy functional dependencies using fuzzy function. Fuzzy Sets and Systems, 151:363–379, 2005.