
D

Tid   Trees
1     (0, 1, 2, 3, 2)
2     (0, 1, 2, 2, 1)
3     (0, 1, 2, 3, 1)

Figure 8.1: Example of dataset

[Lattice diagram; node labels: 1, 2, 3; 12, 13, 23; 123]

Figure 8.2: Galois lattice of closed trees from the dataset example

8.2 Itemsets association rules

Let $\mathcal{I} = \{i_1, \ldots, i_n\}$ be a fixed set of items. A subset $I \subseteq \mathcal{I}$ is called an itemset. Formally, we deal with a collection of ordered transactions $D = \{d_1, d_2, \ldots, d_m\}$, where each $d_i$ is an itemset. Figure 8.3 shows an example of such a dataset.

     a  b  c  d
d1   1  1  0  1
d2   0  1  1  1
d3   0  1  0  1

Figure 8.3: Example of a dataset of itemsets transactions

An association rule is a pair $(G, Z)$, denoted $G \to Z$, where $G, Z \subseteq \mathcal{I}$ and $G \subseteq Z$. When the closure of an itemset $G \neq Z$ is $Z$, and $G$ is minimal (with respect to set inclusion) among all the candidates whose closure equals $Z$, we say that $G$ is a generator of $Z$.

We are interested in implications of the form $G \to Z$, where $G$ is a generator of $Z$. These turn out to be the particular case of association rules in which no support condition is imposed but the confidence is 1 (or 100%). In this context, such rules are sometimes called deterministic association rules.

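For example, from the itemset dataset in Figure 8.3 we can generate the deterministic association rule $c \to bcd$, since $c$ is a generator of the closed set $bcd$: the only transaction containing $c$ is $d_2 = bcd$, so the closure of $c$ is $bcd$, while the only proper subset of $c$, the empty set, has closure $bd$.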
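The definitions of closure, generator, and deterministic rule can be checked by brute force on the small dataset of Figure 8.3. The Python sketch below is illustrative only: the helper names (`closure`, `is_generator`, `deterministic_rules`) are hypothetical, the closure of an itemset is taken to be the intersection of all transactions containing it, and, as an assumption of the sketch, an itemset contained in no transaction gets the whole item set as its closure.

```python
from itertools import combinations

# Figure 8.3: items a-d; transactions d1 = abd, d2 = bcd, d3 = bd.
ITEMS = frozenset("abcd")
TRANSACTIONS = [frozenset("abd"), frozenset("bcd"), frozenset("bd")]


def closure(itemset):
    """Intersect every transaction that contains `itemset`.

    Assumption: if no transaction contains it, the closure is the full item set.
    """
    result = ITEMS
    for t in TRANSACTIONS:
        if itemset <= t:
            result &= t
    return result


def subsets(s):
    """All subsets of `s`, smallest first."""
    items = sorted(s)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def is_generator(g, z):
    """G generates Z: closure(G) = Z, G != Z, and no proper subset of G has closure Z."""
    if g == z or closure(g) != z:
        return False
    return all(closure(s) != z for s in subsets(g) if s != g)


def deterministic_rules():
    """All rules G -> Z such that G is a generator of the closed set Z."""
    rules = []
    for g in subsets(ITEMS):
        z = closure(g)
        if is_generator(g, z):
            rules.append((g, z))
    return rules


for g, z in deterministic_rules():
    print("".join(sorted(g)) or "∅", "->", "".join(sorted(z)))
```

On this dataset the sketch finds four rules: $\emptyset \to bd$, $a \to abd$, $c \to bcd$, and $ac \to abcd$. The last one depends on the assumed convention, since no transaction contains both $a$ and $c$. Enumerating all subsets is exponential in the number of items, so this only serves to make the definitions concrete.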

8.2.1 Classical Propositional Horn Logic

We briefly review some important notions of classical propositional Horn logic, following [Gar06]. First, assume a standard propositional logic language with propositional variables. The number of variables is finite, and we denote by $V$ the set of all variables; alternatively, we could use an infinite set of variables, provided that the propositional issues corresponding to a fixed dataset involve only finitely many of them. A literal is either a propositional variable, called a positive literal, or its negation, called a negative literal. A clause is a disjunction of literals, and it can be seen simply as the set of literals it contains. A clause is Horn if and only if it contains at most one positive literal. Horn clauses with a positive literal are called definite and can be written as $H \to v$, where $H$ is the conjunction of the variables that appear negated in the clause and $v$ is the single positive literal of the clause. Horn clauses without positive literals are called nondefinite and can be written similarly as $H \to \square$, where $\square$ expresses unsatisfiability. A Horn formula is a conjunction of Horn clauses. In Figure 8.4 the set of all variables is $V = \{a, b, c, d\}$, and $\bar{a} \vee \bar{b} \vee d$, also written $a, b \to d$, is a Horn clause.
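One way to make this vocabulary concrete is to store a clause as two sets of variables: the positive ones and the negated ones. The `Clause` class below is a hypothetical representation chosen for this sketch, not one taken from [Gar06]; `as_implication` returns `None` as the head of a nondefinite clause to stand for $\square$.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Clause:
    """A clause seen as its set of literals: positive variables and negated variables."""

    positive: FrozenSet[str]
    negative: FrozenSet[str]

    def is_horn(self) -> bool:
        # Horn: at most one positive literal.
        return len(self.positive) <= 1

    def is_definite(self) -> bool:
        # Definite Horn clause: exactly one positive literal.
        return len(self.positive) == 1

    def as_implication(self) -> Tuple[FrozenSet[str], Optional[str]]:
        """Write a Horn clause as H -> v; v = None stands for the box (nondefinite case)."""
        if not self.is_horn():
            raise ValueError("only Horn clauses can be written as H -> v")
        head = next(iter(self.positive), None)
        return self.negative, head


# not-a or not-b or d, i.e. the Horn clause a, b -> d of Figure 8.4
clause = Clause(positive=frozenset({"d"}), negative=frozenset({"a", "b"}))
body, head = clause.as_implication()
print(clause.is_horn(), clause.is_definite(), sorted(body), head)
```

For the clause of Figure 8.4 this prints `True True ['a', 'b'] d`.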

A model is a complete truth assignment, i.e., a mapping from the variables to $\{0, 1\}$. We denote by $m(v)$ the value that the model $m$ assigns to the variable $v$. The intersection of two models is their bitwise conjunction, which is again a model. A model satisfies a formula if the formula evaluates to true in the model. The universe of all models is denoted by $\mathcal{M}$. For example, in Figure 8.4, $m(a) = 0, m(b) = 1, m(c) = 1, \ldots$ is a model.

M    a  b  c  d
m1   1  1  0  1
m2   0  1  1  1
m3   0  1  0  1

$a \to b, d$      $(\bar{a} \vee b) \wedge (\bar{a} \vee d)$
$d \to b$         $\bar{d} \vee b$
$a, b \to d$      $\bar{a} \vee \bar{b} \vee d$

Figure 8.4: Example of Horn formulas
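Models, their intersection, and satisfaction can be sketched in Python with the data of Figure 8.4. The sketch assumes that a model is a dictionary from variable names to 0/1 and handles only definite Horn clauses, encoded as pairs `(H, v)` for $H \to v$; these encodings and the function names are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

Model = Dict[str, int]                   # variable -> 0 or 1
HornClause = Tuple[FrozenSet[str], str]  # (H, v) for the definite clause H -> v

VARS = "abcd"
# Models m1, m2, m3 from Figure 8.4.
M1: Model = dict(zip(VARS, (1, 1, 0, 1)))
M2: Model = dict(zip(VARS, (0, 1, 1, 1)))
M3: Model = dict(zip(VARS, (0, 1, 0, 1)))


def intersect(m: Model, n: Model) -> Model:
    """Bitwise conjunction of two models over the same variables."""
    return {v: m[v] & n[v] for v in m}


def satisfies(m: Model, clause: HornClause) -> bool:
    """m satisfies H -> v unless every variable of H is true and v is false."""
    body, head = clause
    return not all(m[u] for u in body) or m[head] == 1


def satisfies_all(m: Model, formula: List[HornClause]) -> bool:
    """A Horn formula is a conjunction, so every clause must hold."""
    return all(satisfies(m, cl) for cl in formula)


# The three formulas of Figure 8.4, split into definite clauses.
FORMULA = [
    (frozenset("a"), "b"), (frozenset("a"), "d"),  # a -> b, d
    (frozenset("d"), "b"),                         # d -> b
    (frozenset("ab"), "d"),                        # a, b -> d
]

print(intersect(M1, M2) == M3)
print([satisfies_all(m, FORMULA) for m in (M1, M2, M3)])
```

The sketch reports that $m_1 \cap m_2 = m_3$ and that all three models satisfy the three formulas of the figure, whereas a model such as $m(a) = m(b) = 1$, $m(c) = m(d) = 0$ violates $a \to d$.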

A theory is a set of models. A theory is Horn if there is a Horn formula that axiomatizes it, in the sense that it is satisfied exactly by the models in the theory. When a theory contains another, we say that the first is an upper bound for the second; for instance, removing clauses from a Horn formula yields a larger or equal Horn theory. The following is known; see [DP92], or works such as [KKS95]:
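To illustrate the upper-bound remark, the sketch below enumerates, by brute force over $V = \{a, b, c, d\}$, the theory of a small Horn formula built from clauses appearing in Figure 8.4, and compares it with the theory obtained after dropping one clause. Models are encoded as 0/1 tuples in the order of `VARS`; `satisfies` and `theory` are hypothetical helper names.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

VARS = ("a", "b", "c", "d")
Model = Tuple[int, ...]                  # truth values in the order of VARS
HornClause = Tuple[FrozenSet[str], str]  # (H, v) for the definite clause H -> v


def satisfies(m: Model, clause: HornClause) -> bool:
    """m satisfies H -> v unless every variable of H is true and v is false."""
    value = dict(zip(VARS, m))
    body, head = clause
    return not all(value[u] for u in body) or value[head] == 1


def theory(formula: List[HornClause]) -> FrozenSet[Model]:
    """The theory axiomatized by a formula: all models over VARS that satisfy it."""
    return frozenset(
        m for m in product((0, 1), repeat=len(VARS))
        if all(satisfies(m, cl) for cl in formula)
    )


formula = [(frozenset("a"), "b"), (frozenset("d"), "b"), (frozenset("ab"), "d")]
weaker = formula[:2]  # drop the clause a, b -> d

# Removing clauses can only enlarge the theory: theory(weaker) is an upper bound.
print(theory(formula) <= theory(weaker), len(theory(formula)), len(theory(weaker)))
```

Here the full formula has 8 models and the weaker one has 10, and the first theory is contained in the second.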

Theorem 10. Given a propositional theory of models $M$, there is exactly one minimal Horn theory containing it. Semantically, it contains all the models that are intersections of models of $M$. Syntactically, it can be described by the conjunction of all Horn clauses satisfied by all models from the theory.

The theory obtained in this way is sometimes called the empirical Horn approximation of the original theory. Clearly, then, a theory is Horn if and only if it is closed under intersection, in which case it coincides with its empirical Horn approximation.
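The semantic side of Theorem 10 suggests a direct computation: close a set of models under pairwise intersection until nothing new appears. This sketch assumes that models are 0/1 tuples in a fixed variable order (here $a, b, c, d$) and reads "intersections of models" as intersections of nonempty families, which repeated pairwise intersection reaches; `meet`, `horn_approximation`, and `is_horn_theory` are hypothetical names.

```python
from itertools import combinations
from typing import FrozenSet, Tuple

Model = Tuple[int, ...]  # a model as a tuple of 0/1 values, one per variable


def meet(m: Model, n: Model) -> Model:
    """Bitwise conjunction (intersection) of two models."""
    return tuple(x & y for x, y in zip(m, n))


def horn_approximation(models: FrozenSet[Model]) -> FrozenSet[Model]:
    """Close a set of models under pairwise intersection (empirical Horn approximation)."""
    closed = set(models)
    while True:
        new = {meet(m, n) for m, n in combinations(closed, 2)} - closed
        if not new:
            return frozenset(closed)
        closed |= new


def is_horn_theory(models: FrozenSet[Model]) -> bool:
    """A theory is Horn iff it is closed under intersection."""
    return horn_approximation(models) == models


fig_8_4 = frozenset({(1, 1, 0, 1), (0, 1, 1, 1), (0, 1, 0, 1)})
print(is_horn_theory(fig_8_4))
print(sorted(horn_approximation(frozenset({(1, 1, 0, 0), (0, 1, 1, 0)}))))
```

For the models of Figure 8.4 nothing new is produced, since $m_1 \cap m_2 = m_3$, so that theory is already Horn; for the two models 1100 and 0110, the approximation adds their intersection 0100.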

The propositional Horn logic framework allows us to cast our reasoning in terms of closure operators. It turns out that the set of deterministic association rules can be characterized exactly in terms of propositional logic: we associate a propositional variable with each item, and each association rule becomes a conjunction of Horn clauses. Then:

Theorem 11. [BB03] Given a set of transactions, the conjunction of all the deterministic association rules defines exactly the empirical Horn approximation of the theory formed by the given tuples.

Thus, by this theorem, the empirical Horn approximation of a set of models can be computed by constructing the deterministic association rules, that is, by constructing the closed sets of attributes and identifying the minimal generators of each closed set.
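Theorem 11 can be checked by brute force on the running example, since the transactions of Figure 8.3 coincide with the models of Figure 8.4. The sketch translates each deterministic rule $G \to Z$ into the clauses $G \to v$ for $v \in Z \setminus G$, enumerates the models of the resulting Horn formula, and compares them with the intersection closure of the transactions. One convention is an assumption of this sketch rather than something stated in the text: a generator contained in no transaction (here $ac$) is translated into the nondefinite clause $G \to \square$. With the definite clauses $ac \to b$ and $ac \to d$ instead, the all-true model $abcd$ would also satisfy the formula and the comparison would fail, so the treatment of unsupported itemsets matters. All helper names are hypothetical.

```python
from itertools import combinations, product
from typing import FrozenSet, List, Optional, Tuple

ITEMS = "abcd"
# Transactions of Figure 8.3, which are also the models m1-m3 of Figure 8.4.
TRANSACTIONS = [frozenset("abd"), frozenset("bcd"), frozenset("bd")]

Clause = Tuple[FrozenSet[str], Optional[str]]  # (H, v) for H -> v; v None means H -> box


def subsets(s):
    """All subsets of `s`, smallest first."""
    for k in range(len(s) + 1):
        for combo in combinations(sorted(s), k):
            yield frozenset(combo)


def supporting(g):
    """Transactions that contain the itemset g."""
    return [t for t in TRANSACTIONS if g <= t]


def closure(g):
    """Intersection of the supporting transactions (all items if there are none)."""
    result = frozenset(ITEMS)
    for t in supporting(g):
        result &= t
    return result


def horn_clauses() -> List[Clause]:
    """Translate every rule G -> Z (G a generator of Z) into Horn clauses."""
    clauses: List[Clause] = []
    for g in subsets(ITEMS):
        z = closure(g)
        if g == z or any(closure(s) == z for s in subsets(g) if s != g):
            continue  # g is closed, or not minimal: not a generator
        if not supporting(g):
            clauses.append((g, None))                      # assumption: G -> box
        else:
            clauses.extend((g, v) for v in sorted(z - g))  # G -> v for each v in Z \ G
    return clauses


def models_of(clauses: List[Clause]) -> FrozenSet[FrozenSet[str]]:
    """All models, encoded as their sets of true variables, satisfying every clause."""
    result = set()
    for bits in product((0, 1), repeat=len(ITEMS)):
        true_vars = frozenset(v for v, b in zip(ITEMS, bits) if b)
        if all(not h <= true_vars or (v is not None and v in true_vars)
               for h, v in clauses):
            result.add(true_vars)
    return frozenset(result)


def intersection_closure(models):
    """Close a collection of models (sets of true variables) under intersection."""
    closed = set(models)
    while True:
        new = {m & n for m, n in combinations(closed, 2)} - closed
        if not new:
            return frozenset(closed)
        closed |= new


print(models_of(horn_clauses()) == intersection_closure(TRANSACTIONS))
```

On this example both sides are the three models $abd$, $bcd$, and $bd$, so the check succeeds under the stated convention.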