
Geometrically Inspired Itemset Mining in the Transpose

CHAPTER 4. GEOMETRICALLY INSPIRED ITEMSET MINING IN THE

4.4 Item-vector Framework

In Section 4.1, the example of frequent itemset mining (FIM) was used to introduce the ideas behind this work. However, the work in this chapter is more general than FIM, for which the instantiations of $g(\cdot)$, $\circ$ and $f(\cdot)$ are straightforward. The functions and operator formally described in this section define the form of interestingness measures and data-set transformations that are supported by the GLIMIT algorithm. Not only can existing measures be mapped to this framework, but it is the author's hope that the geometric interpretation will inspire new interesting itemset mining approaches.

Recall that $x_{I'}$ is the set of transaction identifiers of the transactions containing the itemset $I' \subseteq I$. Call $X$ the space spanned by all possible $x_{I'}$. Specifically, $X = \mathcal{P}(\{t.\mathrm{tid} : t \in T\})$.
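As a concrete reading of this definition, the sketch below builds $x_{I'}$ as a set of transaction identifiers from a small transaction database. The database is hypothetical: it is chosen only so that $x_{\{1\}} = \{t_1, t_2\}$ and $x_{\{5\}} = \{t_1, t_3\}$, the values this section quotes from figure 4.1(a); the rest of that figure is not reproduced, and the name `item_vector` is illustrative.

```python
# Hypothetical toy database: tid -> set of items. Chosen only so that
# x_{1} = {t1, t2} and x_{5} = {t1, t3}; this is not a copy of figure 4.1(a).
DATABASE = {
    "t1": {1, 5},
    "t2": {1},
    "t3": {5},
}


def item_vector(itemset, database=DATABASE):
    """Return x_{I'}: the tids of all transactions that contain every item of I'.

    The result is an element of X, the power set of all tids.
    """
    itemset = set(itemset)
    return frozenset(tid for tid, items in database.items() if itemset <= items)


x1 = item_vector({1})
x5 = item_vector({5})
x15 = item_vector({1, 5})
print(sorted(x1), sorted(x5), sorted(x15))  # ['t1', 't2'] ['t1', 't3'] ['t1']
```

Note that $x_{\{1,5\}} = x_{\{1\}} \cap x_{\{5\}}$, which is the property the operator $\circ$ generalises.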

Definition 4.1. $g : X \to Y$ is a transformation on the original item-vector to a different representation $y_{I'} = g(x_{I'})$ in a new space $Y$.

Even though $g(\cdot)$ is a transformation, its output still 'represents' the item-vector.


Definition 4.2. $\circ$ is an operator on the transformed item-vectors such that $y_{I' \cup I''} = y_{I'} \circ y_{I''} = y_{I''} \circ y_{I'}$.

That is, $\circ$ is a commutative operator for combining item-vectors to create item-vectors representing larger itemsets. It is not required that $y_{I'} = y_{I'} \circ y_{I'}$.³

Definition 4.3. $f : Y \to \mathbb{R}$ is a measure on itemsets, evaluated on transformed item-vectors. Write $\mathrm{value}_{I'} = f(y_{I'})$.

Definition 4.4. $\mathrm{interestingness} : \mathcal{P}(I) \to \mathbb{R}$ is an interestingness measure (order) on all itemsets.

Suppose a measure of interestingness of an itemset depends only on that itemset. The simplest example is support. It is possible to represent this as follows, where $I' = \{i_1, \ldots, i_q\}$ and $k = 1$:

$$\mathrm{interestingness}(I') = f\big(g(x_{\{i_1\}}) \circ \cdots \circ g(x_{\{i_q\}})\big) \tag{4.1}$$
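Equation 4.1 is a map-then-fold: apply $g$ to each single-item vector, combine the results with $\circ$, and measure with $f$. The sketch below is a hypothetical generic evaluator written under that reading (the names `interestingness`, `ITEM_VECTORS` and the keyword arguments are illustrative, not GLIMIT's interface). Because Definition 4.2 requires $\circ$ to be commutative, the fold order does not affect the result. It is instantiated with the choice for support stated in the text: $g$ the identity, $\circ = \cap$, $f = |\cdot|$.

```python
from functools import reduce
from typing import Callable, Hashable, Iterable, TypeVar

Y = TypeVar("Y")  # element type of the transformed space Y


def interestingness(itemset: Iterable[Hashable],
                    x: Callable[[Hashable], frozenset],
                    g: Callable[[frozenset], Y],
                    op: Callable[[Y, Y], Y],
                    f: Callable[[Y], float]) -> float:
    """Evaluate Equation 4.1, f(g(x_{i1}) o ... o g(x_{iq})), for a non-empty itemset."""
    transformed = [g(x(item)) for item in itemset]  # y_{i} = g(x_{i}) for each item
    if not transformed:
        raise ValueError("Equation 4.1 needs at least one item")
    return f(reduce(op, transformed))  # combine with o, then measure with f


# Support: g = identity, o = set intersection, f = set size.
# Tidsets for items 1 and 5 as quoted from figure 4.1(a).
ITEM_VECTORS = {1: frozenset({"t1", "t2"}), 5: frozenset({"t1", "t3"})}

support_15 = interestingness(
    [1, 5],
    x=ITEM_VECTORS.__getitem__,
    g=lambda tids: tids,
    op=lambda a, b: a & b,
    f=len,
)
print(support_15)  # 1
```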

So the challenge is, given an interestingness measure, to find suitable and useful $g$, $\circ$ and $f$ such that Equation 4.1 holds. For support, $\circ = \cap$, $f = |\cdot|$ and $g$ is the identity function. Let us return to the frequent itemset mining motivation. First assume that $g(\cdot)$ trivially maps $x_{I'}$ to a binary vector. Using $x_{\{1\}} = \{t_1, t_2\}$ and $x_{\{5\}} = \{t_1, t_3\}$ from Figure 4.1(a), we have $y_{\{1\}} = g(x_{\{1\}}) = 110$ and $y_{\{5\}} = g(x_{\{5\}}) = 101$. It should be clear that using bit-wise AND as $\circ$ and $f = \mathrm{sum}(\cdot)$, the number of set bits, gives the required semantics for frequent itemset mining.
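The bit-vector instantiation can be sketched as follows, assuming tids are ordered $(t_1, t_2, t_3)$ with $t_1$ as the most significant bit, so that the encodings match $y_{\{1\}} = 110$ and $y_{\{5\}} = 101$. The helper names `to_bits` and `support` are illustrative.

```python
TIDS = ("t1", "t2", "t3")  # assumed tid order; t1 is the leftmost (most significant) bit


def to_bits(tidset, tids=TIDS):
    """g(.): map a tidset to an integer bit vector following the order of `tids`."""
    value = 0
    for tid in tids:
        value = (value << 1) | (1 if tid in tidset else 0)
    return value


def support(*bitvectors):
    """o = bit-wise AND over all vectors, f = number of set bits."""
    combined = bitvectors[0]
    for b in bitvectors[1:]:
        combined &= b
    return bin(combined).count("1")


y1 = to_bits({"t1", "t2"})
y5 = to_bits({"t1", "t3"})
print(format(y1, "03b"), format(y5, "03b"), support(y1, y5))  # 110 101 1
```

Here $110 \text{ AND } 101 = 100$, so $\sigma(\{1,5\}) = 1$, the same value the set-intersection form gives.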

To motivate these ideas, notice that $\mathrm{sum}(y_{\{1\}} \text{ AND } y_{\{2\}}) = \mathrm{sum}(y_{\{1\}} .* y_{\{2\}}) = y_{\{1\}} \cdot y_{\{2\}}$, the dot product ($.*$ is the element-wise product⁴). That is, the dot product of two item-vectors is the support of the 2-itemset. What makes this interesting is that the dot product is unchanged by any rotation about the origin. Suppose we have an arbitrary $3 \times 3$ matrix $R$ defining a rotation about the origin. We can therefore define $g(x) = R x^T$, since $R$, and hence $g(\cdot)$, preserves the dot product. For example, $\sigma(\{1,5\}) = y_{\{1\}} \cdot y_{\{5\}} = (R x_{\{1\}}^T) \cdot (R x_{\{5\}}^T)$. Therefore, it is possible to perform an arbitrary rotation of the item-vectors before mining itemsets of size 2. Of course, this is much more expensive than bit-wise AND, so why would one want to do this? Consider Singular Value Decomposition (SVD). If normalisation is skipped, it becomes a rotation about the origin, projecting the original data onto a new set of basis vectors pointing in the directions of greatest variance (incidentally, the covariance matrix calculated in SVD also defines the support of all 2-itemsets⁵). If SVD is also used for dimensionality reduction, keeping only the leading singular vectors, it roughly preserves the dot product. This means it should be possible to use SVD for dimensionality and/or noise reduction prior to mining frequent 2-itemsets without introducing too much error. The drawback is that the dot product applies only to two vectors. That is, we cannot use it for larger itemsets, because the 'generalised dot product' satisfies $\mathrm{sum}(R x_{\{1\}}^T .* R x_{\{2\}}^T .* \cdots .* R x_{\{q\}}^T) = \mathrm{sum}(x_{\{1\}} .* x_{\{2\}} .* \cdots .* x_{\{q\}})$ only for $q = 2$. However, this does not mean that there are no other useful $\circ$, $f(\cdot)$, $F(\cdot)$ and interestingness measures that satisfy Equation 4.1 and use $g(\cdot) = \mathrm{SVD}$; perhaps some will be motivated by this observation.

³ Equivalently, $\circ$ may have the restriction that $I' \cap I'' = \emptyset$.
⁴ $(a .* b)[i] = a[i]\,b[i]$ for all $i$, where $[\,]$ indexes the vectors.
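The claims about rotations can be checked numerically. The sketch below assumes a $45^\circ$ rotation about the $t_3$ axis (any rotation would do for $q = 2$) and adds a hypothetical third item-vector $x_c = 111$, an item present in every transaction, only to exercise the $q = 3$ case. The dot product, and hence $\sigma(\{1,5\})$, survives the rotation; the three-way 'generalised dot product' does not.

```python
import numpy as np

# Binary item-vectors over transactions (t1, t2, t3): y_{1} = 110 and y_{5} = 101 from the text.
# x_c is a hypothetical third item contained in every transaction, used only for q = 3.
x1 = np.array([1.0, 1.0, 0.0])
x5 = np.array([1.0, 0.0, 1.0])
xc = np.array([1.0, 1.0, 1.0])


def rotation_z(theta):
    """3x3 rotation about the origin (around the t3 axis) by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


R = rotation_z(np.pi / 4)


def g(x):
    """g(x) = R x^T; the vectors are 1-D arrays, so no explicit transpose is needed."""
    return R @ x


# q = 2: support of {1, 5} equals the dot product, before and after rotation.
support_15 = x1 @ x5        # 1.0
rotated_15 = g(x1) @ g(x5)  # 1.0 up to floating-point rounding

# q = 3: the element-wise triple product is not rotation invariant.
support_15c = np.sum(x1 * x5 * xc)             # 1.0
rotated_15c = np.sum(g(x1) * g(x5) * g(xc))    # sqrt(2), roughly 1.414

print(support_15, rotated_15, support_15c, rotated_15c)
```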

Note that the transpose operation is crucial in applying dimensionality or noise reduction because it keeps the items intact. If the data were not transposed, the item space would be reduced, and the results would be in terms of linear combinations of the original items, which cannot be interpreted meaningfully. It also makes more sense to reduce noise in the transactions than in the items.
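One way to combine the SVD observation with the transposed view is sketched below. It assumes a data matrix $D$ with transactions as rows and items as columns, so each column is an item-vector in transaction space, and takes $g(\cdot)$ to be projection onto the top-$k$ left singular vectors of $D$ without mean-centring. This reduces the transaction dimension while every item keeps its own column. The data and the name `reduce_item_vectors` are hypothetical; this is an illustration of the idea, not the thesis's implementation. The exact 2-itemset supports are the entries of $D^T D$.

```python
import numpy as np

# Hypothetical binary data: rows are transactions, columns are items, so each
# column is an item-vector x_{i} living in transaction space.
D = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
], dtype=float)


def reduce_item_vectors(D, k):
    """g(.) as a rank-k SVD projection of each item-vector (column of D).

    No mean-centring is applied. With k = min(D.shape) all dot products between
    item-vectors are preserved exactly; for smaller k they are approximated.
    """
    U, _, _ = np.linalg.svd(D, full_matrices=False)
    return U[:, :k].T @ D  # shape (k, n_items): items stay intact, transactions are reduced


exact = D.T @ D  # entry (a, b) = support of {a, b}; the diagonal holds single-item supports
for k in range(1, D.shape[1] + 1):
    Y = reduce_item_vectors(D, k)
    approx = Y.T @ Y                      # estimated supports from the reduced vectors
    err = np.linalg.norm(approx - exact)  # Frobenius norm of the error
    print(f"k={k}: reduced dims={Y.shape[0]}, support error={err:.3f}")
```

Since $Y^T Y = V_k S_k^2 V_k^T$, the Frobenius error equals $\sqrt{\sum_{i>k} s_i^4}$, so it can only shrink as $k$ grows and vanishes once all singular vectors are kept.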

Other options for $g(\cdot)$ are set compression functions or approximate techniques, such as sketches, which give estimates rather than exact values of support or other measures. However, the author believes that new geometrically inspired measures will be the most interesting. For example, angles between item-vectors are linked to the correlation between itemsets. Of course, it is also possible to translate existing measures into the framework.
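To make the link between angles and correlation concrete: for 0/1 item-vectors, the cosine of the angle between $x_a$ and $x_b$ is $\sigma(\{a,b\}) / \sqrt{\sigma(\{a\})\,\sigma(\{b\})}$, and after mean-centring the same cosine is the Pearson (phi) correlation of the two items. The sketch below, with the illustrative name `item_vector_angles`, computes both for $y_{\{1\}} = 110$ and $y_{\{5\}} = 101$ from the text.

```python
import numpy as np


def item_vector_angles(xa, xb):
    """Cosine of the angle between two item-vectors, before and after mean-centring.

    For 0/1 vectors the raw cosine equals sigma(ab) / sqrt(sigma(a) * sigma(b));
    the centred cosine is the Pearson (phi) correlation. The result is NaN when a
    vector (raw or centred) is all zeros, e.g. an item present in every transaction.
    """
    xa = np.asarray(xa, dtype=float)
    xb = np.asarray(xb, dtype=float)
    raw = xa @ xb / (np.linalg.norm(xa) * np.linalg.norm(xb))
    ca, cb = xa - xa.mean(), xb - xb.mean()
    centred = ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb))
    return raw, centred


# Item-vectors over (t1, t2, t3) from the text: y_{1} = 110, y_{5} = 101.
raw, centred = item_vector_angles([1, 1, 0], [1, 0, 1])
angle_deg = np.degrees(np.arccos(raw))
print(f"cosine={raw:.3f}, angle={angle_deg:.1f} deg, correlation={centred:.3f}")
```

Here $\sigma(\{1,5\}) = 1$ and $\sigma(\{1\}) = \sigma(\{5\}) = 2$, so the cosine is $1/2$ (an angle of $60^\circ$), while the centred vectors give a correlation of $-1/2$.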

To complete the framework, the family of functions $F(\cdot)$ is defined as follows:

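Definition 4.5. $F : \mathbb{R}^{|\mathcal{P}(I')|} \to \mathbb{R}$ is a measure on an itemset $I'$ that supports any composition of measures (provided by $f(\cdot)$) on any number of subsets of $I'$. Write $\mathrm{Value}_{I'} = F(\mathrm{value}_{I'_1}, \mathrm{value}_{I'_2}, \ldots, \mathrm{value}_{I'_{|\mathcal{P}(I')|}})$, where $\mathrm{value}_{I'_i} = f(y_{I'_i})$ and all $I'_i \in \mathcal{P}(I')$.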
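Definition 4.5 does not fix any particular $F$. As an illustration only, the sketch below first computes $\mathrm{value}_{I'_i}$ for every subset in $\mathcal{P}(I')$ using the support instantiation ($g$ identity, $\circ = \cap$, $f = |\cdot|$), then applies two familiar measures as choices of $F$: lift, $N\,\sigma(\{a,b\}) / (\sigma(\{a\})\,\sigma(\{b\}))$, and all-confidence, $\sigma(I') / \max_{i \in I'} \sigma(\{i\})$. It assumes the empty itemset maps to the set of all tids, so $\mathrm{value}_\emptyset = N$, the number of transactions. The tidsets and the names `subset_values`, `lift` and `all_confidence` are hypothetical.

```python
from itertools import combinations

# Hypothetical tidsets over transactions t1..t4 (not taken from the text).
ALL_TIDS = frozenset({"t1", "t2", "t3", "t4"})
ITEM_VECTORS = {
    "a": frozenset({"t1", "t2", "t3"}),
    "b": frozenset({"t1", "t2", "t4"}),
}


def subset_values(itemset):
    """value_{I'_i} = f(y_{I'_i}) for every I'_i in P(I'), with g = identity,
    o = set intersection and f = set size. The empty subset maps to all tids."""
    items = sorted(itemset)
    values = {}
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            tids = ALL_TIDS
            for item in subset:
                tids = tids & ITEM_VECTORS[item]
            values[frozenset(subset)] = len(tids)
    return values


def lift(values, a, b):
    """F for a 2-itemset {a, b}: N * sigma(ab) / (sigma(a) * sigma(b))."""
    n = values[frozenset()]
    return n * values[frozenset({a, b})] / (values[frozenset({a})] * values[frozenset({b})])


def all_confidence(values, itemset):
    """F: sigma(I') divided by the largest single-item support in I'."""
    return values[frozenset(itemset)] / max(values[frozenset({i})] for i in itemset)


vals = subset_values({"a", "b"})
print(lift(vals, "a", "b"), all_confidence(vals, {"a", "b"}))
```

Both measures read only entries of the same vector of subset values, which is exactly the input that $F$ receives in Definition 4.5.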

It is now possible to support more complicated interestingness functions that require more than a measure on one itemset:

⁵ That is, C
