Geometrically Inspired Itemset Mining in the Transpose
CHAPTER 4. GEOMETRICALLY INSPIRED ITEMSET MINING IN THE
4.4 Item-vector Framework
In section 4.1, the example of frequent itemset mining (FIM) was used to introduce the ideas behind this work. However, the work in this chapter is more general than this, and the instantiations of $g(\cdot)$, $\circ$ and $f(\cdot)$ are straightforward for FIM. The functions and operator formally described in this section define the form of interestingness measures and data-set transformations that are supported by the GLIMIT algorithm. Not only can existing measures be mapped to this framework, but it is the author's hope that the geometric interpretation will inspire new interesting itemset mining approaches.
Recall that $x_{I'}$ is the set of transaction identifiers of the transactions containing the itemset $I' \subseteq I$. Call $X$ the space spanned by all possible $x_{I'}$. Specifically, $X = \mathcal{P}(\{t.tid : t \in T\})$.
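The definition of $x_{I'}$ can be made concrete with a minimal sketch. The toy data set below is hypothetical (it is not figure 4.1(a) itself, only chosen to agree with the two item-vectors quoted later in this section), and the helper name `item_vector` is an assumption made for illustration.

```python
# Hypothetical toy data set: transaction id -> set of items it contains.
transactions = {
    "t1": {1, 5},
    "t2": {1},
    "t3": {5},
}


def item_vector(itemset, transactions):
    """Return x_{I'}: the tids of all transactions containing every item in itemset."""
    wanted = set(itemset)
    return {tid for tid, items in transactions.items() if wanted <= items}


x1 = item_vector({1}, transactions)      # transactions containing item 1
x5 = item_vector({5}, transactions)      # transactions containing item 5
x15 = item_vector({1, 5}, transactions)  # transactions containing both items
```

In this sketch the item-vector of the 2-itemset equals the intersection of the two single-item vectors, which is the property the rest of the framework builds on.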
Definition 4.1. $g : X \to Y$ is a transformation on the original item-vector to a different representation $y_{I'} = g(x_{I'})$ in a new space $Y$.
Even though $g(\cdot)$ is a transformation, its output still `represents' the item-vector.
Definition 4.2. $\circ$ is an operator on the transformed item-vectors so that $y_{I' \cup I''} = y_{I'} \circ y_{I''} = y_{I''} \circ y_{I'}$.
That is, $\circ$ is a commutative operator for combining item-vectors to create item-vectors representing larger itemsets. It is not required that $y_{I'} = y_{I'} \circ y_{I'}$ (footnote 3).
Definition 4.3. $f : Y \to \mathbb{R}$ is a measure on itemsets, evaluated on transformed item-vectors. Write $value_{I'} = f(y_{I'})$.
Definition 4.4. $interestingness : \mathcal{P}(I) \to \mathbb{R}$ is an interestingness measure (order) on all itemsets.
Suppose a measure of interestingness of an itemset depends only on that itemset. The simplest example is support. It is possible to represent this as follows, where $I' = \{i_1, \dots, i_q\}$ and $k = 1$:
(4.1) $interestingness(I') = f(g(x_{\{i_1\}}) \circ \dots \circ g(x_{\{i_q\}}))$
So the challenge is, given an interestingness measure, to find suitable and useful $g$, $\circ$ and $f$ so that the above holds. For support, $\circ = \cap$, $f = |\cdot|$ and $g$ is the identity function. Let us return to the frequent itemset mining motivation. First assume that $g(\cdot)$ trivially maps $x_{I'}$ to a binary vector. Using $x_{\{1\}} = \{t_1, t_2\}$ and $x_{\{5\}} = \{t_1, t_3\}$ from figure 4.1(a) we have $y_{\{1\}} = g(x_{\{1\}}) = 110$ and $y_{\{5\}} = g(x_{\{5\}}) = 101$. It should be clear that using bit-wise AND as $\circ$ and $f = sum()$, the number of set bits, gives the required semantics for frequent itemset mining.
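The two instantiations of support described above (sets with intersection, and bit vectors with bit-wise AND) can be sketched as follows. This is a minimal illustration rather than the GLIMIT implementation; the function names and the fixed transaction order `TIDS` are assumptions.

```python
from functools import reduce

# Assumed fixed ordering of transaction ids, t1 being the leftmost bit.
TIDS = ["t1", "t2", "t3"]


def g_identity(x):
    """g as the identity: the item-vector stays a set of tids."""
    return frozenset(x)


def g_bits(x):
    """g as a map to a binary vector, stored in a Python int (t1 is the highest bit)."""
    bits = 0
    for tid in TIDS:
        bits = (bits << 1) | (1 if tid in x else 0)
    return bits


def support_sets(item_vectors):
    """Equation 4.1 with g = identity, the operator = intersection, f = set size."""
    combined = reduce(lambda a, b: a & b, (g_identity(x) for x in item_vectors))
    return len(combined)


def support_bits(item_vectors):
    """Equation 4.1 with g = g_bits, the operator = bit-wise AND, f = number of set bits."""
    combined = reduce(lambda a, b: a & b, (g_bits(x) for x in item_vectors))
    return bin(combined).count("1")


x1 = {"t1", "t2"}  # x_{1} as quoted in the text
x5 = {"t1", "t3"}  # x_{5} as quoted in the text
```

Both functions return the same support for the itemset {1, 5}, since only transaction t1 contains both items.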
To give a motivation for these ideas, notice that $sum(y_{\{1\}} \text{ AND } y_{\{2\}}) = sum(y_{\{1\}} .* y_{\{2\}}) = y_{\{1\}} \cdot y_{\{2\}}$, the dot product ($.*$ is the element-wise product; see footnote 4). That is, the dot product of two item-vectors is the support of the 2-itemset. What makes this interesting is that this holds for any rotation about the origin. Suppose we have an arbitrary $3 \times 3$ matrix $R$ defining a rotation about the origin. This means we can define $g(x) = R x^T$ because the dot product is preserved by $R$ (and hence by $g(\cdot)$). For example, $\sigma(\{1,5\}) = y_{\{1\}} \cdot y_{\{5\}} = (R x^T_{\{1\}}) \cdot (R x^T_{\{5\}})$. Therefore, it is possible to perform an arbitrary rotation of the item-vectors before mining itemsets of size 2. Of course
3 Equivalently, $\circ$ may have the restriction that $I' \cap I'' = \emptyset$.
4 $(a .* b)[i] = a[i] * b[i]$ for all $i$, where $[]$ indexes the vectors.
this is much more expensive than bit-wise AND, so why would one want to do this? Consider Singular Value Decomposition (SVD). If normalisation is skipped, it becomes a rotation about the origin, projecting the original data onto a new set of basis vectors pointing in the direction of greatest variance (incidentally, the covariance matrix calculated in SVD also defines the support of all 2-itemsets; see footnote 5). If it is also used for dimensionality reduction, it has the property that it roughly preserves the dot product. This means it should be possible to use SVD for dimensionality reduction and/or noise reduction prior to mining frequent 2-itemsets without introducing too much error. The drawback is that the dot product applies only to two vectors. That is, we cannot use it for larger itemsets because the `generalised dot product' satisfies $sum(R x^T_{\{1\}} .* R x^T_{\{2\}} .* \dots .* R x^T_{\{q\}}) = sum(x_{\{1\}} .* x_{\{2\}} .* \dots .* x_{\{q\}})$ only for $q = 2$. However, this does not mean that there are no other useful $\circ$, $f(\cdot)$, $F(\cdot)$ and interestingness measures that satisfy Equation 4.1 and use $g(\cdot) = SVD$, some of which will perhaps be motivated by this observation.
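The behaviour of the dot product and of the `generalised dot product' under rotation can be checked numerically. The sketch below assumes one particular rotation (45 degrees about the third axis) and a hypothetical third item-vector `x_other` = 011; neither is taken from the source, and the helper names are illustrative only.

```python
import math


def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def matvec(R, v):
    """Multiply matrix R (a list of rows) by the column vector v."""
    return [dot(row, v) for row in R]


def rotation_z(theta):
    """3x3 rotation about the third axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def generalised_dot(vectors):
    """Sum over components of the element-wise product of all the vectors."""
    return sum(math.prod(components) for components in zip(*vectors))


R = rotation_z(math.pi / 4)  # assumed rotation, chosen only for illustration

x1 = [1.0, 1.0, 0.0]       # 110, as quoted in the text
x5 = [1.0, 0.0, 1.0]       # 101, as quoted in the text
x_other = [0.0, 1.0, 1.0]  # hypothetical third item-vector

y1, y5, y_other = (matvec(R, x) for x in (x1, x5, x_other))

pair_before = dot(x1, x5)   # support of the 2-itemset
pair_after = dot(y1, y5)    # agrees with pair_before up to rounding
triple_before = generalised_dot([x1, x5, x_other])
triple_after = generalised_dot([y1, y5, y_other])  # differs from triple_before
```

For this choice the pairwise value is unchanged by the rotation, while the three-vector value is not, which illustrates why the rotated representation is limited to itemsets of size 2.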
Note that the transpose operation is crucial in applying dimensionality or noise reduction because it keeps the items intact. If the data were not transposed, the item-space would be reduced, and the results would be in terms of linear combinations of the original items, which cannot be interpreted meaningfully. It also makes more sense to reduce noise in the transactions than in the items.
Other options for $g(\cdot)$ are set compression functions or approximate techniques, such as sketches, which give estimates rather than exact values of support or other measures. However, the author believes that new geometrically inspired measures will be the most interesting. For example, angles between item-vectors are linked to the correlation between itemsets. Of course, it is also possible to translate existing measures into the framework.
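One way to read the remark about angles is through the cosine of the angle between two binary item-vectors, which equals the support of the 2-itemset divided by the geometric mean of the two single-item supports. The sketch below is an illustration of that reading under this assumption; it is not a measure defined in the source.

```python
import math


def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def cosine(a, b):
    """Cosine of the angle between a and b (both assumed non-zero)."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


y1 = [1, 1, 0]  # 110, as quoted in the text
y5 = [1, 0, 1]  # 101, as quoted in the text

# For binary vectors dot(a, a) is the support of the single item, so this is
# support({1,5}) / sqrt(support({1}) * support({5})).
cos_15 = cosine(y1, y5)
angle_deg = math.degrees(math.acos(cos_15))
```

A smaller angle means the two items co-occur in a larger share of the transactions that contain either of them; identical item-vectors give an angle of zero and disjoint ones give a right angle.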
To complete the framework, the family of functions $F(\cdot)$ is defined as follows:
Definition 4.5. $F : \mathbb{R}^{|\mathcal{P}(I')|} \to \mathbb{R}$ is a measure on an itemset $I'$ that supports any composition of measures (provided by $f(\cdot)$) on any number of subsets of $I'$. Write $Value_{I'} = F(value_{I'_1}, value_{I'_2}, \dots, value_{I'_{|\mathcal{P}(I')|}})$ where $value_{I'_i} = f(y_{I'_i})$ and all $I'_i \in \mathcal{P}(I')$.
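As a hypothetical illustration of Definition 4.5, the sketch below evaluates $f$ on the non-empty subsets of an itemset and then combines two of those values with a ratio. The choice of $F$ (`F_ratio`), the restriction to non-empty subsets and all helper names are assumptions made for this example only; the bit vectors are the ones quoted earlier in the section.

```python
from functools import reduce
from itertools import combinations

# Bit-vector item-vectors quoted in the text: y_{1} = 110 and y_{5} = 101.
Y = {1: 0b110, 5: 0b101}


def f(y):
    """f = number of set bits, i.e. support."""
    return bin(y).count("1")


def y_of(itemset):
    """Combine single-item vectors with bit-wise AND (the operator for FIM)."""
    return reduce(lambda a, b: a & b, (Y[i] for i in itemset))


def subset_values(itemset):
    """Map each non-empty subset of itemset to its value f(y_subset)."""
    items = sorted(itemset)
    return {
        frozenset(sub): f(y_of(sub))
        for r in range(1, len(items) + 1)
        for sub in combinations(items, r)
    }


def F_ratio(values, itemset, subset):
    """Hypothetical F: value of the whole itemset divided by the value of one subset."""
    return values[frozenset(itemset)] / values[frozenset(subset)]


values = subset_values({1, 5})
ratio = F_ratio(values, {1, 5}, {1})
```

With support as $f$, this particular ratio is the fraction of transactions containing item 1 that also contain item 5; other choices of $F$ would combine the same subset values differently.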
It is now possible to support more complicated interestingness functions that require more than a measure on one itemset:
5 That is, C