Generalised Interaction Mining
Mining maximal interactions can be performed efficiently in GIM by detecting and processing fringe nodes.
Definition 3.11. The fringe of a prefix tree is the set of PrefixNodes that correspond to interesting interactions and are not prefixes of any other interesting interaction. Nodes in the fringe are called fringe nodes.
Note that if S_I(·) = I_I(·) then the fringe is identical to the set of leaf nodes. Figure 3.3 shows examples of fringe nodes.
The following lemmas are useful for mining maximal interactions:
Lemma 3.12. The set of all maximal interactions is contained in the fringe of a PrefixTree.
Proof. If this were not the case, there would exist a maximal interesting interaction that is a prefix of another interesting interaction, which is a contradiction.
It will be shown later that the GIM algorithm generates all sub-interactions V'' ⊂ V' before generating V' (lemma 3.18). As a consequence of lemmas 3.12 and 3.18, all maximal interactions can be mined by iterating over the fringe and discarding all those interactions that are subsets of nodes mined later in the algorithm's process. Algorithm 3.2 shows an (on-line) incremental algorithm that performs this task as the interactions are mined. Note that the subset checking needs to be done in one direction only, thanks to lemma 3.18. Furthermore, note that a new fringe node is guaranteed to be maximal when it is generated, and can only be rendered non-maximal if a subsequently mined interesting interaction subsumes it. In algorithm 3.2, addFringeNode(·) is called with the fringe nodes as they are generated. These nodes are a subset of the nodes output by outputInteraction(·) in algorithm 3.1, and it is not difficult to modify algorithm 3.1 so that it can determine when a node is a fringe node and signal this to outputInteraction(·). This can be done in constant time. Details are omitted here for clarity. Note that the maximal interactions are stored efficiently through the prefix sharing of the prefix tree. The issue of mining maximal interactions efficiently is central to the patterns mined in chapter 9 and will be discussed in more detail there.
Algorithm 3.2 Incremental algorithm for maintaining the set of maximal interesting interactions.
// Data structure
Set maximalInteractions = ∅;

addFringeNode(PrefixNode fringeNode)
    for each PrefixNode n ∈ maximalInteractions
        if n ⊂ fringeNode
            maximalInteractions.remove(n);
    maximalInteractions.add(fringeNode);
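Algorithm 3.2 can be sketched in Python as follows. This is only an illustrative sketch: a frozenset of variable names stands in for a PrefixNode, the class and method names are hypothetical, and the arrival order of the fringe nodes is assumed to respect lemma 3.18 (sub-interactions before their supersets).

```python
from typing import FrozenSet, Iterable, Set

# Assumption: a frozenset of variable names stands in for a PrefixNode.
Interaction = FrozenSet[str]


class MaximalInteractions:
    """Incrementally maintained set of maximal interesting interactions."""

    def __init__(self) -> None:
        self.maximal: Set[Interaction] = set()

    def add_fringe_node(self, fringe_node: Iterable[str]) -> None:
        node = frozenset(fringe_node)
        # One-directional check: only previously stored nodes can be
        # subsumed by the new node, never the other way round.
        # A new set is built to avoid removing elements while iterating.
        self.maximal = {n for n in self.maximal if not n < node}
        self.maximal.add(node)


tracker = MaximalInteractions()
# Hypothetical fringe nodes; {a, c} is a subset, but not a prefix, of {a, b, c}.
for fringe in [{"a", "c"}, {"b", "d"}, {"a", "b", "c"}]:
    tracker.add_fringe_node(fringe)

print(sorted(sorted(n) for n in tracker.maximal))  # [['a', 'b', 'c'], ['b', 'd']]
```

The sketch scans every stored node on each call, as the pseudocode does; it does not exploit the prefix sharing of the prefix tree mentioned above.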
3.6 Including Negative Patterns
Negative patterns typically describe relationships that include the explicit lack of events or objects. This means that not only is an object's presence important or interesting, but so is its absence. Such patterns are (in general) not the same as positive and negative relationships between variables; this issue will be considered further in chapter 9. Consider an interaction pattern P1 = {a, c, d} where the set of variables is V = {a, b, c, d, e} and suppose, for simplicity, that interestingness is defined by some co-occurrence measure. P1 says that a, c and d occur together in the database. It makes no statement about the presence or absence of the other objects b and e. Indeed, b may always occur when {a, c, d} occur, or never occur when these objects occur. If b always occurs, this leads to the pattern P2 = {a, b, c, d} being found. However, if b never occurs when {a, c, d} occur, then this information is not found unless negative patterns are considered. Negative patterns allow such information to be expressed; in particular, the previous example leads to the pattern P3 = {a, ¬b, c, d}, where ¬ denotes absence (negated presence). Note that P1 and P3 are not the same and express different knowledge about the database. Similarly, suppose that a and e never occur together. That is, when a is present, e is never present and vice versa. This is a potentially interesting interaction and can be expressed in two patterns, {a, ¬e} and {¬a, e}, depending on how interesting a and e are by themselves. In contrast, not including negative patterns only allows positive interactions to be found.
Mining negative patterns can be performed in the GIM framework by first including the negation of all variables in V. In the previous example, the variable set would then be V = {a, b, c, d, e, ¬a, ¬b, ¬c, ¬d, ¬e}. Additionally, the interaction vectors for a negated variable need to be defined. This can be done using a function:
Definition 3.13. N : X → X computes the negated vector x_{¬v} = N(x_v) corresponding to the variable ¬v.
Figure 3.3: The fringe of a prefix tree is shown in grey in this figure. Here, S_I(·) = I_I(·) so this corresponds to the leaf nodes. (a) The fringe of the prefix tree of figure 3.2(a). (b) The fringe of the prefix tree of figure 3.2(b).
Since a variable v and its negation ¬v can never occur together, there is no need to consider interactions containing both. GIM can be modified to avoid examining such cases in one of two ways. The simplest way is to employ a pre-pruning function that will be described later in this chapter (definition 3.17). A more efficient method is to incorporate the categorised prefix tree introduced in chapter 6 and place each variable in a category with its negated variable. Variables in the same category are considered mutually exclusive, and this can be exploited by modifications to the algorithm that enable automatic pruning. Chapter 6 considers this in detail in the context of Generalised Rule Mining (GRM).
Note that it is not hard to avoid explicit storage of negated vectors in the actual algorithm. Usually it is easy and efficient to implement this using a Decorator design pattern [41] applied to the interaction vector, thus avoiding any additional usage of space.
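The idea of decorating an interaction vector can be illustrated with a short Python sketch. The classes BitVector and NegatedVector and their get method are hypothetical names chosen for this illustration and are not taken from the GIM implementation; the sketch only shows that the negated view holds a reference to the original vector rather than a copy.

```python
class BitVector:
    """Hypothetical interaction vector: one bit per database instance."""

    def __init__(self, bits: int, size: int) -> None:
        self.bits = bits
        self.size = size

    def get(self, i: int) -> bool:
        return bool((self.bits >> i) & 1)


class NegatedVector:
    """Decorator exposing N(x_v) lazily; it stores only a reference to x_v."""

    def __init__(self, wrapped: BitVector) -> None:
        self.wrapped = wrapped
        self.size = wrapped.size

    def get(self, i: int) -> bool:
        # The negation is computed on access, so no second vector is stored.
        return not self.wrapped.get(i)


x_b = BitVector(0b0101, size=4)  # b occurs in instances 0 and 2
x_not_b = NegatedVector(x_b)     # view of the vector for the variable ¬b

print([x_not_b.get(i) for i in range(4)])  # [False, True, False, True]
```

Because both classes offer the same get interface, code that consumes interaction vectors does not need to know whether it was handed a stored vector or a negated view.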
Example 3.14. In frequent itemset mining using bit-vectors as interaction vectors, N(·) simply flips all bits. Note that the anti-monotonic property holds when negative items are included, and as such the pruning technique functions identically to the positive item case.
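The bit-flipping negation of example 3.14 can be illustrated with Python integers used as bit-vectors. The toy database, the helper names negate and support, and the use of intersection counts as the support measure are assumptions made for this sketch.

```python
def negate(bits: int, n: int) -> int:
    """N(.): flip all n bits; the mask keeps the result within n instances."""
    return ~bits & ((1 << n) - 1)


def support(*vectors: int) -> int:
    """Intersect the bit-vectors and count the instances that remain."""
    combined = vectors[0]
    for v in vectors[1:]:
        combined &= v
    return bin(combined).count("1")


n = 6            # number of instances in a hypothetical toy database
x_a = 0b010111   # a occurs in instances 0, 1, 2 and 4
x_e = 0b101000   # e occurs in instances 3 and 5

print(support(x_a, x_e))             # 0: a and e never occur together
print(support(x_a, negate(x_e, n)))  # 4: support of {a, ¬e}
print(support(negate(x_a, n), x_e))  # 2: support of {¬a, e}
```

Adding a further vector to the intersection can only clear bits, so the support of a larger interaction never exceeds that of its sub-interactions, whether the items are positive or negated.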
Example 3.15. A real-world example where negative patterns are of interest is presented in chapter 5. In that chapter, complex spatial co-location patterns were sought.
Example 3.16. In a toy example, suppose we wish to find all possible algebraic expressions with operators − and + over the set of variables so that the expression has a value in [a, b] and holds in at least minSup samples. For example, such an interaction may look like v_1 + v_2 − v_3, and if this is evaluated over all samples / instances in the database, and evaluates to a value in [a, b] in at least minSup samples, then it is an interesting pattern. This can be solved in the GIM framework using an aggregation function a_I(x_{V'}, x_v) defined so that x_{V'∪v}[i] = x_{V'}[i] + x_v[i], using the variable set that includes both the positive and negated variables (where N(x_v)[i] = −x_v[i]), m_I(x_{V'}) = |{i : x_{V'}[i] ∈ [a, b]}|, M_I(·) is trivial, I_I(·) returns true if the number computed by m_I(·) is at least minSup, and S_I(·) always returns true (note that this means there is no pruning³).
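The functions of example 3.16 can be written out in a short Python sketch. The data, the interval bounds and the function names are hypothetical. Since S_I(·) always returns true, the sketch simply enumerates every signed combination by brute force instead of walking a prefix tree, so it illustrates the functions and not the GIM algorithm itself.

```python
from itertools import combinations, product
from typing import Dict, List

# Hypothetical toy database: one value per sample for each variable.
data: Dict[str, List[float]] = {
    "v1": [1.0, 2.0, 3.0],
    "v2": [2.0, 2.0, 2.0],
    "v3": [3.0, 1.0, 4.0],
}
n_samples = 3
lo, hi, min_sup = 0.0, 1.0, 2  # the interval [a, b] and minSup


def negate(x_v: List[float]) -> List[float]:
    """N(x_v)[i] = -x_v[i]."""
    return [-value for value in x_v]


def aggregate(x_prefix: List[float], x_v: List[float]) -> List[float]:
    """a_I: element-wise sum of the prefix vector and the new variable."""
    return [p + v for p, v in zip(x_prefix, x_v)]


def measure(x: List[float]) -> int:
    """m_I: number of samples whose value lies in [lo, hi]."""
    return sum(1 for value in x if lo <= value <= hi)


def interesting(x: List[float]) -> bool:
    """I_I: true if at least min_sup samples fall in the interval."""
    return measure(x) >= min_sup


def mine() -> List[str]:
    """Enumerate all signed variable subsets (no pruning) and keep the interesting ones."""
    results = []
    names = sorted(data)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            # A variable and its negation never appear together.
            for signs in product("+-", repeat=k):
                x = [0.0] * n_samples
                for name, sign in zip(subset, signs):
                    x_v = data[name] if sign == "+" else negate(data[name])
                    x = aggregate(x, x_v)
                if interesting(x):
                    results.append(" ".join(s + v for s, v in zip(signs, subset)))
    return results


print(mine())  # ['+v1 -v2', '-v1 +v2', '+v1 +v2 -v3']
```

With this data, v1 + v2 − v3 evaluates to 0, 3 and 1 over the three samples, so two samples fall in [0, 1] and the expression is reported.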
3.7 Solving Top-Down or Monotonic Problems with GIM
Due to the bottom-up nature of the algorithm, where interactions are grown by adding additional variables to them, GIM is most suited to methods where the interestingness measure is anti-monotonic or partially anti-monotonic, as this enables efficient pruning of the search space, particularly in sparse databases: if an interaction is not interesting, larger interactions need not be considered. This section describes how monotonic problems can also be solved using GIM.
It is possible to solve monotonic problems in GIM by inverting the original problem, thus producing an anti-monotonic problem. This means that rather than mining the interaction itself, the GIM algorithm mines the inverted pattern, from which the actual pattern sought can be recovered.
To illustrate this method, for simplicity consider the problem of mining infrequent patterns in a database where the absence of objects is meaningful. That is, find
³ This is just a toy example, where the primary goal is to illustrate a negative pattern, not how to