Percolation Matrix Module - Predictive trend mining for social network analysis

In this section, the proposed Percolation Matrix Module is introduced. The module is intended to provide support for network analysis by indicating the probability of information or events traveling between nodes in a network. This information can then be used to determine the probability of traffic flow between three or more nodes. As already noted in the work described in this thesis, the probability is derived from the identified trend data.

[Figure 6.2: Block Diagram Indicating The Prediction Modeling (PM) Process. Labelled blocks: Patterns and Trends (identified using the Trend Identification module); Percolation Matrix Module (Filter; set of combination frequent patterns (FP); Probability and Percolation Matrices of FP); Visualisation Module (probability maps using Visuset; geographical map using Google Earth).]

[Figure 6.3: Conceptual Example of the Percolation of Information and Events in a network fragment. Nodes a, b, c and d; link probabilities: a to b 0.1, a to c 0.1, b to d 0.1, c to d 0.2.]

In Figure 6.3 an example of a snapshot of the nodes and links in a network fragment is presented. The figure shows a network of four nodes, labeled {a, b, c, d}, connected by four links. Each link is annotated with the probability of traffic flowing along it at the given time stamp. Links can also be bidirectional. A link probability can equally be interpreted as the probability that one node is directly connected to another, and combinations of such probabilities indicate indirect connections between nodes. Thus, referring to Figure 6.3, the probability that node a is connected to node b is given as 0.1; that is, there is a probability of 0.1 that some piece of information or event occurring at node a will travel to node b. Similarly, the probability that an event occurring at node a will be transmitted to node d by way of node b is 0.1 × 0.1 = 0.01.
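The multiplication of link probabilities along a route can be sketched in a few lines of Python. This is only an illustration: the dictionary `LINK_PROB` and the function `path_probability` are hypothetical names, the link values are those of Figure 6.3 as tabulated in Table 6.1, and links are treated as directed (from, to) pairs for simplicity.

```python
from math import prod

# Link probabilities of the network fragment (directed: from -> to).
LINK_PROB = {
    ("a", "b"): 0.1,
    ("a", "c"): 0.1,
    ("b", "d"): 0.1,
    ("c", "d"): 0.2,
}


def path_probability(path, link_prob=LINK_PROB):
    """Multiply the probabilities of consecutive links along `path`.

    A link that is missing from `link_prob` contributes 0.0, so the
    whole path then has probability 0.0.
    """
    return prod(link_prob.get(link, 0.0) for link in zip(path, path[1:]))


p_abd = path_probability(["a", "b", "d"])  # 0.1 * 0.1
p_acd = path_probability(["a", "c", "d"])  # 0.1 * 0.2
```

Under these assumptions the route through b gives the 0.01 quoted in the text, while the alternative route through c would give 0.1 × 0.2 = 0.02.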

This section consists of two sub-sections describing the Percolation Matrix module. Sub-section 6.2.1 describes the process of filtering frequent pattern trends; Sub-section 6.2.2 then explains the process of transcribing the probabilities associated with a set of combination patterns to form the percolation matrix.

6.2.1 Filtering The Frequent Patterns

The Percolation Matrix module starts by filtering the frequent patterns and trends to be used in the PM. As mentioned in Sub-section 6.1.1, the frequent patterns of interest are combination patterns of the form $\{L_{fromLocation}, M, L_{toLocation}\}$. The process of generating the set of combination patterns of interest ($FP$) depends on the interests of the domain user. Typically the selection is based on constraints applied to filter the global set of movement patterns. For example, if $M = \{m_1, m_2, m_3\}$ and the set of location values is $\{a, b, c, d\}$, then the set of combination patterns might be:

$$
FP = \{\{a, m_1, m_2, m_3, b\},\ \{a, m_1, m_2, m_3, c\},\ \{b, m_1, m_2, m_3, d\},\ \{c, m_1, m_2, m_3, d\}\}
$$

The set $FP$ and the associated trends are then used as input to the Percolation Matrix module.
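The filtering step can be illustrated with a minimal Python sketch. The `Pattern` type, the `filter_patterns` function and the particular constraint (restricting the permitted from-locations) are assumptions made purely for illustration; the text does not prescribe which constraints a domain user would apply.

```python
from typing import NamedTuple


class Pattern(NamedTuple):
    """A combination pattern {L_fromLocation, M, L_toLocation}."""
    from_location: str
    m: tuple
    to_location: str


def filter_patterns(patterns, allowed_from=None, allowed_to=None):
    """Keep the patterns whose end points satisfy the given constraints.

    A constraint set to None is not applied.
    """
    selected = []
    for p in patterns:
        if allowed_from is not None and p.from_location not in allowed_from:
            continue
        if allowed_to is not None and p.to_location not in allowed_to:
            continue
        selected.append(p)
    return selected


M = ("m1", "m2", "m3")
global_patterns = [
    Pattern("a", M, "b"),
    Pattern("a", M, "c"),
    Pattern("b", M, "d"),
    Pattern("c", M, "d"),
    Pattern("d", M, "a"),  # hypothetical extra pattern, not from the text
]

# Hypothetical user constraint: only traffic leaving a, b or c is of interest.
fp = filter_patterns(global_patterns, allowed_from={"a", "b", "c"})
```

With this hypothetical constraint the result contains the four patterns of the example set and drops the extra one.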

6.2.2 Probability and Percolation Matrices

The second part of the Percolation Matrix module comprises a two-stage process: (i) determine the probability of link traffic in the set $FP$, and (ii) construct the desired $n$ percolation matrices for $FP$. As mentioned earlier, the trends for each $fp_i$ in $FP$ are used to compute the probability of link traffic. Then, given a trend spanning $n$ time stamps, $n$ percolation matrices are generated. A percolation matrix consists of $N \times N$ elements, where $N$ is the number of possible location values $\{n_1, n_2, \ldots, n_N\}$. The magnitude of $N$ depends on the number of distinct $L_{fromLocation}$ and $L_{toLocation}$ values contained in $FP$. The intersection of a row and a column in the matrix gives the probability of traffic on the associated link.

Algorithm 6.1: The Probability and Percolation Matrix

input : FP = set of frequent combination patterns, set of Trends
output: n percolation matrices generated from FP

for ∀ fp ∈ FP do
    Extract probability (p) of each fp from its associated trend;
end
for k ← 1 to |Trends| do
    Construct a matrix of size N × N;
    for i ← 1 to |FP| do
        Insert p_i into matrix k at the appropriate location;
    end
end

Algorithm 6.1 describes the process of extracting the probability of information movement and building the percolation matrix to facilitate the desired Prediction Modeling. The algorithm first extracts the probability of each pattern $fp$ in $FP$ (Line 2). The support values associated with each time stamp $n$ define the probability of traffic flowing between nodes. Thus all support values for the selected frequent patterns are converted into a probability value ($p$). Therefore, given a specific frequent pattern $fp_i$, conforming to some type of combination pattern, $p_i$ for $fp_i$ is defined as:

$$
p_i = \frac{\mathrm{support}(fp_i)}{\sum fp_i} \qquad (6.1)
$$

Thus, $p_1 + p_2 + \ldots + p_n = 1$.
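The conversion in Equation 6.1 amounts to a normalization, which might be sketched as follows. The sketch assumes that the denominator is the sum of the support values of the items being compared, which is one reading of the equation that makes the probabilities sum to 1; the function name `supports_to_probabilities` and the support counts are hypothetical.

```python
def supports_to_probabilities(supports):
    """Divide each support value by the total so the results sum to 1.

    `supports` maps an identifier to its support count. If the total
    is zero, every probability is returned as 0.0.
    """
    total = sum(supports.values())
    if total == 0:
        return {key: 0.0 for key in supports}
    return {key: value / total for key, value in supports.items()}


# Hypothetical support counts, chosen only to illustrate the arithmetic.
supports = {"fp1": 10, "fp2": 10, "fp3": 10, "fp4": 20}
probs = supports_to_probabilities(supports)
```

Here the total support is 50, so `fp4` receives 20/50 = 0.4 and each of the others 10/50 = 0.2.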

Once the probability for all $fp$ has been extracted, the algorithm constructs the percolation matrix (Line 3). As already noted, the size of the matrix depends on the number of available values for $L_{fromLocation}$ and $L_{toLocation}$. The probabilities of traffic associated with all $fp$ are then inserted into the matrix. The process repeats until all $n$ percolation matrices are constructed. Table 6.1 shows an example of the output of Algorithm 6.1 with respect to the network fragment presented in Figure 6.3. These percolation matrices are then used as the input to the Visualisation module described in the next section.

From/To   a     b     c     d
a         0     0.1   0.1   0
b         0     0     0     0.1
c         0     0     0     0.2
d         0     0     0     0

Table 6.1: An example of a Percolation Matrix using the network fragment given in Figure 6.3
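The matrix construction stage of Algorithm 6.1 can be sketched in Python as below. This is a minimal illustration, not the thesis implementation: the function `build_percolation_matrices`, its argument layout (a mapping from a (from, to) location pair to a list of per-time-stamp probabilities) and the alphabetical ordering of locations are all assumptions. The probabilities are taken as already extracted.

```python
def build_percolation_matrices(link_probs, n_timestamps):
    """Return (locations, matrices): one N x N matrix per time stamp.

    `link_probs` maps (from_location, to_location) to a list holding
    one probability for each of the `n_timestamps` time stamps.
    """
    # N is the number of distinct from/to locations that occur in FP.
    locations = sorted({loc for link in link_probs for loc in link})
    index = {loc: i for i, loc in enumerate(locations)}
    size = len(locations)

    matrices = []
    for k in range(n_timestamps):
        # Start from an all-zero matrix, then insert each p_i in its cell.
        matrix = [[0.0] * size for _ in range(size)]
        for (src, dst), probs in link_probs.items():
            matrix[index[src]][index[dst]] = probs[k]
        matrices.append(matrix)
    return locations, matrices


# A single time stamp, using the link probabilities of Figure 6.3.
link_probs = {
    ("a", "b"): [0.1],
    ("a", "c"): [0.1],
    ("b", "d"): [0.1],
    ("c", "d"): [0.2],
}
locations, matrices = build_percolation_matrices(link_probs, n_timestamps=1)
for loc, row in zip(locations, matrices[0]):
    print(loc, row)
```

For this input the single matrix has the same entries as Table 6.1, with rows as from-locations and columns as to-locations. A trend with more time stamps would simply supply longer probability lists and yield one matrix per time stamp.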

In document Predictive trend mining for social network analysis (Page 143-147)