Computational Intelligence - Computational intelligence in e-mail traffic analysis

While there are a number of definitions of computational intelligence [61, 62], the term is used in this thesis to describe a type of approach for analysing e-mail traffic behaviour. Computational intelligence is defined here as an approach that uses a set of computational techniques to extract information from data and present that information to the user/analyst in a useful and intelligent manner. The purpose of defining this approach is to provide the user/analyst with a more complete understanding of e-mail traffic behaviour. This is considered important, given the variety of behaviours that can be analysed from e-mail traffic and the different levels of abstraction that may be used for analysing it (as previously mentioned in Chapter 2). Because of the complexity of the information available, the aim of using computational intelligence is to reduce the effort required by the user to understand the e-mail traffic data being studied. However, before the user can understand the data, one needs to consider what computational techniques may be used to extract information from it.

Computational Techniques

The term “computational techniques” is defined here as techniques for extracting information from data with regard to the properties of the data. The purpose of using computational techniques is to supply the user with useful knowledge about the data. The computational techniques that may be used for computational intelligence include any type of data analysis technique from the areas of:

• Statistics - mathematical techniques that are used for summarising and interpreting large amounts of data [63].

• Visualisation - computational techniques that provide information about the data by transforming the data into visual images, and allowing the user to explore and understand the data visually [5, 64].

• Artificial Intelligence - computational techniques that perform tasks that would require “intelligence” if they were performed by humans [65, 66]. The types of computational techniques that would be considered “intelligent” are those that are able to learn and understand data, or make decisions based on information contained in the data [66]. Examples of “intelligent” tasks are: classifying objects, learning and recognising patterns, prediction, and clustering objects into groups.

The computational techniques from each of the above areas provide different ways of analysing the data, with each presenting a certain perspective of the data to the user. For instance, some statistical techniques describe the general characteristics of the data (e.g. mean, standard deviation), while some artificial intelligence techniques inform the user about the presence of patterns in the data (e.g. artificial neural networks [66, 67]). However, each type of computational technique can only provide a limited perspective of the data, meaning that the user cannot obtain an overall understanding of the data by using one particular type of technique. Therefore, consideration needs to be given to using a set of computational techniques to provide the user with a more complete understanding of the data.
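The contrast between a statistical summary and a pattern-finding technique can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values and the function names `summarise` and `two_groups` are hypothetical, and a minimal one-dimensional k-means stands in for the far richer artificial intelligence techniques cited above.

```python
from statistics import mean, stdev

# Hypothetical measurements (e.g. messages sent per day); not taken from the thesis.
daily_counts = [4, 5, 6, 5, 40, 42, 41]


def summarise(values):
    """Statistical perspective: general characteristics of the data."""
    return {"mean": mean(values), "stdev": stdev(values)}


def two_groups(values, max_iterations=20):
    """Pattern perspective: split 1-D values into two clusters.

    A minimal k-means with k=2. In one dimension, assigning each value to
    the nearer centre is the same as cutting at the midpoint of the centres.
    """
    low_centre, high_centre = min(values), max(values)
    low, high = list(values), []
    for _ in range(max_iterations):
        cut = (low_centre + high_centre) / 2
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        if not high:  # all values identical: only one group exists
            break
        new_centres = (mean(low), mean(high))
        if new_centres == (low_centre, high_centre):
            break  # centres stopped moving: converged
        low_centre, high_centre = new_centres
    return low, high


summary = summarise(daily_counts)
low_group, high_group = two_groups(daily_counts)
print(summary)
print(low_group, high_group)
```

For these invented values the mean (about 20.4) matches neither group of days, while the grouping alone says nothing about overall spread. Each output is a partial view, which is the limitation described above.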

Using Sets of Computational Techniques

The purpose of using sets of computational techniques for computational intelligence is to provide the user with a variety of perspectives on the data and to allow the user to compare those perspectives. This is important given that each computational technique can only provide limited information about the data, in relation to the type of analysis it can perform. By using a set of computational techniques, each technique provides information about a particular aspect of the data. The user can then use the information provided by each of the computational techniques to gain a better overall understanding of the data, as illustrated in Figure 3.1.

[Diagram labels: Data; General Characteristics; Clustering of Objects; Recognition of Known Patterns; etc. The user observes and compares different outputs to gain a better overall understanding of the data.]

Figure 3.1: Extracting and comparing different types of information about the data.

[Diagram labels: Sales Data for Fruit; Type; Date Sold; Quantity Sold; Price.]

Figure 3.2: Simple example of multi-dimensional data.

Another important reason for using sets of computational techniques is to deal with multi-dimensional data. Much of the data analysed is multi-dimensional and may contain a large number of variables (e.g. hundreds of variables) that describe particular characteristics of the data [68]. As a simple example of multi-dimensional data, Figure 3.2 shows a concept map of the dimensions associated with the sales data for a fruit store. Each of the dimensions in Figure 3.2 (Type, Date Sold, Quantity Sold, and Price) indicates a particular attribute of the data. The benefit of using a set of computational techniques is that each technique may be used to analyse a certain number of dimensions, so that the whole set covers a range of data dimensions.
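As a rough sketch of how a set of techniques can cover different dimensions, the fruit sales example can be written out in Python. The records, the `Sale` class, and both function names are hypothetical and chosen only for illustration; each function reads a different subset of the four dimensions named in Figure 3.2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    fruit_type: str      # dimension: Type
    date_sold: str       # dimension: Date Sold
    quantity_sold: int   # dimension: Quantity Sold
    price: float         # dimension: Price (per unit)


# Hypothetical records; the values are invented for illustration.
sales = [
    Sale("apple", "day 1", 10, 0.50),
    Sale("banana", "day 1", 20, 0.25),
    Sale("apple", "day 2", 4, 0.50),
]


def quantity_by_type(records):
    """Uses two dimensions: Type and Quantity Sold."""
    totals = defaultdict(int)
    for r in records:
        totals[r.fruit_type] += r.quantity_sold
    return dict(totals)


def revenue_by_date(records):
    """Uses three dimensions: Date Sold, Quantity Sold and Price."""
    totals = defaultdict(float)
    for r in records:
        totals[r.date_sold] += r.quantity_sold * r.price
    return dict(totals)


print(quantity_by_type(sales))
print(revenue_by_date(sales))
```

Neither function touches every dimension, but taken together they cover all four.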

Analysis of E-mail Traffic Behaviour

The purpose of using computational intelligence for e-mail traffic analysis is to provide the user/analyst with a set of perspectives for analysing e-mail traffic behaviour. This can be approached by assigning a number of computational techniques to analyse e-mail traffic behaviour at different levels of analysis, for example, overviewing the behaviour of a selection of e-mail accounts or examining the interactions between pairs of e-mail accounts. This can also include using particular techniques to pinpoint unusual or abnormal behaviour, so that the user/analyst can find such behaviour within large amounts of data. The overall effect of using a set of computational techniques is that it provides a variety of ways for the user/analyst to analyse and understand the behaviour of suspect e-mail accounts.
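The two levels of analysis mentioned above (an overview of a selection of accounts, and the interactions between pairs of accounts) can be sketched as follows. This is a minimal illustration and not the method used in this research: the log format, the addresses, and the function names `account_overview` and `pair_interactions` are all assumptions.

```python
from collections import Counter

# Hypothetical traffic log of (sender, recipient) pairs; addresses are invented.
log = [
    ("a@example.org", "b@example.org"),
    ("b@example.org", "a@example.org"),
    ("a@example.org", "c@example.org"),
    ("a@example.org", "b@example.org"),
]


def account_overview(messages):
    """Account level: messages sent and received by each account."""
    sent = Counter(sender for sender, _ in messages)
    received = Counter(recipient for _, recipient in messages)
    accounts = set(sent) | set(received)
    # A Counter returns 0 for accounts that never sent (or never received).
    return {a: {"sent": sent[a], "received": received[a]} for a in accounts}


def pair_interactions(messages):
    """Pair level: messages exchanged between each pair, ignoring direction."""
    return Counter(tuple(sorted(pair)) for pair in messages)


overview = account_overview(log)
pairs = pair_interactions(log)
print(overview)
print(pairs)
```

The same log yields two different views: the overview shows which accounts are most active, while the pair counts show which relationships carry the traffic.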

For this research, two types of computational techniques are utilised. The first type, visualisation techniques, is used to enable the user to explore and understand the behaviour of selected e-mail accounts. The second type, feature extraction techniques, is used to aid the user in locating unusual or abnormal changes in the traffic behaviour exhibited by suspect e-mail accounts. Both visualisation and feature extraction techniques are used as a set, to provide the user/analyst with different perspectives on the behaviour of suspect e-mail accounts. The diagram in Figure 3.3 illustrates each of the perspectives presented by the computational techniques used. The visualisation techniques provide different levels of analysis for e-mail traffic behaviour, while the feature extraction techniques provide ways of pinpointing unusual or abnormal changes in behaviour. The remainder of this chapter describes how each of these techniques is used to analyse e-mail traffic behaviour.

Feature Extraction Techniques:

• Decision Tree Classification (Find unusual variations in e-mail traffic behaviour)

• Hierarchical Fuzzy Inference (Find abnormal changes in e-mail traffic behaviour, based on the fusion of multiple behaviour measurements)

Visualisation Techniques:

• Social Network Visualisation (Overview of connections between e-mail accounts)

• Time-Series Visualisation (Variations in volume of traffic exchanged between e-mail accounts)

[Diagram label linking the two groups: Complementary Techniques]

Figure 3.3: Computational techniques used for e-mail traffic analysis.
