3.3 Visualisation Techniques

3.3.2 Time-Series Visualisation

Time-series visualisation is a computational technique that provides a one-dimensional view of data and presents a visual image of how the data changes over time. This type of visualisation is often used for statistical analysis of time-series data [76], whereby the plotted image gives the user information about the temporal variations of the data. The simple graph plot is easy for the user to interpret, since the data is represented as a sequence of plotted values that displays important features of the temporal data (e.g. trends, outliers) [76].

The information presented by time-series visualisation is usually drawn as a series of values plotted between two axes. The horizontal axis is used to represent the time variable, while the vertical axis is used to represent the variable or dimension being analysed. The plotted values for the sampled data may be represented using notations such as dots, connected lines, or columns. Figure 3.9 shows an example of different notations used for representing time-series data.

[Three plots of the same kind of data. Horizontal axis: Time (days). Vertical axis range: 0 to 25.]

Figure 3.9: Example of different notations used for representing time-series data.
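The difference between two of these notations can be sketched in a few lines of code. The following is only an illustrative sketch using the Python standard library: the function name `render`, the `step` parameter, and the text-based output are assumptions made for this example and are not part of the original work. It draws a series either as columns (filled from the baseline up to the value) or as dots (a single mark at the value), with values rounded up to the nearest axis step. Connected lines are omitted because they do not render well as plain text. The sample values are the first five incoming message counts from Table 3.3.

```python
def render(values, style="columns", step=5):
    """Return a text plot of non-negative integers, one character per time step.

    style="columns" fills each column from the baseline up to the value;
    style="dots" marks only the axis band that contains the value.
    """
    def band(v):
        # Round a value up to the nearest multiple of the axis step.
        return -(-v // step) * step

    top = max(band(v) for v in values)
    rows = []
    # Walk from the highest axis level down to the first step above zero.
    for level in range(top, 0, -step):
        cells = ""
        for v in values:
            if style == "columns":
                filled = v > 0 and band(v) >= level
            else:
                filled = v > 0 and band(v) == level
            cells += "#" if filled else " "
        rows.append(f"{level:>3} |{cells}")
    # Horizontal (time) axis.
    rows.append("    +" + "-" * len(values))
    return "\n".join(rows)


incoming = [4, 9, 11, 14, 18]  # first five weeks of Table 3.3
print(render(incoming, "dots"))
print(render(incoming, "columns"))
```

Both calls plot the same values; only the notation changes, which is the point made by Figure 3.9.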

Analysing Time-Series Information in E-mail Traffic

The purpose of applying time-series visualisation to e-mail traffic data is to examine the volume of e-mail messages exchanged between particular e-mail accounts over time. This gives the user/analyst information about the variations in the number of e-mail messages sent or received by e-mail accounts, as well as information about particular trends (e.g. a rise or drop in traffic volume). Another reason for using time-series visualisation is to examine the relationship between particular e-mail accounts. This enables the user to investigate time periods of intense or low traffic activity, as well as find unusual interactions between those e-mail accounts. As a result, it provides another way of examining relationship information, in addition to social network visualisation (described previously in Section 3.3.1).

It should be noted that different time-scales (e.g. minutes, hours, days, weeks, or months) can be applied to the sampling of e-mail traffic data used for time-series visualisation. The advantage of using different time-scales is that it provides different levels of granularity for analysing the data [77]. This gives the user/analyst a variety of options for analysing the data, since each time-scale may reveal different types of patterns in the resulting time-series plot. For example, the pattern of a person who sends e-mail messages only on certain days of the week may be more noticeable when the time-scale is adjusted to e-mail messages sent per day. Thus the use of different time-scales can aid the user in visually analysing the e-mail traffic data for a variety of temporal behaviours.
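The effect of the time-scale can be illustrated with a small sketch. The code below is not taken from the original work: the function names `bucket` and `count_messages`, the bucket label formats, and the sample send times are all assumptions made for illustration. It counts the same set of message timestamps at several time-scales using only the Python standard library.

```python
from collections import Counter
from datetime import datetime


def bucket(timestamp, scale):
    """Map a datetime to the label of the time bucket it falls in."""
    if scale == "hour":
        return timestamp.strftime("%Y-%m-%d %H:00")
    if scale == "day":
        return timestamp.strftime("%Y-%m-%d")
    if scale == "week":
        # ISO week numbering: weeks start on Monday.
        year, week, _ = timestamp.isocalendar()
        return f"{year}-W{week:02d}"
    if scale == "month":
        return timestamp.strftime("%Y-%m")
    raise ValueError(f"unsupported time-scale: {scale}")


def count_messages(timestamps, scale):
    """Count messages per bucket at the chosen time-scale."""
    return Counter(bucket(t, scale) for t in timestamps)


# Hypothetical send times for one account (Mondays and Thursdays only).
sent = [
    datetime(2024, 1, 1, 9, 15),    # Monday
    datetime(2024, 1, 1, 9, 40),    # Monday
    datetime(2024, 1, 4, 14, 5),    # Thursday
    datetime(2024, 1, 8, 9, 30),    # Monday
    datetime(2024, 1, 11, 16, 20),  # Thursday
]

for scale in ("month", "week", "day"):
    print(scale, sorted(count_messages(sent, scale).items()))
```

At the monthly and weekly scales these messages collapse into one or two totals, whereas the daily scale keeps the Monday/Thursday sending pattern visible, which mirrors the example given in the text.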

[Diagram stages: E-mail Traffic Data -> Transformation to Time-Series Data -> Mapping to Visual Representation -> Output onto Graphical Display -> Exploration of Data by the User. Other labels: Time-Series Visualisation; Feedback Parameters from the User.]

Figure 3.10: Time-series visualisation process.

To create the time-series graph, the e-mail traffic data is processed using the steps shown in Figure 3.10. Firstly, the e-mail traffic data is transformed to extract temporal and traffic volume information, and sampled at a particular time-scale to produce the time-series data. An example of the resulting time-series e-mail traffic data is shown in Table 3.3. The time-series data is then processed using time-series visualisation and plotted as a time-series graph. The graph is then output onto the graphical display, showing an image like those in Figures 3.11 and 3.12. Both of these figures were visualised using TimeSearcher 2 [78] under different time-scales. The resulting time-series graph shows that time-series visualisation is a useful technique for helping the user/analyst to understand the temporal aspects of e-mail traffic data.

Table 3.3: Example of time-series e-mail traffic data.

Week Number   Number of Incoming Messages   Number of Outgoing Messages
0             4                             6
1             9                             10
2             11                            10
3             14                            16
4             18                            16
5             17                            17
6             18                            20
7             20                            18
8             20                            22
9             22                            21
10            24                            26
11            35                            31
12            22                            20
13            20                            17
14            15                            20
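The transformation step that produces data in the form of Table 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the record layout `(sender, recipient, sent_at)`, the function name `weekly_traffic`, the `start` parameter marking the beginning of week 0, and the example addresses are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def weekly_traffic(messages, account, start):
    """Count incoming and outgoing messages per week for one account.

    messages: iterable of (sender, recipient, sent_at) tuples (assumed layout).
    start:    datetime marking the beginning of week 0.
    Returns a list of (week_number, incoming, outgoing) rows, including
    weeks with no traffic up to the last active week.
    """
    counts = defaultdict(lambda: [0, 0])  # week -> [incoming, outgoing]
    for sender, recipient, sent_at in messages:
        week = (sent_at - start).days // 7
        if week < 0:
            continue  # message predates the observation window
        if recipient == account:
            counts[week][0] += 1
        if sender == account:
            counts[week][1] += 1
    if not counts:
        return []
    last = max(counts)
    return [(w, counts[w][0], counts[w][1]) for w in range(last + 1)]


# Hypothetical message records.
start = datetime(2024, 1, 1)
messages = [
    ("alice@example.com", "bob@example.com", datetime(2024, 1, 2, 10, 0)),
    ("bob@example.com", "alice@example.com", datetime(2024, 1, 3, 11, 0)),
    ("carol@example.com", "alice@example.com", datetime(2024, 1, 5, 9, 0)),
    ("alice@example.com", "carol@example.com", datetime(2024, 1, 16, 15, 0)),
]

for row in weekly_traffic(messages, "alice@example.com", start):
    print(row)
```

Each returned row has the same three columns as Table 3.3. Changing the divisor from 7 days to another interval would give a different time-scale for the same records.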

Figure 3.11: Example of e-mail traffic volume using a weekly time-scale.

Limitations of Time-Series Visualisation

While time-series visualisation is useful for displaying the temporal information associated with particular e-mail accounts, it has a number of limitations. Firstly, it is limited in its ability to provide an overview of connections between multiple e-mail accounts. This is because time-series visualisation only provides information about aspects of the data that vary with time, and is unable to display more complex relational information such as that shown by social network visualisation.

Figure 3.12: Example of e-mail traffic volume using a daily time-scale.

Another limitation of time-series visualisation is that it is difficult to display information for a large selection of e-mail accounts (e.g. more than 10 e-mail accounts). Although multiple plots for several e-mail accounts may be displayed on the same time-series graph, the graph gradually becomes overcrowded and incomprehensible as the number of time-series plots increases. This makes it difficult for the user to visually locate which e-mail accounts may be exhibiting unusual or abnormal traffic behaviour. It also makes it difficult for the user to search for and determine which e-mail account may be of interest for detailed investigation.

Overall, the social network and time-series visualisation techniques are useful methods for helping the user/analyst to visually explore e-mail traffic data. While both techniques have certain limitations, these limitations can be overcome by utilising both techniques to complement each other, hence using them as a set for computational intelligence. The approach of utilising both visualisation techniques will be described in Section 4.4, and demonstrated in Sections 5.2 and 5.3. However, when searching for signs of unusual or abnormal behaviour, the exploration approach used by the visualisation techniques would take the user a great deal of time. This is because of the large amount of e-mail traffic data that may have to be examined, and also the time and effort required to determine the presence of unusual or abnormal behaviour. To aid the user in finding unusual or abnormal e-mail traffic behaviour, feature extraction techniques are considered, to provide methods for quickly locating these types of behaviour.