

Partial Protocol Results

7.2 Results: Exploratory Data Analysis

As a starting point, Figure 7.1 shows the 4-plot for an experiment in which 5 agents were instantiated under this partial protocol scheme. All measurements are in milliseconds, as obtained from a Unix system call. Inspection of the plots can be used to test underlying assumptions about the data. The figure shows the data to be non-normal, as highlighted by the skewness in the histogram (c) and by the fat tails and departures from the straight line in the normal probability plot (d). The data appear random, as there is no apparent structure in the lag plot (b). The run sequence plot (a) suggests that the data do not have a fixed variation when observed across long periods of time, as was done in the experiments and as plotted here.

(a) Run sequence plot, showing delays in milliseconds (y-axis) and experimental cycle (x-axis)

(b) Lag plot, showing delays (x-axis and y-axis) in milliseconds

(c) Histogram, showing distribution of delays (y-axis) in milliseconds and bins (x-axis)

(d) Normal probability plot (QQ plot of sample data versus standard normal), showing delays in milliseconds (y-axis) and standard normal quantiles (x-axis)

Figure 7.1: Panels (a)-(d) show the elements of the 4-plot used for exploratory data analysis of the centralised experiments using the partial protocol scheme where 5 agents were hosted. All measurements are in milliseconds.

The 4-plots for the other experiments, with agent numbers varied from 10 to 100, are given in the Appendix in Figures F.1 and F.2, from page 321. The figures there confirm the observations highlighted above.
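The quantities behind the four panels of a 4-plot can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the sample is synthetic (drawn from a log-normal distribution, an assumption made purely to obtain skewed values of a few thousand milliseconds), and it is not the tooling used to produce Figure 7.1.

```python
import random
from statistics import NormalDist, fmean


def lag_pairs(y, lag=1):
    """Points (y[i], y[i + lag]) that a lag plot displays."""
    return list(zip(y[:-lag], y[lag:]))


def lag_autocorrelation(y, lag=1):
    """Sample autocorrelation at the given lag; values near 0 suggest randomness."""
    m = fmean(y)
    num = sum((a - m) * (b - m) for a, b in lag_pairs(y, lag))
    den = sum((v - m) ** 2 for v in y)
    return num / den


def histogram_counts(y, bins=10):
    """Counts per equal-width bin (assumes the values are not all identical)."""
    lo, hi = min(y), max(y)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in y:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return counts


def normal_probability_points(y):
    """(standard normal quantile, ordered observation) pairs for panel (d)."""
    n = len(y)
    std = NormalDist()
    return [(std.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(sorted(y))]


# Synthetic stand-in for detection delays in milliseconds (NOT the thesis data).
rng = random.Random(0)
delays = [rng.lognormvariate(7.7, 0.2) for _ in range(1000)]

run_sequence = list(enumerate(delays))          # panel (a): cycle index vs delay
pairs = lag_pairs(delays)                       # panel (b)
counts = histogram_counts(delays, bins=20)      # panel (c)
qq_points = normal_probability_points(delays)   # panel (d)
print("lag-1 autocorrelation:", round(lag_autocorrelation(delays), 3))
```

A lag-1 autocorrelation close to zero corresponds to a structureless lag plot, while curvature in the (quantile, observation) pairs corresponds to the departures from the straight line in the normal probability plot.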

Furthermore, we can consider additional plots for descriptive statistics, such as the boxplot [234], which presents the 5-number summaries³ given in Table 7.1 in a graphical format for visual inspection, shows outliers, and gives a sense of the dispersion of the data.

Boxplots can also be used for comparison between datasets.
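As a minimal sketch of what a boxplot encodes, the following Python computes a 5-number summary and flags outliers. Two assumptions are made here that the text does not state: quartiles are computed with the "inclusive" interpolation method of the standard library, and outliers are defined by Tukey's 1.5 × IQR rule. The function names are hypothetical.

```python
from statistics import quantiles


def five_number_summary(y):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, med, q3 = quantiles(y, n=4, method="inclusive")
    return min(y), q1, med, q3, max(y)


def tukey_outliers(y, k=1.5):
    """Values lying more than k * IQR outside the quartiles (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(y)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in y if v < low or v > high]
```

Applying both functions to the samples from two experiments gives a numerical counterpart to comparing their boxplots side by side.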

We can also present the plot of the cumulative distribution function (cdf) of the random variable X, obtained by integrating its probability density function f:

$$x \mapsto F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt \qquad (7.1)$$
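In practice the density f is unknown, so a cdf plot of measured data is usually drawn from the empirical cdf, which estimates P(X ≤ x) by the fraction of observations not exceeding x. The sketch below assumes that this is the estimator being plotted; the function name is hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function F with F(x) = (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many ordered values are <= x
        return bisect_right(ordered, x) / n

    return F
```

For example, `F = empirical_cdf(delays)` followed by `F(2200)` estimates the probability that a detection delay is at most 2200 ms, where `delays` stands for any list of measured delays.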

As an example, consider Figure 7.2, which shows the box plot and the cdf plot together with the histogram and time series plot for the experiment where 5 agents were hosted. Again, a quick inspection confirms non-normality and the presence of outliers.

The figures for the other experiments are given in Figures 7.3 and 7.4.

Two notes about these figures:

1. The agent-number parameter was chosen to explore scalability, as discussed in the experimental design. That is, we wish to explore how detection delays vary with an increase in the number of agents hosted in the agent platform, i.e. whether the growth is linear, exponential, etc.

2. Regarding the histograms in the figures, the bin size chosen affects the visual appearance of the distribution of the data; therefore visual inspection alone is not sufficient to establish the underlying normality of the data. This can only be verified by standard normality tests [131, 222, 229]. The results of the normality tests⁴ are shown in Table 7.2.

³ Minimum, Mean, Maximum, Median, Quartiles.

[Figure 7.2 panels: plot of detection delays over time; box plot showing the 5-number summary]

Figure 7.2: Plots for descriptive statistics and exploratory data analysis for centralised experiments using the partial protocol scheme, for the experiment with 5 agents hosted. For example, the box plot shows the 5-number summary: a median of around 2200 ms, the upper and lower quartiles on either side of the median, and the maximum and minimum values. Outliers are also shown. All measurements are in milliseconds.

⁴ All quantities for the normality tests are standard, and definitions can be found in the given references.

[Figure 7.3 panels, each with a plot of detection delays over time and a box plot showing the 5-number summary: (a) 10 agent experiment, (b) 15 agent experiment, (c) 20 agent experiment, (d) 25 agent experiment]

Figure 7.3: Panels (a)-(d) show plots for descriptive statistics and exploratory data analysis for centralised experiments using the partial protocol scheme where agent numbers were varied from 10 through to 25. All measurements are in milliseconds.

[Figure 7.4 panels, each with a plot of detection delays over time and a box plot showing the 5-number summary: (a) 30 agent experiment, (b) 40 agent experiment, (c) 50 agent experiment, (d) 100 agent experiment]

Figure 7.4: Panels (a)-(d) show plots for descriptive statistics and exploratory data analysis for centralised experiments using the partial protocol scheme where agent numbers were varied from 30 through to 100. All measurements are in milliseconds.

Table 7.1 presents summary statistics for all experiments in this setup. The table presents location measures, which identify a central value that describes the data; dispersion measures, which capture the spread in the data; randomness measures; and distributional measures, including the skewness and kurtosis of the distribution, which are based on its third and fourth moments. The table also complements the 4-plot figures above by exploring properties of the detection delay data through further distributional measures⁵. For example, the probability plot correlation coefficient (PPCC) [91] can be used to identify the shape parameter for a distributional family that best fits the data [181]. DATAPLOT™ produces PPCC values for the distributions shown under distributional measures in Table 7.1. Again, the distributional measures strongly indicate that the detection delay data are not from a normal distribution.
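To illustrate the idea behind the PPCC for the normal case, the sketch below correlates the ordered sample with approximate normal order-statistic quantiles. It is a simplified stand-in and not DATAPLOT's procedure: Blom's plotting positions are assumed here in place of whatever positions the tool uses, the function names are hypothetical, and both samples are synthetic.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def normal_ppcc(sample):
    """Correlation between the ordered data and normal quantiles.

    Uses Blom's plotting positions (i - 0.375) / (n + 0.25), an assumption.
    Values close to 1 indicate a nearly straight normal probability plot.
    """
    n = len(sample)
    std = NormalDist()
    theoretical = [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return pearson(theoretical, sorted(sample))


# Synthetic samples for illustration only (NOT the thesis data).
rng = random.Random(1)
normal_like = [rng.gauss(2200, 300) for _ in range(2000)]
skewed = [rng.lognormvariate(7.7, 1.0) for _ in range(2000)]
print("PPCC, normal-like sample:", round(normal_ppcc(normal_like), 4))
print("PPCC, skewed sample:     ", round(normal_ppcc(skewed), 4))
```

Repeating the same calculation with the quantile function of another distributional family, and comparing the resulting coefficients, is the basis for choosing the family or shape parameter that best fits the data.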

The repeatability of the experiments, the number of experiments carried out, the detailed analysis of variance and the comparisons between experiments are all discussed in detail in Chapter 9. Briefly, each experiment comprised 10 experimental runs, with each run spanning an experimental period of about 10 hours partitioned into experimental cycles of about 10 minutes.

For completeness, additional standard normality tests⁶ were carried out and the results are given in Table 7.2. All quantities for the normality tests presented in Table 7.2 are standard, and their definitions can be found in [131, 222, 229], together with the interpretation of results. All tests show that the data are non-normal.
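As one concrete example of a standard normality test, the sketch below implements the Jarque-Bera test, which combines the sample skewness and kurtosis into a single statistic. It is chosen here only for illustration, because it can be written with the standard library alone; it is an assumption that it resembles the tests reported in Table 7.2. The function names are hypothetical, and the p-value relies on the asymptotic chi-square distribution with 2 degrees of freedom.

```python
import math
from statistics import fmean


def central_moment(y, k):
    """k-th central moment of the sample (divides by n)."""
    m = fmean(y)
    return fmean([(v - m) ** k for v in y])


def jarque_bera(y):
    """Return (JB statistic, asymptotic p-value, skewness, kurtosis).

    Kurtosis is the non-excess form, so a normal distribution has kurtosis 3.
    """
    n = len(y)
    m2 = central_moment(y, 2)
    skew = central_moment(y, 3) / m2 ** 1.5
    kurt = central_moment(y, 4) / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of the chi-square distribution with 2 degrees of freedom
    p_value = math.exp(-jb / 2.0)
    return jb, p_value, skew, kurt
```

A small p-value (for instance below 0.05) leads to rejecting the hypothesis that the sample comes from a normal distribution, which is the conclusion the text reports for the detection delay data.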

⁵ To characterise properties of the data, e.g. shape.

⁶ Most statistical tools implement these tests.