

3.2 Aggregation Errors in Communicability Calculation

3.2.3 Quantifying Errors

To quantify the errors introduced by the time aggregation of temporal networks (summarised in Table 3.1), we calculate the dynamic communicability matrix Q for the fixed-width and true partitions across three types of synthetically generated temporal networks.

To build a temporal network from a static network G = (V, E) (where V is the set of vertices and E the set of edges), at each iteration we randomly pick an edge (i, j) ∈ E and assign it a time corresponding to the iteration number k. This results in a temporal event (i, j, k), which forms part of the temporal network. The iteration is repeated until the desired number of temporal events has been created. Allowing time to evolve discretely has the advantage that we know how many events occur within a time window and that no two events can co-occur.¹ In this assessment we consider three network structures: a complete graph (all-to-all), an ensemble of Erdős-Rényi (ER) graphs, and an ensemble of ER-like acyclic graphs, examples of which are shown in Figure 3.3. The inclusion of the acyclic graph leads to an acyclic temporal network, allowing us to understand the effects of loops in the fixed partition scheme without creating an acyclic partition.
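The sampling procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function names `er_edges` and `sample_temporal_network` are hypothetical, edges are assumed to be directed, times are assumed to start at 1, and the "ER-like acyclic" graph is assumed to be obtained by keeping only edges with i < j (the text does not specify the construction).

```python
import random

def er_edges(n, p, acyclic=False, seed=None):
    """Directed ER edge list on nodes 0..n-1 (no self-loops).

    Assumption: with acyclic=True only edges i -> j with i < j are kept,
    which guarantees a directed acyclic graph. p = 1.0 gives the complete graph.
    """
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and (not acyclic or i < j) and rng.random() < p]

def sample_temporal_network(edges, n_events, seed=None):
    """Draw n_events edges uniformly with replacement; the k-th draw gets time k."""
    rng = random.Random(seed)
    picks = rng.choices(edges, k=n_events)  # uniform sampling with replacement
    return [(i, j, k) for k, (i, j) in enumerate(picks, start=1)]

# Sizes taken from the study: 200 vertices, 1000 events, ER parameter p = 0.3.
edges = er_edges(200, 0.3, seed=1)
events = sample_temporal_network(edges, 1000, seed=2)
```

Because each event receives its own integer time, a window of width ∆t always contains exactly ∆t events (except possibly the last), and no two events share a timestamp.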

For each temporal network generated in this fashion we calculate the dynamic communicability matrix Q, and the broadcast vector b at the end of the temporal network,² using the true partition and fixed intervals of varying width. We could similarly repeat the analysis for the receive vector; however, the results would be identical.

¹ If two events occur at the same time but do not share any nodes then they can be ordered arbitrarily, as far as the calculation of communicability centrality is concerned.

² Since we do not calculate the running communicability, i.e., β = 0, the final broadcast score gives us


(a) Complete Graph (b) Erdős-Rényi Graph (c) Acyclic Graph

Figure 3.3: Static network choices which are used to generate temporal networks by repeated sampling of edges (with replacement).

The restrictions on the parameter α for the fixed-width partition require it to be chosen carefully so that the fixed-width and true partitions can be compared. The restriction on α depends on the window size ∆t, so to use a constant value we would need to choose the smallest α across all values of ∆t. This leads to a value of α so small that the broadcast vector is strongly correlated with the aggregated node degree. To address this we instead pick a fixed value for α and truncate the matrix resolvent to ensure convergence. For a fixed-width partition of width ∆t ∈ Z we use the truncated resolvent

$$(I - \alpha A)^{-1} \approx \sum_{k=0}^{\Delta t} \alpha^k A^k.$$

This ensures that if a dynamic walk of length ∆t occurs over the ∆t events in the partition then it will be captured. Consequently, the set of walks counted using the true partition is a subset of the walks counted by the fixed-width partition communicability, and so we can assess how many extra walks the fixed-width partition counts.
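To make the truncation concrete, the following sketch computes the truncated resolvent and a broadcast vector with plain Python lists. It rests on several assumptions that this section does not restate: Q is taken to be the time-ordered product of the window resolvents, b is taken to be the row sums of Q, events are directed, and each window is aggregated into a binary adjacency matrix. All function names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def truncated_resolvent(A, alpha, dt):
    """Return sum_{k=0}^{dt} alpha^k A^k, the truncated form of (I - alpha A)^{-1}."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term
    term = [row[:] for row in total]
    for _ in range(dt):
        # term becomes alpha^k A^k for the next k
        term = [[alpha * x for x in row] for row in mat_mul(term, A)]
        total = [[t + x for t, x in zip(tr, xr)] for tr, xr in zip(total, term)]
    return total

def window_adjacency(window, n):
    """Aggregate the events (i, j, k) of one window (assumed binary, directed)."""
    A = [[0.0] * n for _ in range(n)]
    for i, j, _ in window:
        A[i][j] = 1.0
    return A

def broadcast(events, n, alpha, dt):
    """Row sums of Q, assumed to be the time-ordered product of window resolvents."""
    Q = [[float(i == j) for j in range(n)] for i in range(n)]
    for start in range(0, len(events), dt):
        A = window_adjacency(events[start:start + dt], n)
        Q = mat_mul(Q, truncated_resolvent(A, alpha, dt))
    return [sum(row) for row in Q]

# Two events in "reverse" order: 1 -> 2 happens before 0 -> 1.
events = [(1, 2, 1), (0, 1, 2)]
b_true = broadcast(events, n=3, alpha=0.5, dt=1)   # one event per window
b_fixed = broadcast(events, n=3, alpha=0.5, dt=2)  # both events aggregated
```

Here `b_true` is `[1.5, 1.5, 1.0]` while `b_fixed` is `[1.75, 1.5, 1.0]`: aggregating both events into one window adds the walk 0 → 1 → 2 (weight α² = 0.25), which never occurs in time order. With dt = 1 and directed events without self-loops the truncation is exact, since A² = 0 for a single edge.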

To measure the differences between the partitions we compare the broadcast vectors using the Pearson correlation coefficient and Spearman’s rank correlation coefficient. The Pearson correlation between two variables X and Y is given by

$$\rho_p(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y},$$

where Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] is the covariance between the variables X and Y, and µ_i and σ_i are the mean and standard deviation of variable i. Here, X and Y are the broadcast vectors calculated using the different methods. The correlation ρ_p ∈ [−1, 1] takes values ±1 for total positive/negative correlation and zero if there is no linear correlation.³ Quite often we are interested only in the relative rankings of nodes. For this purpose Spearman’s rank correlation, given by

$$\rho_s(X, Y) = \rho_p(\mathrm{rg}(X), \mathrm{rg}(Y)),$$

where rg(X)_i is the rank of the raw value X_i in the vector, is the most suitable.
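Both coefficients follow directly from the definitions and can be sketched with the Python standard library alone. The handling of tied values is an assumption here (tied entries share the average of their ranks); the text does not say how ties in the broadcast vector are treated.

```python
import math
from statistics import fmean

def pearson(x, y):
    """Pearson correlation: Cov(X, Y) / (sigma_X * sigma_Y)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/n factors cancel between numerator and denominator

def ranks(x):
    """1-based ranks of the entries of x; ties get the average rank (assumption)."""
    order = sorted(range(len(x)), key=x.__getitem__)
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# A monotone but non-linear relationship: identical rankings, imperfect linear fit.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 4.0, 9.0, 16.0]
rho_p = pearson(x, y)
rho_s = spearman(x, y)
```

In this example `rho_s` equals 1 up to rounding because the two vectors rank the entries identically, whereas `rho_p` is strictly below 1 because the relationship is not linear.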

For this study we consider static graphs with 200 vertices; the number of edges varies between graph types. We generate temporal networks by drawing 1000 samples from the static network, with replacement. The results, averaged over 1000 graphs per graph type, are given in Figure 3.4. Naturally we see good agreement across all graphs for ∆t sufficiently small, as the two methods converge. For the complete graph and the ER graphs we see a similar drop-off in correlation as ∆t increases, in both the Pearson and Spearman correlations. For the acyclic graph, however, we see very little drop in the correlation between the partitions; in particular, Spearman’s rank correlation remains above 0.95 even when the temporal network is partitioned into three (∆t = 300). This makes it apparent that the inclusion of infinite walks in the communicability calculation for fixed-width partitions is the primary difference between the two partitions. This echoes the earlier notion that the two are measuring fundamentally different quantities: observed and imaginary walks through the network.

Figure 3.4: Average correlation coefficients ρ_p (blue) and ρ_s (green) between the broadcast vectors of the fixed-width partition of width ∆t and the true partition over an ensemble of 1000 synthetic networks: (a) the complete graph, (b) ER graphs with parameter p = 0.3, (c) acyclic graphs. Here the downweighting parameter is α = 0.5. Data points are plotted with circles, and the 5th and 95th percentiles are given by the error bars.

³ Note that the two variables can have very strong non-linear dependence (e.g. Y² + X² = 1) and

For a more robust analysis we could also consider other correlation measures such as Kendall’s tau coefficient, or study the correlation between only the top 10% highest-ranked nodes. When the communicability metric is applied to real examples with thousands of nodes, often only the highest-ranked nodes are analysed. Repeating this analysis on only a subset of the broadcast vector should therefore show whether the errors in communicability appear uniformly across the nodes, or whether higher- or lower-ranked nodes are most affected.
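As a rough sketch of the suggested extension, the snippet below computes Kendall’s tau on the full broadcast vectors and on the top 10% of nodes only. Several choices are assumptions rather than anything fixed by the text: the tau-a variant is used (ties count as neither concordant nor discordant), the top nodes are selected by their true-partition score, and the function names and toy vectors are purely illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    s = 0
    for (a, b), (c, d) in combinations(zip(x, y), 2):
        prod = (a - c) * (b - d)
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    n = len(x)
    return s / (n * (n - 1) / 2)

def top_fraction(reference, fraction=0.1):
    """Indices of the highest-scoring nodes in `reference` (at least two)."""
    k = max(2, round(fraction * len(reference)))
    order = sorted(range(len(reference)), key=reference.__getitem__, reverse=True)
    return order[:k]

def restricted_tau(b_true, b_fixed, fraction=0.1):
    """Kendall's tau-a restricted to the top nodes of the true-partition vector."""
    idx = top_fraction(b_true, fraction)
    return kendall_tau_a([b_true[i] for i in idx], [b_fixed[i] for i in idx])

# Toy vectors: the fixed-width scores swap only the two highest-ranked nodes.
b_true = [float(i) for i in range(20)]
b_fixed = b_true[:]
b_fixed[18], b_fixed[19] = b_true[19], b_true[18]
tau_all = kendall_tau_a(b_true, b_fixed)
tau_top = restricted_tau(b_true, b_fixed)
```

In this toy case the full-vector coefficient stays close to 1 (188/190) while the restricted coefficient is −1, showing how a global correlation can hide disagreement concentrated among the highest-ranked nodes.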