

4.1.2 Examining the simulated network

Before moving to the second step of populating our network, it is worth pausing to examine the kind of network generated by the algorithm above. The method we used to generate the scale-free directed network, following the work of Bollobás et al., produces results that hold theoretically in the limit. Although they applied their method to generate an actual network with the observed in-degree and out-degree distributions, the network they simulated was the worldwide web graph, that is, a network with more than 20 million pages (vertices) and about 1.5 billion links. We therefore need an intermediate step to verify the validity of the simulated network, particularly since we are dealing with fewer than a few hundred nodes.

Verifying the validity of the simulated network can be achieved in one of two ways:

• assessing the goodness of fit of the simulated network in relation to the hypothetical power-law distribution with the desired exponents;

• estimating the empirical distribution of the simulated network.

We believe the second approach to be the more appropriate one, particularly when considering the divergence between simulated and actual exponents given in the example of Bollobás et al. for the worldwide web graph [BBCR03]. We follow Clauset, Shalizi and Newman [CSN09], who presented a framework for quantifying power-law behaviour in multiple empirical data sets.

Let us start by assuming that the method we followed generated a network exhibiting power-law in-degree and out-degree distributions. Our task is then to find the parameters of the distributions that best fit the data. As previously indicated, a power-law distribution possesses two parameters: the exponent (or scaling) parameter, and the cut-off value, which we have not dealt with so far. If we assume for now that the cut-off value $k_{\min}$ is known, the maximum likelihood estimate (MLE) of the exponent of the power-law distribution is

$$1 + n \left[ \sum_{i=1}^{n} \ln \frac{k_i}{k_{\min}} \right]^{-1},$$

where $k_i$, $i = 1, \dots, n$, are the observed values of $k$ such that $k_i \geq k_{\min}$. This estimate can be easily calculated.¹
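As a minimal sketch of the estimator above, the following Python function evaluates the formula exactly as written, under the assumption that the cut-off is already known. The names `mle_exponent` and `degrees`, and the sample values, are hypothetical and serve only as illustration; they are not taken from the simulated networks.

```python
import math


def mle_exponent(degrees, k_min):
    """Evaluate 1 + n / sum(ln(k_i / k_min)) over the observations k_i >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no observations at or above the cut-off")
    log_sum = sum(math.log(k / k_min) for k in tail)
    if log_sum == 0.0:
        raise ValueError("all tail observations equal the cut-off")
    return 1.0 + len(tail) / log_sum


# Hypothetical degree sample, for illustration only.
degrees = [1, 2, 2, 3, 3, 4, 6, 9, 12, 30]
print(mle_exponent(degrees, k_min=2))
```

Observations below the cut-off are simply discarded, so the value of $n$ in the formula is the size of the tail, not the size of the whole network.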

Omneia R.H. Ismail – PhD Thesis – McMaster University – CES

In estimating the cut-off of the power-law distribution, we follow the method of [CSN09] and choose the cut-off value that makes the probability distributions of the measured data and of the fitted model (with the exponent estimated as above) as close as possible above that cut-off value $k_{\min}$. We proceed as follows:

1. propose a cut-off $k_{\min}$;

2. find the MLE of the exponent for that cut-off value;

3. for values above the cut-off, measure the distance between the simulated distribution and the estimated one using the Kolmogorov-Smirnov² (KS) statistic

$$D = \max_{k \geq k_{\min}} |S(k) - P(k)|,$$

where $S(k)$ is the CDF (cumulative distribution function) of the observations with a value of at least $k_{\min}$, and $P(k)$ is the CDF of the best-fit power-law model for the data in the region $k \geq k_{\min}$;

4. take as the estimate of $k_{\min}$ the proposed cut-off that minimizes $D$ (by comparing the $D$ values across the proposed cut-offs).
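The four steps above can be sketched in Python as follows. This is an illustrative outline, not the code used for the thesis: it assumes the continuous power-law CDF $P(k) = 1 - (k/k_{\min})^{1-\text{exponent}}$ as the fitted model, uses every observed positive degree as a candidate cut-off, and introduces a hypothetical `min_tail` parameter to skip candidates that leave too few points to fit. The function names and the sample degree sequence are likewise hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def ks_distance(tail, k_min, exponent):
    """Largest gap between the empirical CDF of the sorted tail and the
    continuous power-law CDF P(k) = 1 - (k / k_min) ** (1 - exponent)."""
    n = len(tail)
    d = 0.0
    for k in set(tail):
        model = 1.0 - (k / k_min) ** (1.0 - exponent)
        below = bisect_left(tail, k) / n   # empirical CDF just below k
        upto = bisect_right(tail, k) / n   # empirical CDF at k
        d = max(d, abs(below - model), abs(upto - model))
    return d


def choose_cutoff(degrees, min_tail=2):
    """Scan each observed positive degree as a candidate cut-off and keep the
    one with the smallest KS distance. Returns (k_min, exponent, D, tail size),
    or None if no candidate is usable."""
    best = None
    for k_min in sorted({k for k in degrees if k > 0}):      # step 1
        tail = sorted(k for k in degrees if k >= k_min)
        log_sum = sum(math.log(k / k_min) for k in tail)
        if len(tail) < min_tail or log_sum == 0.0:
            continue  # too few points, or every point equals k_min
        exponent = 1.0 + len(tail) / log_sum                 # step 2
        d = ks_distance(tail, k_min, exponent)               # step 3
        if best is None or d < best[2]:                      # step 4
            best = (k_min, exponent, d, len(tail))
    return best


# Hypothetical degree sequence, for illustration only.
degrees = [1, 1, 1, 2, 2, 3, 4, 5, 8, 13, 21]
print(choose_cutoff(degrees))
```

Because degrees are integers and often tied, the sketch evaluates the empirical CDF on both sides of each distinct observed value rather than once per observation, so tied values do not inflate $D$.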

Before applying our approach to the networks simulating the Brazilian banking system for March 2008, it is worth noting that for such small sample sizes, of around one hundred nodes, it becomes hard to distinguish between data drawn from the closely related power-law, log-normal and exponential distributions, particularly as our fit is also controlled by the cut-off value. As stated in [CSN09], for samples of about a hundred observations “we can not accurately distinguish the data sets because there is simply not enough data to go on”.

The results for our simulated networks are reported in Table 4.2. For each of the three networks we report the estimated parameters, the KS statistic and the KS tabulated value³. In all cases the KS statistic is less than the KS tabulated value, so we cannot reject the hypothesis that our data does actually follow a power-law distribution with the reported parameters. Finally, the last row of the table reports the number of data points above the cut-off value, that is, the number of banks having degrees greater than $k_{\min}$.

¹ […]bution to simplify our calculations.

² For non-normal data, the Kolmogorov-Smirnov statistic is the most commonly used measure for quantifying the distance between two probability distributions [CSN09].

As is evident from the fitted values in the table, the network with α = 0.15 performs the worst, under a 1% level of significance. It produces a very high cut-off parameter for the out-degree/in-degree distribution, with only 20 data points in our network above that cut-off for the out-degree, and 23 for the in-degree.

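| | α = 0.15, In-deg | α = 0.15, Out-deg | α = 0.1, In-deg | α = 0.1, Out-deg | α = 0.05, In-deg | α = 0.05, Out-deg |
|---|---|---|---|---|---|---|
| Cut-off ($k_{\min}$) | 3 | 7 | 3 | 3 | 6 | 5 |
| Exponent | 1.7903 | 5.1620 | 2.0386 | 2.4150 | 2.0055 | 2.3092 |
| KS statistic | 0.1209 | 0.1500 | 0.1750 | 0.1507 | 0.1042 | 0.1283 |
| KS tabulated | 0.30728 | 0.32688 | 0.2577 | 0.1908 | 0.2577 | 0.1870 |
| Data > cut-off | 23 | 20 | 40 | 73 | 40 | 76 |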

Table 4.2: Parameters estimated for power law fitting of the simulated networks.
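The decision rule applied to Table 4.2 can be checked mechanically. The short Python sketch below transcribes the KS statistic and KS tabulated value for each fit from the table and applies the comparison described in the text; the dictionary layout and the helper name `cannot_reject` are hypothetical and chosen only for this illustration.

```python
# (KS statistic, KS tabulated value), transcribed from Table 4.2.
table_4_2 = {
    ("alpha=0.15", "in-degree"): (0.1209, 0.30728),
    ("alpha=0.15", "out-degree"): (0.1500, 0.32688),
    ("alpha=0.1", "in-degree"): (0.1750, 0.2577),
    ("alpha=0.1", "out-degree"): (0.1507, 0.1908),
    ("alpha=0.05", "in-degree"): (0.1042, 0.2577),
    ("alpha=0.05", "out-degree"): (0.1283, 0.1870),
}


def cannot_reject(ks_statistic, tabulated):
    """The power-law hypothesis is not rejected when D is below the tabulated value."""
    return ks_statistic < tabulated


for (network, direction), (d, crit) in table_4_2.items():
    verdict = "cannot reject" if cannot_reject(d, crit) else "reject"
    print(f"{network:11s} {direction:10s} D={d:.4f} "
          f"tabulated={crit:.5f} margin={crit - d:.4f} -> {verdict}")
```

All six fits pass the comparison. The narrowest margin is the out-degree fit for α = 0.1, where the tabulated value exceeds the statistic by 0.1908 − 0.1507 = 0.0401.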