Identifying Links in the Network - Constructing Correlation Networks for SuperMAG

Chapter 2 Constructing Correlation Networks for SuperMAG

2.3 Identifying Links in the Network

We wish to establish the extent of similarity between temporal variations of the vector time-series of magnetometer stations. The magnetometer time-series are inherently non-stationary, as they contain many long-term trends as well as bursty activity (e.g. substorms), and as such they resemble the geophysical time-series used to establish networks in a climate context. Radebach et al. [2013] list a number of methods that have been used in the construction of climate networks. Examples include linear (Bravais-Pearson) correlation [Tsonis and Roebber, 2004], (cross-)mutual information [Donges et al., 2009], a phase synchronization index based on the normalized Shannon entropy of the associated phase difference time-series [Yamasaki et al., 2009], the (cross-)mutual information of order patterns [Barreiro et al., 2011], event synchronization [Malik et al., 2012] and transfer entropy [Hlinka et al., 2013]. Details of their use in a climate context can be found in the above references, and we briefly outlined mutual information in chapter 1.

Parameters used to establish connections in climate networks are typically scalars such as temperature, pressure and rainfall. Here we have vector time-series measurements of the magnetic field. While a single component of the magnetic field could in principle be used, there are significant drawbacks to this: the magnetic components, which vary in response to ionospheric currents, depend on the position of the magnetometer station relative to the currents. Using a single component means we may miss correlated pairs of signals that lie in orthogonal components.

Canonical correlation [Brillinger, 1975] offers a potential solution to this. Jackel et al. [2001] have already investigated its use with respect to magnetometer station pairs. For a given pair of vector time-series $X(t)$, $Y(t)$, canonical correlation defines a new coordinate system $X'(t) = [X_1'(t), X_2'(t), X_3'(t)]$, $Y'(t) = [Y_1'(t), Y_2'(t), Y_3'(t)]$ in which the cross-correlation coefficient $r_{X_1',Y_1'}$ between the first canonical components $X_1'(t)$ and $Y_1'(t)$ is maximised. Here $X'(t) = R_X X(t)$ and $Y'(t) = R_Y Y(t)$, where $R_X$ and $R_Y$ are the respective rotation matrices. These matrices are conceptually thought of as rotations, but they also include stretching and shearing; hence $\det(R) \neq 1$ in general. In addition, the cross-correlations between different canonical components, $r_{X_1',Y_2'}$, $r_{X_1',X_2'}$, $r_{Y_1',Y_2'}$, ..., are zero, which is to say the covariance and cross-covariance matrices for the rotated datasets are all diagonal. Determining the canonical cross-correlation coefficients and the rotation matrices involves finding the eigenvalues and eigenvectors of the matrices

\[ C_X = \Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \quad \text{and} \quad C_Y = \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY} \]

by solving the equations below

\[ (C_X - \lambda_{X_i} I)\, a_{X_i} = 0 \qquad (2.3) \]

and

\[ (C_Y - \lambda_{Y_i} I)\, a_{Y_i} = 0 \qquad (2.4) \]

where the covariance matrices $\Sigma_{XX}$, $\Sigma_{YY}$ and the cross-covariance matrices $\Sigma_{XY}$ and $\Sigma_{YX}$ are defined as

\[
\Sigma_{XY} =
\begin{pmatrix}
E[(X_1(t)-\mu_{X_1})(Y_1(t)-\mu_{Y_1})] & E[(X_1(t)-\mu_{X_1})(Y_2(t)-\mu_{Y_2})] & E[(X_1(t)-\mu_{X_1})(Y_3(t)-\mu_{Y_3})] \\
E[(X_2(t)-\mu_{X_2})(Y_1(t)-\mu_{Y_1})] & E[(X_2(t)-\mu_{X_2})(Y_2(t)-\mu_{Y_2})] & E[(X_2(t)-\mu_{X_2})(Y_3(t)-\mu_{Y_3})] \\
E[(X_3(t)-\mu_{X_3})(Y_1(t)-\mu_{Y_1})] & E[(X_3(t)-\mu_{X_3})(Y_2(t)-\mu_{Y_2})] & E[(X_3(t)-\mu_{X_3})(Y_3(t)-\mu_{Y_3})]
\end{pmatrix},
\]

with similar definitions for $\Sigma_{XX}$, $\Sigma_{YY}$ and $\Sigma_{YX}$. Above, $\mu_X$ is the expectation value of $X$, $E[X] = \mu_X$. The eigenvalues relate to the cross-correlation coefficients for the canonical components, $\lambda_{X_i} = r^2_{X_i' Y_i'}$, $\lambda_{Y_i} = r^2_{Y_i' X_i'}$, and $a_{X_i}$ and $a_{Y_i}$ are the eigenvectors that form the rows of the rotation matrices $R_X$ and $R_Y$ respectively. The canonical correlation coefficients obey the relation

\[ r_{Y_1' X_1'} \geq r_{Y_2' X_2'} \geq r_{Y_3' X_3'}. \]
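The eigenvalue problem above can be sketched numerically. What follows is a minimal NumPy illustration, not the code used in this work: the function name `first_canonical_correlation`, the use of sample covariances, and the synthetic signals are all assumptions made for the example. It builds $C_X$ from sample covariance matrices, takes the largest eigenvalue and returns its square root as the first canonical correlation. The synthetic data place a shared signal in orthogonal components at two hypothetical stations, which is the situation in which a single-component correlation would miss the link.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """First canonical correlation between two vector time-series.

    X, Y: arrays of shape (n_samples, 3), one column per field component.
    (Hypothetical helper for illustration only.)
    """
    Xc = X - X.mean(axis=0)              # subtract the mean of each component
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    S_xx = Xc.T @ Xc / (n - 1)           # sample covariance of X
    S_yy = Yc.T @ Yc / (n - 1)           # sample covariance of Y
    S_xy = Xc.T @ Yc / (n - 1)           # sample cross-covariance
    S_yx = S_xy.T
    # C_X = S_xx^{-1} S_xy S_yy^{-1} S_yx, using solve() rather than explicit inverses
    C_X = np.linalg.solve(S_xx, S_xy) @ np.linalg.solve(S_yy, S_yx)
    # Eigenvalues are squared canonical correlations; clip guards against round-off
    lam_max = np.clip(np.linalg.eigvals(C_X).real.max(), 0.0, 1.0)
    return float(np.sqrt(lam_max))

# Synthetic example (assumed data): the same signal appears in the first
# component at one station and in the second component at the other.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 500)
signal = np.sin(t)
X = 0.1 * rng.standard_normal((500, 3))
Y = 0.1 * rng.standard_normal((500, 3))
X[:, 0] += signal
Y[:, 1] += signal

r_single = np.corrcoef(X[:, 0], Y[:, 0])[0, 1]   # Pearson, first components only
r_canon = first_canonical_correlation(X, Y)       # uses all three components
print(f"single component: {r_single:.2f}, first canonical: {r_canon:.2f}")
```

In this sketch the single-component Pearson coefficient stays near zero while the first canonical correlation is close to one, because the canonical transformation is free to align the components that carry the shared signal.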

To identify connections in the network we use only the first canonical component of our rotated dataset. Other quantities, such as the relative contribution of each of the original components to the respective canonical components, may hold useful information; however, using them goes beyond the scope of what we wish to accomplish. Unlike Pearson correlation, which allows for anti-correlation, canonical coefficients can only take values $0 \leq r_{Y_i' X_i'} \leq 1$.
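The non-negativity can be checked in the simplest case. As an illustrative reduction (an assumption made for this aside, not a case treated in the text), take $X$ and $Y$ to be scalar time-series with variances $\sigma_X^2$, $\sigma_Y^2$, covariance $\sigma_{XY}$ and Pearson coefficient $r_{XY}$. The matrix $C_X$ then collapses to a single number:

```latex
\[
C_X = \Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}
    = \frac{\sigma_{XY}^2}{\sigma_X^2 \, \sigma_Y^2}
    = r_{XY}^2 ,
\qquad
\lambda_X = r_{XY}^2 ,
\qquad
\sqrt{\lambda_X} = |r_{XY}| .
\]
```

Because the eigenvalue is a squared correlation, its square root carries no sign: $r_{XY} = -0.8$ and $r_{XY} = +0.8$ both give a canonical coefficient of $0.8$.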

In effect, any anti-correlated variations are rotated into positive correlations. We use canonical correlation between windowed segments of pairs of vector magnetometer time-series to quantify similarity between pairs of stations as a function of time; the window length depends on the aims of the analysis. The time-series are de-trended with a linear fit within each correlation window. We calculate the canonical correlation between the $i$th and $j$th stations for all possible station pairs to form a cross-correlation matrix (or weighted network), $C_{ij}(t)$. $C_{ij}(t)$ contains the correlation coefficient for the first canonical component for each station pair and each windowed segment.
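The windowing procedure just described can be sketched as follows. This is a hedged illustration under stated assumptions rather than the implementation used here: the function names, the `(n_stations, n_samples, 3)` array layout, the `window` and `step` parameters, and the synthetic data are all hypothetical. The diagonal of each matrix is left at zero, which is a choice made for the sketch.

```python
import numpy as np

def _first_cc(X, Y):
    """First canonical correlation of two (n_samples, 3) segments (illustrative helper)."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    S_xx, S_yy, S_xy = Xc.T @ Xc, Yc.T @ Yc, Xc.T @ Yc   # normalisation cancels in C_X
    C_X = np.linalg.solve(S_xx, S_xy) @ np.linalg.solve(S_yy, S_xy.T)
    return float(np.sqrt(np.clip(np.linalg.eigvals(C_X).real.max(), 0.0, 1.0)))

def detrend_linear(segment):
    """Remove a least-squares straight line from each component of a segment."""
    t = np.arange(segment.shape[0])
    slope, intercept = np.polyfit(t, segment, 1)          # one fit per column
    return segment - (np.outer(t, slope) + intercept)

def windowed_correlation_matrices(data, window, step):
    """Return C with shape (n_windows, n_stations, n_stations).

    data: array of shape (n_stations, n_samples, 3) (assumed layout).
    """
    n_stations, n_samples, _ = data.shape
    starts = range(0, n_samples - window + 1, step)
    C = np.zeros((len(starts), n_stations, n_stations))
    for w, s in enumerate(starts):
        # De-trend every station within this window before correlating
        segs = [detrend_linear(data[k, s:s + window]) for k in range(n_stations)]
        for i in range(n_stations):
            for j in range(i + 1, n_stations):
                C[w, i, j] = C[w, j, i] = _first_cc(segs[i], segs[j])
    return C

# Synthetic example (assumed data): stations 0 and 1 share a signal in
# different components, station 2 records noise only.
rng = np.random.default_rng(1)
n = 600
t = np.arange(n)
shared = np.sin(2 * np.pi * t / 50)
data = 0.2 * rng.standard_normal((3, n, 3))
data[0, :, 0] += shared + 0.01 * t     # includes a linear trend to be removed
data[1, :, 1] += shared
C = windowed_correlation_matrices(data, window=128, step=64)
print(C.shape)
print(C[:, 0, 1].round(2), C[:, 0, 2].round(2))
```

Each slice `C[w]` is symmetric because the first canonical correlation does not depend on the order of the two stations, so only the upper triangle is computed.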

The $C_{ij}$ matrix could be used as a weighted adjacency matrix for the network instead of a binary adjacency matrix. However, many of the $C_{ij}$ values are a result of “random correlation” and do not constitute physically related measurements; that is, the matrix is dominated by noise. A “random correlation” here means a correlation coefficient obtained from a pair of time-series that is likely to have occurred by chance (based on a presupposed statistical significance). We outline how we determine the false positive rate in section 2.4.3. By using a threshold $C_T$ we can obtain a binary adjacency matrix $A_{ij}$ which, given an appropriate choice for $C_T$, will contain less noise.
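The thresholding step can be illustrated with a short sketch. The function name `binary_adjacency`, the strict inequality, the removal of self-links, and the numerical values below are assumptions for illustration; in particular the threshold of 0.8 is a placeholder, not a value taken from this work.

```python
import numpy as np

def binary_adjacency(C, C_T):
    """Threshold a correlation matrix (or a stack of them) into a 0/1 adjacency matrix.

    A link is assigned where C_ij > C_T; the strict inequality and the
    removal of self-links are choices made for this sketch.
    """
    A = (np.asarray(C) > C_T).astype(int)
    idx = np.arange(A.shape[-1])
    A[..., idx, idx] = 0                 # no station is linked to itself
    return A

# Hypothetical correlation matrix for three stations at one time window
C = np.array([[1.00, 0.92, 0.31],
              [0.92, 1.00, 0.28],
              [0.31, 0.28, 1.00]])
A = binary_adjacency(C, C_T=0.8)         # 0.8 is a placeholder threshold
print(A)
# [[0 1 0]
#  [1 0 0]
#  [0 0 0]]
```

Only the pair of stations 0 and 1 exceeds the threshold here, so the weaker values, which in practice may arise by chance, do not produce links.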

In document A network analysis of space weather (Page 75-78)