The concept of ‘causality’ - Mutual information based measures on complex interdependent networ

In his book I Am A Mathematician [106], Norbert Wiener wrote that

“ .. If we can measure degrees of causality ... We can then observe how much a change in one aspect of the universe will bring out changes in others.”

Wiener implied in his speech (and later in his book and papers) that it is possible to quantify ‘causality’ by quantifying the changes in one variable that incite changes in another. His idea was that the ‘causality’ of a variable in relation to another can be measured by how well the variable helps to predict the other. In other words, variable A ‘causes’ variable B if the ability to predict B is improved by incorporating information about A in the prediction of B. Moreover, Wiener also said in [106] that

“.. I was forced to consider the theory of information, and above all, that partial information which our knowledge of one part of the system gives us the rest of it.”

The multi-talented Wiener is considered to be one of the pioneers of information theory. He believed that information theory could contribute to detecting ‘causality’ and uncovering hidden information.

3.1.1 Different points of view on ‘causality’

From a philosophical point of view, there has never been clear agreement on what should be defined as ‘causality’. For an interesting review of the mathematical theory of causation from a philosophical point of view, refer to [50]. Some philosophers even hold the view that ‘causality’ is impossible to quantify [45,52].

Statisticians often meet ‘causality’ when dealing with correlation coefficients and regression [45]. Granger has written a review (mainly intended for econometricians) of the concept of ‘causality’ [46]. A more recent overview of causality-related statistics, albeit in a slightly different area, is written by Judea Pearl [80], who is known as one of the pioneers of Bayesian networks. According to him, recent statistical ideas are moving away from traditional statistical analysis and towards causal analysis. He differentiates between the two by saying that traditional statistical analysis focuses on describing the data and inferring distribution parameters from samples, while causal analysis requires explicit articulation of the underlying causal assumptions, which is not what Bayesian statisticians normally do.

In Bayesian statistics (the name is derived from Bayes’ theorem for conditional probability), graphical models are often used. Graphical models are probabilistic models denoting the conditional independence structure between random variables. In [80], Pearl proposes using the Structural Causal Model (SCM) to define causal quantities, causal assumptions and all the other concepts needed in a causal discourse. SCM is an extension of Structural Equation Modelling on linear systems. Granger admitted that it is possible to incorporate a more Bayesian viewpoint into the idea of ‘causality’ by incorporating the dynamics of prior beliefs in the model [46]. One thing all the methods mentioned previously have in common is that one first needs to fit a model to the data in order to extract the ‘causality’, and that most of the models are essentially linear or at least based on a linear model. The model-free quantifications of ‘causality’ seem to have their roots in information theory.

In agreement with Pearl that causal statistics is one of the most important areas of statistics, [52] summarizes the information-theoretic and dynamical systems approaches to causality. The paper explains that the link between these two fields arises because many of the approaches to inferring causality from experimental time series came about from studying the synchronization of chaotic systems, where Shannon’s definition of entropy has been adopted to study dynamical systems in ergodic theory [59]. Various information-theoretic functionals have been used to estimate, classify and explain chaotic data [8,52].

3.1.2 The arrow of time and prediction

Despite all these differing views on ‘causality’, even the philosophers [17,87,50] agree that the causal variable must come before the affected variable. As far as we know, the future cannot cause the past and the arrow of time persists. Hence, there must exist a certain time lag, however small, between the cause and the effect; this will henceforth be referred to as the causal lag [44]. Granger himself said that the flow of time clearly plays a central role and that there is no use attempting to discuss ‘causality’ without time.

Another recurring theme is the use of prediction in ascertaining whether or not the causal variable has unique information about the affected variable, which implies that we can infer ‘causality’ by comparing predictions. Consequently, we outline the standard steps of inferring ‘causality’, derived from Wiener’s idea, Granger’s formulation and the basic assumption that knowledge of the causal variable helps forecast the affected variable. It is this definition of ‘causality’ that we will adopt in this thesis. Say we want to test whether variable Y causes variable X. The first step would be to predict the current value of X using the historical values of X only. The second step would be to make another prediction in which the historical values of Y and X are both used to predict the current value of X. And the last step would be to compare the former to the latter. If the second prediction is judged to be better than the first one, then one can conclude that Y causes X.
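The comparison of predictions described above can be illustrated with a minimal sketch. It rests on assumptions that are not part of the text: the predictor is a linear autoregressive model fitted by ordinary least squares, the quality of a prediction is measured by the in-sample variance of the prediction error, and the data are synthetic. The function names (`lagged_design`, `prediction_error_variance`, `compare_predictions`) and the coupling coefficients are hypothetical and chosen only for illustration.

```python
import numpy as np


def lagged_design(series_list, lag, n):
    """Build a design matrix: an intercept plus `lag` past values of each series."""
    cols = [np.ones(n - lag)]
    for s in series_list:
        for k in range(1, lag + 1):
            # Value of s at time t - k, for t = lag, ..., n - 1.
            cols.append(s[lag - k : n - k])
    return np.column_stack(cols)


def prediction_error_variance(target, predictors, lag):
    """Fit a linear predictor of target[t] from past values and return the error variance."""
    n = len(target)
    design = lagged_design(predictors, lag, n)
    current = target[lag:]
    coef, *_ = np.linalg.lstsq(design, current, rcond=None)
    residual = current - design @ coef
    return residual.var()


def compare_predictions(x, y, lag=1):
    """Return the error variance without and with the history of y."""
    restricted = prediction_error_variance(x, [x], lag)  # step 1: past of X only
    full = prediction_error_variance(x, [x, y], lag)     # step 2: past of X and Y
    return restricted, full                              # step 3: compare the two


# Synthetic example (assumed): Y drives X with a causal lag of one sample.
rng = np.random.default_rng(0)
n = 2000
y = rng.standard_normal(n)
noise = 0.1 * rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + noise[t]

r_xy, f_xy = compare_predictions(x, y)  # does the past of Y help predict X?
r_yx, f_yx = compare_predictions(y, x)  # does the past of X help predict Y?
print("Y -> X:", r_xy, f_xy)
print("X -> Y:", r_yx, f_yx)
```

In this constructed example, the error variance should drop sharply when the history of Y is added to the prediction of X, and should barely change in the reverse direction. Because the second model contains the first, its in-sample error can never be larger, so judging the second prediction to be "better" requires a criterion beyond a raw comparison, for example a significance test or an out-of-sample evaluation; the sketch leaves that choice open.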

In document Mutual information based measures on complex interdependent networks of neuro data sets (Page 40-43)