Partial mutual information (PMI) - Partial information (PI)

5. Summary of connectivity methods and their applications in

5.2. Functional connectivity (FC)

5.2.3. Partial information (PI)

5.2.3.1. Partial mutual information (PMI)

Finally, a more complex but valuable quantity can be derived directly from partial information: the partial mutual information (PMI; Table 5.1), also called conditional mutual information (Hlavackova-Schindler et al., 2007, May et al., 2008). Whereas PI depends on only two variables or signals, PMI involves at least three (May et al., 2008). To motivate its application, suppose that for two main variables X and Y (representing signals extracted from two different channels or regions of interest in the brain), MI has yielded a high value that has also been statistically validated (for instance, by means of surrogates, with the observed value exceeding the significance cut-off of the surrogate distribution). The dependence found does not, however, guarantee a direct coupling between X and Y, because a confounding variable (denoted by Z) could in fact be responsible for it (e.g., Z is a major hub in the brain networks that drives X and Y simultaneously and thereby synchronizes them). Thus, MI detects statistical relationships between channels (and, unlike coherence, it also captures non-linear couplings), but it cannot tell whether the link is direct or caused by confounding variables (Frenzel and Pompe, 2007, Chan et al., 2013). To check whether the observed link reflects an actual interaction, the PMI between X and Y given Z can be computed and compared with the original MI between X and Y. If the two values do not differ significantly, Z is not involved in the link; if PMI is significantly lower than MI, Z accounts for at least part of it. The same procedure is followed with partial coherence, the linear counterpart of partial mutual information, in which classical coherence takes the role of mutual information (Astolfi et al., 2007, Frenzel and Pompe, 2007).
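The comparison between MI and PMI can be illustrated on synthetic data. The following is a minimal sketch, not the implementation used in this work: it assumes a simple plug-in (histogram) estimate with equal-width bins, and the function names, the bin count and the simulated signals are hypothetical choices made only for illustration.

```python
import math
import random
from collections import Counter


def discretize(signal, n_bins=8):
    """Map each sample to an equal-width bin index in 0..n_bins-1."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant signal
    return [min(int((v - lo) / width), n_bins - 1) for v in signal]


def entropy(*columns):
    """Plug-in joint Shannon entropy (in nats) of one or more symbol sequences."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    return -sum(c / n * math.log(c / n) for c in counts.values())


def mutual_information(x, y):
    # MI(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)


def partial_mutual_information(x, y, z):
    # PMI(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


# Hypothetical toy signals: Z drives both X and Y; W is unrelated to all of them.
random.seed(0)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(n)]
x = [zi + 0.5 * random.gauss(0, 1) for zi in z]
y = [zi + 0.5 * random.gauss(0, 1) for zi in z]

xd, yd, zd, wd = (discretize(s) for s in (x, y, z, w))

mi_xy = mutual_information(xd, yd)
pmi_xy_given_z = partial_mutual_information(xd, yd, zd)  # true confounder
pmi_xy_given_w = partial_mutual_information(xd, yd, wd)  # irrelevant channel

print(f"MI(X;Y)    = {mi_xy:.3f} nats")
print(f"PMI(X;Y|Z) = {pmi_xy_given_z:.3f} nats")
print(f"PMI(X;Y|W) = {pmi_xy_given_w:.3f} nats")
```

In this toy setting, conditioning on the true driver Z should remove most of the dependence, while conditioning on the unrelated channel W should leave it essentially unchanged. Because Z is binned coarsely, a small residual PMI remains even though X and Y are independent given the exact value of Z; in practice the difference between MI and PMI would have to be assessed statistically, for instance with surrogates.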

Basic properties of the PMI are the following. As can be deduced from Fig. 5.2, PMI is equal to or smaller than MI: PMI(X;Y|Z) is at most the intersection between H(X) and H(Y), which corresponds to MI(X;Y). (The diagram depicts the common-driver situation considered here; if Z instead depends jointly on X and Y, conditioning on Z can increase the value, so the inequality is not universal.) A relevant result is that if PMI reaches its theoretical lower limit (zero, or nearly zero in practice) while MI is significantly greater than zero, then Z completely accounts for the statistical dependence between channels X and Y; that is, all the information about their interaction is already contained in Z (May et al., 2008). In that case Z is responsible for the interaction between X and Y: they may be genuinely synchronized, displaying functional connectivity, but there is no causal or effective connectivity between them. Conversely, if PMI is not significantly different from MI, Z is not acting as a confounding variable, whereas a significant reduction of PMI directly indicates that Z is involved in the observed correlation between X and Y (Frenzel and Pompe, 2007).
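These properties can be made explicit with standard identities from information theory. The block below is a short worked derivation added for illustration; it is not taken from the cited references, and it uses the notation MI(X;Y) and PMI(X;Y|Z) of the text.

```latex
\begin{align}
% Entropy decomposition of PMI (the overlap of H(X) and H(Y) that lies outside H(Z))
\mathrm{PMI}(X;Y|Z) &= H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) \\
% Chain rule applied in two orders to MI(X;\,Y,Z)
\mathrm{MI}(X;Z) + \mathrm{PMI}(X;Y|Z) &= \mathrm{MI}(X;Y) + \mathrm{PMI}(X;Z|Y) \\
% Hence the reduction produced by conditioning on Z
\mathrm{MI}(X;Y) - \mathrm{PMI}(X;Y|Z) &= \mathrm{MI}(X;Z) - \mathrm{PMI}(X;Z|Y) \\
% Lower limit: reached exactly when X and Y are independent given Z
\mathrm{PMI}(X;Y|Z) = 0 \;&\Longleftrightarrow\; P(x,y|z) = P(x|z)\,P(y|z) \quad \text{for all } x, y, z \text{ with } P(z) > 0
\end{align}
```

The third line shows that the reduction is large when Z shares with X much information that Y does not already provide, which is the common-driver case; the last line is the formal statement of Z fully accounting for the dependence between X and Y.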

Fig. 5.2. Venn diagram illustrating partial mutual information between X and Y, given Z, PMI(X;Y|Z) in the figure.

A subtle but important point in the previous discussion should be noted, in order to prevent misleading reasoning about functional connectivity using PMI: the fact that Z does not sufficiently lower the MI between X and Y does not prove that X and Y are causally related. Three reasons can explain this without entailing a contradiction: 1) The channel Z considered is not the right confounding variable (it has to be replaced with other brain region(s)). 2) The actual driver acting as a confounding variable was not recorded and will therefore remain hidden from detection. 3) Instead of a single, well-localized brain region acting as the principal source of synchronization, the source could be a network spread over many cortical areas and acting in an integrated fashion: individually, particular channels Z would not necessarily appear to be substantially involved, but when their actions are added together in the context of a network, the total contribution of the network could act as an “effective” confounding variable in the connectivity link between X and Y.

In summary, the main utility of PMI is that it can be used to detect confounding variables affecting functional connectivity, which makes it an extremely valuable tool for connectivity investigations. Moreover, it automatically captures non-linear dependencies and therefore, in theory, outperforms methods created for the same purpose, such as partial coherence. On the other hand, PMI becomes computationally more demanding as the number of channels increases, and it is mathematically less straightforward to calculate or code.

There are two main ways to estimate PMI. The first works directly from the definition: PMI can be computed by combining the Shannon entropies of the variables involved (the main variables X and Y, and the presumed confounder Z) with the mutual information of the pair of interest (X and Y) (Le Van Quyen et al., 2001, May et al., 2008). This approach requires constructing several histograms to estimate the probability distribution functions. An alternative method, independent of the somewhat arbitrary construction of histograms for Shannon entropy, has been studied by Frenzel and Pompe (2007). In this approach, a three-dimensional vector is constructed whose components are the values of X, Y and Z at a given time. Over time, a cloud of points is thus produced, each point corresponding to a time sample. Around each point, a sphere is built whose radius equals the distance from that point to its k-th nearest neighbor (k typically ranges from a few up to around 30, and the estimate is very robust to variations in k); see Frenzel and Pompe (2007) for further details.
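The nearest-neighbor idea can be sketched as follows. This is an illustrative, brute-force version of a Kraskov-style estimator in the spirit of Frenzel and Pompe (2007), not their reference implementation: the maximum norm, the strict-inequality neighbor counts, the function name and the toy signals are assumptions made here, and the exact conventions should be checked against the original paper.

```python
import random


def knn_partial_mutual_information(x, y, z, k=5):
    """Nearest-neighbor estimate of PMI(X;Y|Z) in nats (brute force, O(N^2))."""
    n = len(x)
    # harmonic[m] = 1 + 1/2 + ... + 1/m, with harmonic[0] = 0
    harmonic = [0.0] * (n + 1)
    for m in range(1, n + 1):
        harmonic[m] = harmonic[m - 1] + 1.0 / m

    total = 0.0
    for i in range(n):
        dx = [abs(v - x[i]) for v in x]
        dy = [abs(v - y[i]) for v in y]
        dz = [abs(v - z[i]) for v in z]
        others = [j for j in range(n) if j != i]
        # Radius: max-norm distance to the k-th nearest neighbor in (X, Y, Z) space
        eps = sorted(max(dx[j], dy[j], dz[j]) for j in others)[k - 1]
        # Count the points strictly inside that radius in each projected subspace
        n_xz = sum(1 for j in others if max(dx[j], dz[j]) < eps)
        n_yz = sum(1 for j in others if max(dy[j], dz[j]) < eps)
        n_z = sum(1 for j in others if dz[j] < eps)
        total += harmonic[n_xz] + harmonic[n_yz] - harmonic[n_z]
    return harmonic[k - 1] - total / n


# Hypothetical toy signals, 500 samples each
random.seed(0)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]

# Case 1: Z drives X and Y, which are otherwise unrelated
x1 = [zi + 0.5 * random.gauss(0, 1) for zi in z]
y1 = [zi + 0.5 * random.gauss(0, 1) for zi in z]
pmi_common = knn_partial_mutual_information(x1, y1, z)

# Case 2: X drives Y directly, and Z is unrelated to both
x2 = [random.gauss(0, 1) for _ in range(n)]
y2 = [xi + 0.5 * random.gauss(0, 1) for xi in x2]
pmi_direct = knn_partial_mutual_information(x2, y2, z)

print(f"common driver:   PMI(X;Y|Z) = {pmi_common:.3f} nats")
print(f"direct coupling: PMI(X;Y|Z) = {pmi_direct:.3f} nats")
```

The estimate should stay close to zero in the common-driver case and clearly positive in the direct-coupling case. The quadratic loop is used only to keep the sketch dependency-free; a spatial index such as a k-d tree would be the usual choice for longer recordings.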

Although the latter method does not depend on the number of bins, it might be more difficult to implement and requires more computational power. However, the authors stress the advantages of this alternative procedure: according to them, and in contrast to histograms, it gives an unbiased estimator for PMI and MI, even with a relatively small number of time samples (around 1,000 samples, or 8 s sampled at 125 Hz) (Frenzel and Pompe, 2007).

Variable name (Abbreviation): Brief description and defining formula

Shannon entropy (H): A measure related to probability theory to quantify the content of information of a signal, commonly expressed in bits. The more random and irregular the signal, the greater its Shannon entropy.

$$H(X) = -\sum_{x} P(x) \, \ln P(x)$$

Cross-mutual information (CMI): Non-linear functional connectivity measure between a pair of distinct signals, symmetric under permutation. A generalization of the cross-correlation function to include non-linear correlations.

$$\mathrm{CMI}(X, Y) = \sum_{x,y} P(x, y) \, \ln \frac{P(x, y)}{P(x) \, P(y)}$$

Auto-mutual information (AMI): Single channel measure, dependent on one signal. It is similar to CMI when using the original signal and a delayed version as the second signal. A generalization of the auto-correlation function to include non-linear correlations, but only assuming positive values. The initial slope of decay of AMI can be used to quantitatively characterize the “complexity” (or unpredictability) of a signal.

$$\mathrm{AMI}(X(\tau)) = \sum_{x_t, x_{t+\tau}} P(x_t, x_{t+\tau}) \, \ln \frac{P(x_t, x_{t+\tau})}{P(x_t) \, P(x_{t+\tau})}$$

Partial information (PI): A measure to describe the Shannon entropy of a signal X when the part of its dynamics shared with another signal Y is removed. If it is not significantly smaller than Shannon entropy of X, it implies no correlation between these two signals. It is not symmetrical in its arguments.

$$\mathrm{PI}(X|Y) = -\sum_{x,y} P(x, y) \, \ln \frac{P(x, y)}{P(y)}$$

Partial (or conditional) mutual information (PMI): A measure to discriminate the existence of possible confounding variable signals between signals previously known to be correlated by CMI. When PMI including a third channel Z is similar to CMI between X and Y, then Z is not involved in the coupling; otherwise, it explains partially the coupling due to indirect interactions related with variable Z.

$$\mathrm{PMI}(X, Y|Z) = \sum_{x,y,z} P(x, y, z) \, \ln \frac{P(x, y, z) \, P(z)}{P(x, z) \, P(y, z)}$$

Transfer entropy (TE) (a measure of effective connectivity): Non-linear effective connectivity measure between a pair of signals, non-symmetric in its arguments. A generalization of Granger causality in the context of information theory, to include non-linear causal interactions.

$$\mathrm{TE}(X \rightarrow Y) = \sum_{y_{t+1}, y_t, x_t} P(y_{t+1}, y_t, x_t) \, \ln \frac{P(y_{t+1}|y_t, x_t)}{P(y_{t+1}|y_t)}$$

Table 5.1: Brief description of functional connectivity measures derived from information theory, and of other important measures needed in this context. The last one (transfer entropy) is a measure of effective connectivity.
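The transfer entropy in Table 5.1 has the same structure as PMI: it is the mutual information between the future of Y and the present of X, conditioned on the present of Y. The following minimal sketch illustrates this with a plug-in estimate on symbolic signals; the function names, the one-sample histories and the simulated binary signals are hypothetical choices for illustration only.

```python
import math
import random
from collections import Counter


def joint_entropy(*columns):
    """Plug-in joint Shannon entropy (in nats) of one or more symbol sequences."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    return -sum(c / n * math.log(c / n) for c in counts.values())


def transfer_entropy(source, target):
    """TE(source -> target) with one-sample histories, in nats.

    Computed as the conditional mutual information between target[t+1]
    and source[t], given target[t].
    """
    y_next, y_now, x_now = target[1:], target[:-1], source[:-1]
    return (joint_entropy(y_next, y_now) + joint_entropy(x_now, y_now)
            - joint_entropy(y_next, y_now, x_now) - joint_entropy(y_now))


# Hypothetical toy signals: Y copies the previous sample of X 90% of the time.
random.seed(1)
n = 5000
x = [random.randint(0, 1) for _ in range(n)]
y = [0]
for t in range(n - 1):
    y.append(x[t] if random.random() < 0.9 else 1 - x[t])

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)

print(f"TE(X -> Y) = {te_xy:.3f} nats")
print(f"TE(Y -> X) = {te_yx:.3f} nats")
```

The asymmetry between the two directions is what distinguishes this measure from the symmetric CMI: here information flows from X to Y only, so TE(X -> Y) should be clearly positive and TE(Y -> X) close to zero.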

In document The sleep onset transition: a connectivity investigation built on EEG source localization (Page 186-193)