4.3 Analysis of model performance
4.3.3 Input/output transformation: model production rules
units. They define the long-term memory of the model, which permits the spatiotemporal differentiation of the input stimuli. However, to study the relationships between the inputs and the model predictions it is also necessary to analyse the data flow within the model. Due to the recurrent links, the input information can be recombined and reflected in the activity of different hidden units. To gain insight into how the inputs propagate through the model, a measure of shared activity in the input and hidden layers is needed. Sometimes the weights between two units are of small magnitude while the correlation between their activities is high. The opposite may also occur.
While the weights pertain to the unique contribution of each variable, the correlations between the units’ activity represent the overall contribution. For example, if two variables (such as loudness and texture) contain redundant information, then the model may rely less on one of them, because it only “needs” to include one of the items to capture the essence of what they measure. Once a large weight is assigned to one of the variables, the contribution of the second may be redundant and, consequently, it will receive a smaller (or even negligibly small) weight. For instance, texture is often related to the loudness level, since more notes sounding or more instruments playing frequently coincide with increased loudness. Nevertheless, the correlations between the inputs and the hidden units’ activity may be substantial for both variables. To reiterate, the weights pertain to the unique contribution of each variable to a particular weighted sum (or processing unit activity).
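The distinction between weights and correlations can be illustrated with a minimal sketch on synthetic data. It assumes a plain least-squares weighted sum as a stand-in for a processing unit (not the model's actual training procedure), and the variables `loudness`, `texture` and `target` are hypothetical, generated only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical synthetic inputs: texture is largely redundant with loudness.
loudness = rng.standard_normal(n)
texture = 0.9 * loudness + np.sqrt(1 - 0.9**2) * rng.standard_normal(n)

# Hypothetical unit activity driven by loudness alone (plus a little noise).
target = loudness + 0.1 * rng.standard_normal(n)

# Least-squares weights: the unique contribution of each input.
X = np.column_stack([loudness, texture])
weights, *_ = np.linalg.lstsq(X, target, rcond=None)

# Pearson correlations: the overall contribution of each input.
correlations = np.array(
    [np.corrcoef(X[:, j], target)[0, 1] for j in range(X.shape[1])]
)

print("weights:     ", np.round(weights, 3))
print("correlations:", np.round(correlations, 3))
```

Under these assumptions the weight assigned to texture is close to zero, while its correlation with the target remains high, because texture shares most of its variance with loudness.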
In order to account for the temporal dynamics of the model, the correlations between input, hidden and output units were computed using Canonical Correlation Analysis (CCA) (Hotelling, 1936). A canonical correlation is the correlation between two canonical variables: one representing a set of independent variables, the other a set of dependent variables. CCA finds the linear combinations of each set that maximise the correlation between the two canonical variables, in the context of many-to-many relationships. There may be more than one such pair of canonical variables, each representing a different dimension of the relationship between the two sets. For each dimension it is also possible to assess how strongly it relates to each variable in its own set (canonical factor loadings). These are the correlations between the canonical variables and each variable in the original data sets. While the lesioning tests facilitated the selection of groups of hidden units with strong relationships to the output, this analysis aims at investigating the model’s internal dynamics and its correlations with the inputs and outputs.
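The quantities described above (canonical correlations and canonical loadings) can be computed with a short NumPy sketch. This is one standard formulation (QR decomposition followed by a singular value decomposition), not necessarily the procedure used to produce the reported results; the function name `cca`, the helper `_column_correlations` and the synthetic `inputs` and `hidden` arrays are hypothetical, and both data sets are assumed to have full column rank.

```python
import numpy as np

def _column_correlations(A, B):
    """Pearson correlation between every column of A and every column of B."""
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    B = (B - B.mean(axis=0)) / B.std(axis=0)
    return A.T @ B / A.shape[0]

def cca(X, Y):
    """Canonical correlations and loadings for two sets of variables.

    X has shape (n_samples, p) and Y has shape (n_samples, q); each row is
    one time step. Returns (canonical correlations, X loadings, Y loadings).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centred data sets.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations (descending).
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    x_variates = Qx @ U      # canonical variables of the first set
    y_variates = Qy @ Vt.T   # canonical variables of the second set
    canon_corr = np.clip(s, 0.0, 1.0)  # guard against rounding above 1
    # Loadings: correlations between original variables and their variates.
    x_loadings = _column_correlations(Xc, x_variates)
    y_loadings = _column_correlations(Yc, y_variates)
    return canon_corr, x_loadings, y_loadings

# Hypothetical data: one "hidden" unit follows the first input closely,
# the other is unrelated noise.
rng = np.random.default_rng(1)
inputs = rng.standard_normal((500, 3))
hidden = np.column_stack([
    inputs[:, 0] + 0.1 * rng.standard_normal(500),
    rng.standard_normal(500),
])
canon_corr, input_loadings, hidden_loadings = cca(inputs, hidden)
print(np.round(canon_corr, 3))
print(np.round(input_loadings, 3))
print(np.round(hidden_loadings, 3))
```

Each column of the loading matrices corresponds to one canonical dimension, so a row with a large absolute value in a given column identifies a variable that is strongly related to that dimension.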
The CCA is used to assess the relationships between the sequences of input, hidden and output layer activity. This method permits the analysis of the contribution of each node (or set of nodes) in one network layer to the activity of a different layer. Relevant for the analysis are the relationships between the input and hidden layers (how the inputs relate to the internal representations of the model), and between the hidden and output layers (which sets of hidden units are most related to the output). Table 4.6 shows the details of a CCA for the activity of the neural network layers.
Loadings (Input/Hidden)
Variable       var. 1    var. 2    var. 3
H1             -0.398    -0.633    -0.028
H2              0.479     0.657    -0.437
H3              0.144    -0.891    -0.238
H4              0.159    -0.647    -0.632
H5             -0.637     0.645     0.018
T               0.264     0.478     0.151
Tx              0.608     0.280     0.217
D               0.450     0.674     0.139
P               0.819     0.297     0.432
Ti              0.748     0.420     0.262
C               0.187     0.270     0.825
Canon Cor.      0.725     0.546     0.448
Pct.            61.1%     23.4%     13.8%

Loadings (Hidden/Output)
Variable       var. 1    var. 2
H1             -0.504     0.482
H2              0.978    -0.055
H3             -0.291     0.862
H4              0.014     0.797
H5             -0.074    -0.973
A               0.765    -0.644
V               0.260     0.966
Canon Cor.      0.987     0.984
Pct.            66.0%     44.0%

Table 4.6: Canonical Correlation Analysis (CCA): the canonical correlations (interpreted in the same way as Pearson’s linear correlation coefficient) quantify the strength of the relationships between the extracted canonical variates, and so the significance of the relationship. To assess the relationship between the original variables (input, hidden and output units’ activity) and the canonical variables, the canonical loadings (the correlations between the canonical variates and the variables in each set) are also included.
The larger the absolute value of a loading, the stronger the relationship between the original variable (input, hidden or output unit activity) and the canonical variate. The following paragraphs summarise these relationships, which explain how the network inputs are propagated through the hidden layer to the network’s outputs.
Input to hidden: Three canonical variables explain 98.3% of the variance in the data (see the Input/Hidden loadings in Table 4.6). The first pair of variables loads on P, Tx and Ti (input set), and on H2 and H5 (hidden layer). The second loads only on input D, but it [...] H2 and H4. These three dimensions encode the general levels of shared activation in the input and hidden layers.
Hidden to output: Two canonical variables explain all the variance in the data (see the Hidden/Output loadings in Table 4.6). The first canonical root correlates strongly with arousal and with the activity in hidden units H1 and H2. The second pair of canonical variables correlates with both valence (positive) and arousal (negative), and with the activity in units H3 to H5.
Taking these two groups of relationships together, it is possible to establish qualitative patterns of correlations that illustrate the general model dynamics. The lesioning tests facilitated the selection of groups of hidden units with strong relationships to the output, while the CCA has shown how the model’s internal dynamics correlate with the inputs. Using both analyses, the output units’ activity can be represented symbolically as⁸:
\[ A(t) = g_1(-H_1(t), H_2(t), -H_3(t)) + m_1(M_1(t), M_2(t), M_3(t), M_5(t)) \tag{4.1} \]

\[ A(t) = g_1(T(t), \mathit{Tx}(t), L(t), P(t), S(t)) + m_1(H_1(t-1), H_2(t-1), H_3(t-1), H_5(t-1)) \tag{4.2} \]

At a given time (t), arousal (A(t)) is positively associated with the current T, Tx, L, P and S inputs, plus the memory of previous states (except for the H4 dimension). Applying the same principle to valence leads to:
\[ V(t) = g_2(-H_2(t), H_4(t), -H_5(t)) + m_2(M_3(t), M_5(t)) \tag{4.3} \]

\[ V(t) = g_2(-H_2(t), H_4(t), -H_5(t)) + m_2(H_3(t-1), H_5(t-1)) \tag{4.4} \]

⁸ The sign of the canonical loadings indicates whether a hidden unit reinforces or inhibits the outputs.
The hidden units considered are those found in the lesioning analysis to be fundamental to the output predictions, while the input units are those with the strongest correlations with the hidden layer found with the CCA.
Considering that H4 blocks all the inputs (this unit is only affected by the memory layer units), and that H2 was almost linearly related to arousal⁹, we can further simplify:
\[ V(t) = g_2(-H_5(t)) + m_2(H_1(t-1), H_2(t-1), H_3(t-1), H_5(t-1)) \tag{4.5} \]

\[ V(t) = g_2(T(t), -L(t), P(t), S(t)) + m_2(H_1(t-1), H_2(t-1), H_3(t-1), H_5(t-1)) \tag{4.6} \]
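The substitution of the memory units M_i(t) by H_i(t−1) in the equations above reflects a recurrent update in which the memory layer holds a copy of the previous hidden state. A minimal sketch of such an update follows. The weight matrices, the tanh activation, the linear output and the function name `run_recurrent` are assumptions made for illustration and are not the model's trained parameters; the layer sizes (6 inputs, 5 hidden units, 2 outputs) follow Table 4.6.

```python
import numpy as np

def run_recurrent(inputs, W_in, W_mem, W_out):
    """Run a simple recurrent network over a sequence of input vectors.

    inputs: (n_steps, n_inputs); W_in: (n_hidden, n_inputs);
    W_mem: (n_hidden, n_hidden); W_out: (n_outputs, n_hidden).
    Returns the outputs, shape (n_steps, n_outputs).
    """
    memory = np.zeros(W_mem.shape[0])  # M(0): no previous hidden state yet
    outputs = []
    for x_t in inputs:
        # H(t) combines the current inputs with the memory M(t) = H(t-1).
        hidden = np.tanh(W_in @ x_t + W_mem @ memory)
        outputs.append(W_out @ hidden)  # assumed linear output (A, V)
        memory = hidden.copy()          # becomes M(t+1)
    return np.array(outputs)

# Hypothetical random weights and inputs, for illustration only.
rng = np.random.default_rng(0)
W_in = rng.standard_normal((5, 6))
W_mem = rng.standard_normal((5, 5))
W_out = rng.standard_normal((2, 5))
sequence = rng.standard_normal((4, 6))

# Perturb only the first time step and compare the outputs at later steps.
perturbed = sequence.copy()
perturbed[0] += 1.0
out_a = run_recurrent(sequence, W_in, W_mem, W_out)
out_b = run_recurrent(perturbed, W_in, W_mem, W_out)
print(np.abs(out_a - out_b).max(axis=1))
```

In this sketch a change to the input at the first time step alters the outputs at later steps only through the memory term, which is the role played by the m1 and m2 terms in the symbolic equations.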