5.4 Illustrative Examples
5.4.3 Shopping pattern data
Table 5.14: Correlation matrix among the variables of the shopping pattern data

          X1     X2     X3     X4     X5     X6     X7     X8     X9    X10
X1     1.000  0.547  0.274  0.637  0.481  0.517  0.369  0.242  0.566  0.666
X2     0.547  1.000  0.650  0.837  0.809  0.792  0.562  0.277  0.808  0.768
X3     0.274  0.650  1.000  0.744  0.782  0.795  0.781  0.496  0.566  0.526
X4     0.637  0.837  0.744  1.000  0.851  0.772  0.609  0.609  0.815  0.775
X5     0.481  0.809  0.782  0.851  1.000  0.906  0.821  0.573  0.798  0.748
X6     0.517  0.792  0.795  0.772  0.906  1.000  0.781  0.418  0.736  0.702
X7     0.369  0.562  0.781  0.609  0.821  0.781  1.000  0.473  0.577  0.599
X8     0.242  0.277  0.496  0.609  0.573  0.418  0.473  1.000  0.484  0.424
X9     0.566  0.808  0.566  0.815  0.798  0.736  0.577  0.484  1.000  0.894
X10    0.666  0.768  0.526  0.775  0.748  0.702  0.599  0.424  0.894  1.000
hood. Telephone interviews were conducted to collect data from the household member who did the major food shopping for the household. The households were selected by random sampling from households in a large northeastern metropolitan city. There were 10 regressors in their study.
Table 5.14 gives the correlation matrix of the regressors. There are a number of strong correlations, such as those between X2 and X4 (0.837), between X5 and X6 (0.906) and between X9 and X10 (0.894), as well as a number of moderate correlations among the regressors. A large pairwise correlation is a sufficient (though not necessary) condition for collinearity, so we expect collinearities among some of the regressors.
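Scanning the correlation matrix for strong pairs can be done mechanically. The sketch below copies Table 5.14 and lists every pair whose absolute correlation reaches a cut-off. The 0.83 threshold and the function name `strong_pairs` are illustrative choices of our own, not values used by Ofir and Khuri (1986).

```python
# Correlation matrix of the shopping pattern regressors (Table 5.14).
NAMES = [f"X{j}" for j in range(1, 11)]
R = [
    [1.000, 0.547, 0.274, 0.637, 0.481, 0.517, 0.369, 0.242, 0.566, 0.666],
    [0.547, 1.000, 0.650, 0.837, 0.809, 0.792, 0.562, 0.277, 0.808, 0.768],
    [0.274, 0.650, 1.000, 0.744, 0.782, 0.795, 0.781, 0.496, 0.566, 0.526],
    [0.637, 0.837, 0.744, 1.000, 0.851, 0.772, 0.609, 0.609, 0.815, 0.775],
    [0.481, 0.809, 0.782, 0.851, 1.000, 0.906, 0.821, 0.573, 0.798, 0.748],
    [0.517, 0.792, 0.795, 0.772, 0.906, 1.000, 0.781, 0.418, 0.736, 0.702],
    [0.369, 0.562, 0.781, 0.609, 0.821, 0.781, 1.000, 0.473, 0.577, 0.599],
    [0.242, 0.277, 0.496, 0.609, 0.573, 0.418, 0.473, 1.000, 0.484, 0.424],
    [0.566, 0.808, 0.566, 0.815, 0.798, 0.736, 0.577, 0.484, 1.000, 0.894],
    [0.666, 0.768, 0.526, 0.775, 0.748, 0.702, 0.599, 0.424, 0.894, 1.000],
]


def strong_pairs(r, names, threshold):
    """Return (name_i, name_j, r_ij) for every pair with |r_ij| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only; R is symmetric
            if abs(r[i][j]) >= threshold:
                pairs.append((names[i], names[j], r[i][j]))
    return sorted(pairs, key=lambda p: -abs(p[2]))  # strongest first


# The 0.83 cut-off is an arbitrary, illustrative choice.
for a, b, rij in strong_pairs(R, NAMES, threshold=0.83):
    print(f"{a}-{b}: {rij:.3f}")
```

Besides the three pairs named above, the scan also returns X4 and X5 (0.851). A pairwise scan cannot reveal a near-dependency that involves three or more regressors without any single large correlation. The diagnostics below are designed to catch that case.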
Ofir and Khuri (1986) analyzed this dataset using the VIFs, the $R_j^2$ values ($R_j^2$ is the $R^2$ value obtained when regressing $X_j$ on the remaining regressors), and the eigenvalues and eigenvectors of the correlation matrix $\mathbf{R}_{xx}$ of the regressors.
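The VIF and $R_j^2$ carry the same information through the standard identity $\mathrm{VIF}_j = 1/(1 - R_j^2)$. The sketch below uses the VIFs reported in Table 5.15 and converts between the two measures. The function names are our own.

```python
# VIFs of the ten regressors, as reported in Table 5.15.
VIFS = {
    "X1": 3.330, "X2": 7.526, "X3": 6.783, "X4": 14.641, "X5": 15.831,
    "X6": 8.832, "X7": 5.875, "X8": 2.947, "X9": 7.488, "X10": 6.643,
}


def r_squared_from_vif(vif: float) -> float:
    """Invert VIF_j = 1 / (1 - R_j^2) to recover R_j^2."""
    return 1.0 - 1.0 / vif


def vif_from_r_squared(r2: float) -> float:
    """VIF_j = 1 / (1 - R_j^2); undefined when R_j^2 == 1 (exact collinearity)."""
    if r2 >= 1.0:
        raise ValueError("R_j^2 must be < 1 for a finite VIF")
    return 1.0 / (1.0 - r2)


R2 = {name: r_squared_from_vif(v) for name, v in VIFS.items()}

for name, r2 in R2.items():
    print(f"{name}: VIF = {VIFS[name]:7.3f}  R_j^2 = {r2:.3f}")
```

On this scale, X4 and X5 have $R_j^2$ above 0.93, so more than 93% of their variance is explained by the other regressors. X1 and X8 are at about 0.70 or below.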
Table 5.15: Eigenvectors analysis for the shopping pattern data

Eigenvector      X1      X2      X3      X4      X5      X6      X7      X8      X9     X10
v8            0.110  -0.247   0.035   0.151   0.041  -0.186   0.146  -0.178   0.683  -0.589
v9           -0.126  -0.525  -0.073   0.478   0.526  -0.087  -0.151  -0.294  -0.185   0.216
v10           0.213   0.070   0.337  -0.554   0.566  -0.356  -0.280   0.039   0.037   0.001
VIF           3.330   7.526   6.783  14.641  15.831   8.832   5.875   2.947   7.488   6.643
Note: $\sum_{j=1}^{10} 1/\lambda_j = 79.9$

Table 5.15 presents the results of the collinearity diagnostics based on the eigenvectors analysis. The expected squared distance between the least squares estimates and the true parameter vector is $79.9\sigma^2$. For orthogonal regressors it would be $p\sigma^2$, i.e., $10\sigma^2$, so multicollinearity inflates the expected squared distance about eightfold. This is one indication of collinearity in the dataset. The eigenvalues of the correlation matrix are 6.897, 1.059, 0.739, 0.462, 0.349, 0.188, 0.129, 0.081, 0.064 and 0.032. The corresponding condition indices $\kappa_j = \sqrt{\lambda_1/\lambda_j}$ are 1.000, 2.552, 3.055, 3.866, 4.446, 6.056, 7.319, 9.204, 10.392 and 14.716. The last three eigenvalues are small compared with the others and have the three largest condition indices, so there are three collinear sets in the data. The $\kappa_j$'s indicate that one of these collinearities is moderate while the other two are weak. The VIFs
indicate that all the regressors except X1 and X8 are involved in a collinearity. On the basis of v10, Ofir and Khuri (1986) concluded that X4 and X5 are clearly collinear and that X6 is perhaps involved in this collinearity. Ofir and Khuri did not suggest it, but the third element of v10 (0.337) indicates that X3 might also belong to this collinear set. The other two eigenvectors, v8 and v9, are associated with the next two smallest eigenvalues. They identify one collinearity among X2, X4 and X5 (v9) and another between X9 and X10 (v8).
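The arithmetic behind the condition indices and the $79.9\sigma^2$ figure can be checked directly. For regressors standardized so that $\mathbf{X}^\top\mathbf{X} = \mathbf{R}_{xx}$, the expected squared distance is $\sigma^2 \operatorname{tr}(\mathbf{R}_{xx}^{-1}) = \sigma^2 \sum_j 1/\lambda_j$. The same trace also equals the sum of the VIFs. The sketch below uses the eigenvalues listed above and the VIFs of Table 5.15. The eigenvalues are rounded to three decimals, so the recomputed values differ slightly in the later digits from those quoted: about 14.68 rather than 14.716 for the largest condition index, and about 79.8 rather than 79.9 for $\sum_j 1/\lambda_j$.

```python
import math

# Eigenvalues of R_xx as listed in the text (largest first), and the VIFs of Table 5.15.
EIGENVALUES = [6.897, 1.059, 0.739, 0.462, 0.349, 0.188, 0.129, 0.081, 0.064, 0.032]
VIFS = [3.330, 7.526, 6.783, 14.641, 15.831, 8.832, 5.875, 2.947, 7.488, 6.643]


def condition_indices(eigenvalues):
    """kappa_j = sqrt(lambda_max / lambda_j)."""
    lam_max = max(eigenvalues)
    return [math.sqrt(lam_max / lam) for lam in eigenvalues]


def expected_sq_distance_factor(eigenvalues):
    """E||b - beta||^2 / sigma^2 = trace(R^{-1}) = sum_j 1/lambda_j (standardized regressors)."""
    return sum(1.0 / lam for lam in eigenvalues)


kappa = condition_indices(EIGENVALUES)
factor = expected_sq_distance_factor(EIGENVALUES)
p = len(EIGENVALUES)

print("condition indices:", [round(k, 3) for k in kappa])
print(f"sum 1/lambda = {factor:.1f}, sum VIF = {sum(VIFS):.1f}, p = {p}")
print(f"inflation relative to orthogonal regressors: {factor / p:.1f}")
```

The ratio printed on the last line, roughly 8, is the "about eightfold" inflation quoted above.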
Table 5.16: Variance-decomposition proportions of the shopping pattern data

Principal  Variance proportion
component     X1     X2     X3     X4     X5     X6     X7     X8     X9    X10
1          0.003  0.002  0.002  0.001  0.001  0.002  0.002  0.002  0.002  0.002
2          0.092  0.006  0.024  0.000  0.001  0.001  0.020  0.047  0.006  0.014
3          0.012  0.014  0.010  0.003  0.000  0.012  0.010  0.303  0.002  0.002
4          0.303  0.039  0.001  0.002  0.000  0.003  0.073  0.000  0.045  0.005
5          0.041  0.028  0.053  0.040  0.001  0.000  0.105  0.001  0.042  0.093
6          0.000  0.013  0.271  0.007  0.067  0.142  0.010  0.038  0.002  0.117
7          0.003  0.205  0.097  0.024  0.018  0.327  0.256  0.000  0.057  0.015
8          0.044  0.100  0.002  0.019  0.001  0.048  0.044  0.131  0.766  0.641
9          0.074  0.573  0.012  0.245  0.274  0.013  0.061  0.460  0.071  0.110
10         0.428  0.021  0.527  0.659  0.636  0.451  0.419  0.016  0.006  0.000

Table 5.17: Cos-max transformation matrix and VIFs of the shopping pattern data

          X1      X2      X3      X4      X5      X6      X7      X8      X9     X10     VIF
a1ᵀ    1.501   0.030   0.410  -0.712   0.282  -0.348  -0.177   0.095   0.080  -0.395   3.330
a2ᵀ    0.030   2.428  -0.125  -0.843  -0.506  -0.336   0.170   0.533  -0.392  -0.267   7.526
a3ᵀ    0.410  -0.125   2.179  -0.921   0.241  -0.652  -0.695  -0.071   0.165   0.078   6.783
a4ᵀ   -0.712  -0.843  -0.921   3.330  -0.794   0.316   0.502  -0.622  -0.330  -0.104  14.641
a5ᵀ    0.282  -0.506   0.241  -0.794   3.548  -1.085  -0.914  -0.353  -0.286  -0.030  15.831
a6ᵀ   -0.348  -0.336  -0.652   0.316  -1.085   2.611  -0.160   0.102  -0.215   0.006   8.832
a7ᵀ   -0.177   0.170  -0.695   0.502  -0.914  -0.160   2.030  -0.093   0.066  -0.289   5.875
a8ᵀ    0.095   0.533  -0.071  -0.622  -0.353   0.102  -0.093   1.443  -0.182  -0.039   2.947
a9ᵀ    0.080  -0.392   0.165  -0.330  -0.286  -0.215   0.066  -0.182   2.459  -0.989   7.488
a10ᵀ  -0.395  -0.267   0.078  -0.104  -0.030   0.006  -0.289  -0.039  -0.989   2.310   6.643

The variance-decomposition proportions for the shopping pattern data are given in Table 5.16. The last row indicates that X3, X4 and X5 are involved in one collinearity, and that X6 is perhaps also involved in this set. This collinear set is the same as the one identified by the last eigenvector (v10). The second-last row has only one large variance proportion, for X2. The variance proportions of X4 and X5 on this row are small. Even so, these two variables could perhaps be collinear with X2, because their variance proportions are dominated by the first collinear set. (The eigenvectors analysis suggested a collinearity among X2, X4 and X5.) The third row from the bottom clearly shows that X9 and X10 are collinear, which is also one of the findings of the eigenvectors analysis.
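The link between Tables 5.15 and 5.16 can be made concrete. Because $\mathrm{VIF}_j = \sum_k v_{jk}^2/\lambda_k$, the proportion of $\mathrm{VIF}_j$ attributable to component $k$ is $\pi_{kj} = (v_{jk}^2/\lambda_k)/\mathrm{VIF}_j$. The sketch below applies this standard formula to v10 (Table 5.15), $\lambda_{10} = 0.032$ and the VIFs, and recovers the last row of Table 5.16. Because all inputs are rounded, the results agree with the table only to within about 0.005.

```python
NAMES = [f"X{j}" for j in range(1, 11)]
# Last eigenvector v10 (Table 5.15), its eigenvalue (from the text), and the VIFs.
V10 = [0.213, 0.070, 0.337, -0.554, 0.566, -0.356, -0.280, 0.039, 0.037, 0.001]
LAMBDA_10 = 0.032
VIFS = [3.330, 7.526, 6.783, 14.641, 15.831, 8.832, 5.875, 2.947, 7.488, 6.643]


def variance_proportions(v_k, lam_k, vifs):
    """pi_kj = (v_jk^2 / lambda_k) / VIF_j, using VIF_j = sum_k v_jk^2 / lambda_k."""
    return [(v * v / lam_k) / vif for v, vif in zip(v_k, vifs)]


row10 = variance_proportions(V10, LAMBDA_10, VIFS)
for name, pi in zip(NAMES, row10):
    print(f"{name}: {pi:.3f}")
```

The same function applied to v9 and $\lambda_9 = 0.064$ gives about 0.572 for X2, matching the single large entry (0.573) on the second-last row.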
Table 5.18: Cos-square transformation matrix and VIFs of the shopping pattern data

          X1      X2      X3      X4      X5      X6      X7      X8      X9     X10     VIF
a1ᵀ    1.434  -0.006   0.380  -0.815   0.317  -0.370  -0.161   0.063   0.063  -0.441   3.330
a2ᵀ   -0.004   2.388  -0.162  -0.941  -0.578  -0.364   0.142   0.408  -0.425  -0.281   7.526
a3ᵀ    0.325  -0.173   2.127  -1.039   0.309  -0.683  -0.668  -0.094   0.157   0.071   6.783
a4ᵀ   -0.519  -0.750  -0.773   3.465  -0.810   0.308   0.406  -0.453  -0.284  -0.070  14.641
a5ᵀ    0.189  -0.431   0.215  -0.757   3.677  -0.936  -0.695  -0.242  -0.227  -0.007  15.831
a6ᵀ   -0.273  -0.338  -0.590   0.358  -1.163   2.593  -0.185   0.065  -0.224   0.012   8.832
a7ᵀ   -0.142   0.157  -0.690   0.563  -1.032  -0.220   1.954  -0.100   0.055  -0.306   5.875
a8ᵀ    0.065   0.522  -0.112  -0.726  -0.416   0.090  -0.116   1.372  -0.223  -0.058   2.947
a9ᵀ    0.048  -0.409   0.141  -0.342  -0.293  -0.232   0.048  -0.168   2.461  -0.978   7.488
a10ᵀ  -0.347  -0.277   0.065  -0.086  -0.009   0.013  -0.273  -0.045  -1.004   2.313   6.643

Tables 5.17 and 5.18 present the transformation matrices of the cos-max and cos-square transformations of the shopping pattern data, respectively. In both matrices, the components of a9 and a10 show that X9 and X10 are related. This is one of the results obtained from both the eigenvectors method and the variance-decomposition proportion method. Further information about the structure among X2, X3, X4, X5, X6 and X7 is provided by the components of a2, a3, a4, a5, a6 and a7. From a2, there is a suggestion that X2 and X4 are related. The components of a3 indicate a relationship between X3 and X4. The large values in a4 indicate that X4 is related to X2, X3 and X5. Similarly, a5 indicates that X5 is related to X4, X6 and X7. A relationship between X6 and X5 is indicated by a6, and a relationship between X7 and X5 is indicated by a7.
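The VIF column in Tables 5.17 and 5.18 can be tied to the rows $\mathbf{a}_j$. This section does not restate how the transformation is defined. Under our reading, it is chosen so that the transformed regressors $\mathbf{X}\mathbf{A}$ are orthonormal, i.e. $\mathbf{A}^\top\mathbf{R}_{xx}\mathbf{A} = \mathbf{I}$. In that case $\mathbf{A}\mathbf{A}^\top = \mathbf{R}_{xx}^{-1}$, and the squared length of each row equals $\mathrm{VIF}_j = (\mathbf{R}_{xx}^{-1})_{jj}$. The sketch below checks this, together with the symmetry of the cos-max matrix, against the rounded values of Table 5.17.

```python
# Cos-max transformation matrix of the shopping pattern data (Table 5.17); row j is a_j^T.
A = [
    [1.501, 0.030, 0.410, -0.712, 0.282, -0.348, -0.177, 0.095, 0.080, -0.395],
    [0.030, 2.428, -0.125, -0.843, -0.506, -0.336, 0.170, 0.533, -0.392, -0.267],
    [0.410, -0.125, 2.179, -0.921, 0.241, -0.652, -0.695, -0.071, 0.165, 0.078],
    [-0.712, -0.843, -0.921, 3.330, -0.794, 0.316, 0.502, -0.622, -0.330, -0.104],
    [0.282, -0.506, 0.241, -0.794, 3.548, -1.085, -0.914, -0.353, -0.286, -0.030],
    [-0.348, -0.336, -0.652, 0.316, -1.085, 2.611, -0.160, 0.102, -0.215, 0.006],
    [-0.177, 0.170, -0.695, 0.502, -0.914, -0.160, 2.030, -0.093, 0.066, -0.289],
    [0.095, 0.533, -0.071, -0.622, -0.353, 0.102, -0.093, 1.443, -0.182, -0.039],
    [0.080, -0.392, 0.165, -0.330, -0.286, -0.215, 0.066, -0.182, 2.459, -0.989],
    [-0.395, -0.267, 0.078, -0.104, -0.030, 0.006, -0.289, -0.039, -0.989, 2.310],
]
VIFS = [3.330, 7.526, 6.783, 14.641, 15.831, 8.832, 5.875, 2.947, 7.488, 6.643]


def squared_row_norms(a):
    """||a_j||^2 for each row; equals VIF_j when A A^T = R_xx^{-1}."""
    return [sum(x * x for x in row) for row in a]


def is_symmetric(a, tol=1e-9):
    """True if a[i][j] == a[j][i] for all i, j (up to tol)."""
    n = len(a)
    return all(abs(a[i][j] - a[j][i]) <= tol for i in range(n) for j in range(n))


norms = squared_row_norms(A)
for j, (norm, vif) in enumerate(zip(norms, VIFS), start=1):
    print(f"X{j}: ||a_j||^2 = {norm:6.3f}   VIF = {vif:6.3f}")
print("cos-max matrix symmetric:", is_symmetric(A))
```

Every squared row length agrees with its VIF to within 0.1%. Large off-diagonal entries in a row therefore show where that regressor's variance inflation comes from.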
The transformation matrices tell us only that relationships exist between the variables. They do not tell us the directions of those relationships. By assigning directions to the relationships, we can generate possible structures. For example, from a2 and a3 we can take X4 as predicting X2 and X3, respectively. Similarly, from rows a6 and a7, X5 predicts X6 and X7, respectively. In addition to these directions, a4 and a5 allow us to take X5 as predicting X4. Figure 5.1(a) depicts the resulting directional relationships, and Figure 5.1 shows four of these possible structures. The ability to suggest such structures is one of the advantages of the cos-max and cos-square transformation matrices. Each structure can yield the same correlation pattern for suitable choices of the pairwise correlations.
The only difference between Figures 5.1(a) and 5.1(c) is that in Figure 5.1(c) X2 and X3 predict X4. The difference between Figures 5.1(a) and 5.1(d) is that both X6 and X7 predict X5. Figure 5.1(b) reverses the direction between X4 and X5 relative to Figure 5.1(d), and this is the only difference between Figures 5.1(b) and 5.1(d). In Figure 5.1(a), there will be some correlation between X2 and X3 because of their common dependence on X4. This is why there must be a weak correlation between X6 and X7 in Figure 5.1(b). This illustrates that the cos-max and cos-square transformation matrices can suggest structures that might underlie the data. Such structures cannot be inferred from the other two methods.
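The claim that X2 and X3 are correlated through their common dependence on X4 can be made explicit. As a simplification of our own, assume standardized variables, linear relations, and errors $\varepsilon_2, \varepsilon_3$ that are independent of each other and of $X_4$. For a standardized variable regressed on a single standardized regressor, the regression coefficient equals the correlation. Therefore:

```latex
\begin{aligned}
X_2 &= \rho_{24} X_4 + \varepsilon_2, \qquad X_3 = \rho_{34} X_4 + \varepsilon_3,\\
\operatorname{Cov}(X_2, X_3) &= \rho_{24}\rho_{34}\operatorname{Var}(X_4)
  + \rho_{24}\operatorname{Cov}(X_4, \varepsilon_3)
  + \rho_{34}\operatorname{Cov}(\varepsilon_2, X_4)
  + \operatorname{Cov}(\varepsilon_2, \varepsilon_3)
  = \rho_{24}\rho_{34},\\
\text{Table 5.14: } r_{24}\, r_{34} &= 0.837 \times 0.744 \approx 0.623
  \quad (\text{observed } r_{23} = 0.650),\\
\text{Table 5.19: } r_{24}\, r_{34} &= 0.948 \times 0.932 \approx 0.884
  \quad (\text{observed } r_{23} = 0.884).
\end{aligned}
```

The shopping pattern data satisfy this product rule only approximately. The simulated data of Table 5.19, which were generated from the structure of Figure 5.1(a), reproduce it closely.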
A simulation study was conducted to compare the methods, using the structure given in Figure 5.1(a). We generated 1000 observations, considering the simple case in which each variable is generated from only one regressor. We generated X5 from a normal distribution, then generated X4, X6 and X7 from X5, and finally generated X2 and X3 from X4. The regression coefficients and error variances used to generate the simulated data were estimated from the shopping pattern data. (Because the raw shopping pattern data were not available to us, we first generated 1000 observations from a multivariate normal (MVN) distribution with the correlation matrix of the shopping pattern data, and estimated these quantities from them.) Table 5.19 gives the correlation matrix of the generated data. Using this matrix, we applied the three collinearity diagnostic methods described above to examine whether they can identify the structure that generated the data.
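A generator for the structure of Figure 5.1(a) can be sketched in a few lines. The coefficients actually used for Table 5.19 are not reported here. As a hypothetical stand-in, the sketch uses the Table 5.14 correlations as standardized path coefficients, with error standard deviation $\sqrt{1 - b^2}$ so that every variable has unit variance. Its output will therefore not reproduce Table 5.19. The names `PATHS`, `simulate` and `corr` are our own.

```python
import math
import random

# Hypothetical standardized path coefficients for Figure 5.1(a):
# X5 -> X4, X6, X7 and X4 -> X2, X3. Borrowed from Table 5.14 for illustration only;
# these are NOT the coefficients used to generate Table 5.19.
PATHS = {"X4": ("X5", 0.851), "X6": ("X5", 0.906), "X7": ("X5", 0.821),
         "X2": ("X4", 0.837), "X3": ("X4", 0.744)}
ORDER = ["X5", "X4", "X6", "X7", "X2", "X3"]  # every parent precedes its children


def simulate(n, seed=1):
    """Draw n observations from the chain of simple regressions above."""
    rng = random.Random(seed)
    data = {name: [] for name in ORDER}
    for _ in range(n):
        row = {"X5": rng.gauss(0.0, 1.0)}
        for child in ORDER[1:]:
            parent, b = PATHS[child]
            # Error s.d. sqrt(1 - b^2) gives the child (population) unit variance.
            row[child] = b * row[parent] + rng.gauss(0.0, math.sqrt(1.0 - b * b))
        for name in ORDER:
            data[name].append(row[name])
    return data


def corr(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


data = simulate(1000)
print("r(X5, X6) =", round(corr(data["X5"], data["X6"]), 3))
# X2 and X3 share the parent X4, so r23 should be near 0.837 * 0.744 = 0.623.
print("r(X2, X3) =", round(corr(data["X2"], data["X3"]), 3))
```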
Table 5.19: Correlation matrix of the simulated data

        X2     X3     X4     X5     X6     X7
X2   1.000  0.884  0.948  0.908  0.894  0.853
X3   0.884  1.000  0.932  0.893  0.878  0.843
X4   0.948  0.932  1.000  0.958  0.944  0.901
X5   0.908  0.893  0.958  1.000  0.985  0.933
X6   0.894  0.878  0.944  0.985  1.000  0.917
X7   0.853  0.843  0.901  0.933  0.917  1.000

Table 5.20: Eigenvectors analysis of the simulated data

Eigenvector      X2      X3      X4      X5      X6      X7
v5            0.337   0.240  -0.865   0.010   0.284   0.021
v6           -0.016  -0.018   0.172  -0.791   0.581   0.079
VIF           9.808   7.674   27.85  51.031  33.360   7.725

We first apply the eigenvectors method. The eigenvalues of the correlation matrix of the simulated data are 5.559, 0.193, 0.116, 0.089, 0.030 and 0.013, and the corresponding condition indices are 1.000, 5.363, 6.908, 7.903, 13.685 and 20.901. The condition indices indicate that there are two near-linear dependencies. Table 5.20 presents the eigenvectors corresponding to the two smallest eigenvalues, together with the VIFs. The VIFs suggest that all the variables are involved in collinearity. The eigenvectors show one collinearity between X5 and X6 (v6) and another between X2 and X4 (v5). However, the analysis does not suggest the structure in Figure 5.1(a). The eigenvectors identify only part of that structure.
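Reading a collinearity off an eigenvector rests on the identity $\|\mathbf{X}\mathbf{v}_k\|^2 = \mathbf{v}_k^\top\mathbf{R}_{xx}\mathbf{v}_k = \lambda_k$ for standardized regressors. A small $\lambda_k$ therefore means $\sum_j v_{jk} X_j \approx 0$, and the variables with large loadings take part in that near-dependency. The sketch below applies a loading cut-off of 0.3 to Table 5.20. The cut-off is an arbitrary, hypothetical rule of thumb of our own.

```python
NAMES = ["X2", "X3", "X4", "X5", "X6", "X7"]
# Eigenvectors for the two smallest eigenvalues of the simulated data (Table 5.20).
SMALL_EIGENVECTORS = {
    "v5": [0.337, 0.240, -0.865, 0.010, 0.284, 0.021],
    "v6": [-0.016, -0.018, 0.172, -0.791, 0.581, 0.079],
}


def involved_variables(vec, names, cutoff=0.3):
    """Variables whose |loading| on a small-eigenvalue eigenvector is >= cutoff."""
    return [name for name, x in zip(names, vec) if abs(x) >= cutoff]


for label, vec in SMALL_EIGENVECTORS.items():
    print(label, involved_variables(vec, NAMES))
```

This recovers only the pairs X2–X4 and X5–X6, with no direction. X6 (0.284) falls just short of the cut-off for v5, which shows how sensitive such readings are to the threshold.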
Table 5.21 gives the variance-decomposition proportions for the simulated data. The last row of the table shows one collinearity, suggesting that X5 and X6 are involved in it. This is one of the outcomes of the eigenvectors analysis. The row for the fifth principal component has only one large variance proportion, associated with X4. X4 may nevertheless be collinear with X5, because the variance proportion of X5 is heavily dominated by the last row. Again, this method does not reveal the true structure of the simulated data.
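Table 5.21: Variance-decomposition proportions of the simulated data

Principal            Variance proportion
component    λ1/λk      X2     X3     X4     X5     X6     X7
5          187.269  0.389  0.253  0.905  0.000  0.081  0.002
6          436.836  0.002  0.003  0.084  0.963  0.796  0.063
Note: $\lambda_1/\lambda_k$ is the square of the condition index $\kappa_k$ (13.685 and 20.901 for components 5 and 6).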
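The row-wise reading of Table 5.21 can be made mechanical with a commonly used rule of thumb. On a component with a high condition index, flag the variables whose variance proportion exceeds 0.5, and read a collinear set only when two or more variables are flagged. The 0.5 cut-off and the function name are assumptions for illustration.

```python
NAMES = ["X2", "X3", "X4", "X5", "X6", "X7"]
# Rows of Table 5.21: principal component -> variance proportions.
PROPORTIONS = {
    5: [0.389, 0.253, 0.905, 0.000, 0.081, 0.002],
    6: [0.002, 0.003, 0.084, 0.963, 0.796, 0.063],
}


def large_proportions(row, names, cutoff=0.5):
    """Variables whose variance proportion on this component exceeds the cutoff."""
    return [name for name, p in zip(names, row) if p > cutoff]


for comp, row in PROPORTIONS.items():
    flagged = large_proportions(row, NAMES)
    # A collinear set is read off only when two or more variables are flagged.
    verdict = "collinear set" if len(flagged) >= 2 else "no set on its own"
    print(f"component {comp}: {flagged} -> {verdict}")
```

Component 6 yields the set {X5, X6}, while component 5 flags X4 alone. Any partner of X4 has to be inferred indirectly, as in the discussion of X5 above.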
Table 5.22: Cos-square transformation matrix of the simulated data

          X2      X3      X4      X5      X6      X7
a2ᵀ    2.682  -0.405  -1.483  -0.358  -0.276  -0.214
a3ᵀ   -0.420   2.406  -1.231  -0.266  -0.245  -0.250
a4ᵀ   -1.052  -0.842   4.944  -1.147  -0.451  -0.265
a5ᵀ   -0.240  -0.172  -1.085   6.487  -2.679  -0.717
a6ᵀ   -0.234  -0.200  -0.538  -3.377   4.623  -0.452
a7ᵀ   -0.221  -0.249  -0.386  -1.105  -0.553   2.437
Table 5.22 gives the cos-square transformation matrix of the simulated data. It broadly suggests the same structure that was used to construct the simulated data. Results for the cos-max transformation were also produced but are not presented here, for brevity. As mentioned earlier, the two methods yield similar elements, and the only difference is that the cos-max transformation matrix is symmetric. Hence, for these data the cos-max and cos-square transformations give better results than the other two methods.
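One simple way to see that Table 5.22 points to the structure of Figure 5.1(a) is to link each variable to the variable with the largest off-diagonal entry (in absolute value) in its row. This heuristic and the function name `strongest_partner` are hypothetical, our own illustration rather than a procedure described in this section.

```python
NAMES = ["X2", "X3", "X4", "X5", "X6", "X7"]
# Cos-square transformation matrix of the simulated data (Table 5.22); row j is a_j^T.
A = [
    [2.682, -0.405, -1.483, -0.358, -0.276, -0.214],
    [-0.420, 2.406, -1.231, -0.266, -0.245, -0.250],
    [-1.052, -0.842, 4.944, -1.147, -0.451, -0.265],
    [-0.240, -0.172, -1.085, 6.487, -2.679, -0.717],
    [-0.234, -0.200, -0.538, -3.377, 4.623, -0.452],
    [-0.221, -0.249, -0.386, -1.105, -0.553, 2.437],
]


def strongest_partner(row_index, a, names):
    """Name of the variable with the largest |off-diagonal| entry in row a_j."""
    off = [(abs(x), names[k]) for k, x in enumerate(a[row_index]) if k != row_index]
    return max(off)[1]


edges = set()
for j, name in enumerate(NAMES):
    partner = strongest_partner(j, A, NAMES)
    edges.add(frozenset((name, partner)))  # undirected: the matrix gives no direction

print(sorted(sorted(e) for e in edges))
```

The five links X2–X4, X3–X4, X4–X5, X5–X6 and X5–X7 form a tree on the six variables that matches the skeleton of Figure 5.1(a). The directions still have to be assigned as discussed above.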