Chapter 4 Applications
4.1.3 Results
The main parameters of interest here are the common weightε and the concentration parameters ofF0, F1 andF2. It is also interesting to look at the predictive distribution in each ofF1∗ andF2∗, as well as the predictives of the component distributionsF0, F1 andF2. Secondary parameters of interest could bem, BandS,as well as the clusters sizesK0, K1andK2.The graphs of the posterior distributions of the parameters of interest were created using kernel density estimators with normal kernels and a choice of bandwidth based on trial and error.
Let us first consider the model of M¨uller et al. (2004), with and without the mix-split step. The first thing to notice is the effect of the extra mix-split step. As can be seen from the trace plots in Figure 4.1, mixing for the parameterεis better when this extra step is applied, as one would expect. Although the improvement in mixing is not huge here, in the following I will report the results with this extra step. Looking at the kernel density estimate of the posterior distribution of the weightε
(Figure 4.2), we see a large mode around 0.7 and two smaller ones around 0 and 0.1 (the sizes of the smaller weights is actually slightly different without the mix-split step). The predictive densities ofF∗
0 5 10 15 x 104 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 1 2 3 4 5 x 104 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Figure 4.1: Trace plots forεin Model (1.2.13), with (right) and without (left) the mix-split
step.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 1 2 3 4 5 6Figure 4.2: Posterior distribution ofε, Model (1.2.13).
next to them in Figure 4.3. Notice that, although one data point from the second data set (Exxon Mobil Corp) takes the value 7.88, in the predictive density ofF∗
2 we do not get a significant mode around that point (not shown in the graph). This is because only one observation is allocated to that (potential) cluster, and because the specific value is very far from the posterior base distribution. The lack of mode at that point allows us to focus on the values of the predictives in the interval (-5,7), for a better visual inspection of the results. The same reasoning applies when displaying the predictive densities ofF0, F1, F2, as well as in the other models.
The predictive densities of the component distributions are shown in Figure 4.4. For F0, this predictive density is slightly positively skewed with one large mode around 0 and heavier than normal tails. F2, on the other hand, seems fairly symmetric, with the same mode around 0. The
−6 −4 −2 0 2 4 6 8 0 10 20 30 40 50
First data set, Y1
−6 −4 −2 0 2 4 6 8 0 10 20 30 40 50
Second data set, Y
2 −4 −2 0 2 4 6 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
Predictive density for F 1 * −4 −2 0 2 4 6 0 0.2 0.4 0.6 0.8
Predictive density for F2*
Figure 4.3: Histograms of the data (left) and predictive densities for
F
∗1
and
F
2∗(right).
predictive density forF1 exhibits some different features. The mode at 0 exists, but now we have two smaller modes, one around -3 and one around 3.5. The predictive density forF1 also exhibits positive skewness. The differences in the predictive densities ofF1andF2 highlight the differences in the marginal distributions between the two stocks, which seem to be more extreme in the intervals (−4,−2) and (2,5).
Next, consider the results for the posterior distributions of the precision parametersM0, M1andM2 (Figure 4.5). It is seen thatM1 takes significantly larger values than M0 andM2. This difference betweenM2andM1simply means that (a posteriori)F1is closer to the (common) base distribution
H thanF2, and can also be seen as another aspect in which the two idiosyncratic partsF1 andF2 differ. Also notice that the posterior distributions of all three concentration parameters look very similar to a gamma distribution, indicating a prior-posterior accordance for these parameters. For comparison purposes, the prior distributions for theM’s were also plotted in the same axes. By looking at Figure 4.6, it is clear that there are many more clusters inF1 than inF0 orF2. This result is in accordance with the fact thatM1 is larger than the other two (see, e.g. equation 10 in Escobar and West (1995)).
Finally, the posterior mean, median and and 95% credible interval for the posterior distributions of
m, SandB are shown in Table 4.1.
Next, we investigate the use of my proposed model, Model (2.1.4), when applied to the same data set, again with and without the extra mix-split step in the MCMC algorithm.
In this case, the improvement of mixing caused by the extra mix-split step is more obvious, although the percentage of accepted split or merge steps was almost the same as before (10.5%). As seen in Figure 4.7, there is a substantial improvement in the mixing ofεwhen the split/merge step is used
−6 −4 −2 0 2 4 6 8 0 0.01 0.02 −6 −4 −2 0 2 4 6 8 0 0.02 0.04 0.06 −6 −4 −2 0 2 4 6 8 0 0.02 0.04
Figure 4.4: Predictive densities forF
1(top),F
2(middle) andF
0(bottom) for Model (1.2.13).
in the algorithm. This improvement is more evident than in the case of the previous model, and I will therefore report the results when this step was included in the code. The posterior ofεis seen in the next graph (Figure 4.8) and it has a big mode around 0.7, and a smaller one around 0.1. The predictive densities forF∗
1 andF2∗ were very similar to the ones in the case of Model (1.2.14). This is a sensible result, since we have 1000 data from each data set, therefore dominating the predictive distributions.
The predictives for the component distributions F0, F1 andF2 (Figure 4.9) were also very similar to the previous ones, especially the predictive densities forF0andF1. The predictive density forF0 is also unimodal at 0 and slightly positively skewed. ForF1 there are three modes, a large one around -1 and two smaller around -3 and 3.5. The predictive density forF2 is again unimodal around 0, but now with a little negative skewness, unlike the corresponding distribution for the model of M¨uller et al. (2004).
0 0.5 1 1.5 2 2.5 3 0 1 2 3 M 0 Density
Prior and posterior density for M0
0 2 4 6 8 10 12 14 16 18 0 0.2 0.4 0.6 0.8 M 1 Density
Prior and posterior density for M1
0 0.5 1 1.5 2 2.5 3 0 1 2 3 M2 Density
Prior and posterior density for M2
Posterior density Prior density Posterior density Prior density Posterior density Prior density
Figure 4.5: Prior (dashed line) and posterior (solid lines) distributions of
M
0, M
1and
M
2in Model (1.2.13).
without the extra step the results were similar. However, mixing was not as good as in the improved algorithm (i.e. like in the case ofε). These posteriors look different from the ones for the precision parametersM0, M1, M2used in the model of M¨uller et al. (2004). An apparent reason for that is that there are only two such parameters, rather than three. Another difference is the prior distribution of the weightε, which involves the precision parameters only in model (2.1.4). On the other hand, the posterior distributions are gamma-like, indicating a prior-posterior accordance. It is also worth mentioning thatM0 takes smaller values thanM1.
In this model, it is also interesting to look at the posterior distributions of the reparametrisation
x=M0+M1, y = M0M0+M1. y can be seen as the prior mean of ε, as well as the prior correlation between F∗
1(A) andF2∗(A), ∀ A ∈ F. On the other hand, x can be thought of as a precision parameters of the prior distributions of F∗
1(A) and F2∗(A). As seen in Figure 4.10 (right), the posterior ofx is like a gamma distribution, with mean around 2.50, andy is a negatively skewed distribution, with mean around 0.20. In the same axes the prior distribution of all these quantities were also plotted, for prior-posterior comparison purposes.
S m B
Mean 0.237 0.749 14.26 Median 0.236 0.740 13.38 2.5th percentile 0.189 -0.828 7.78 97.5th percentile 0.284 2.38 25.79
Table 4.1: Posterior mean, medians and 95% credible intervals for
S, m
and
B
in Model
(1.2.13).
0 2 4 6 8 10 12 14 16 0 0.1 0.2 0.3 0.4Posterior distribution of K
2 0 10 20 30 40 50 60 70 80 0 0.02 0.04 0.06Posterior distribution of K
1 0 5 10 15 20 25 0 0.05 0.1 0.15Posterior distribution of K
0Figure 4.6: Posterior distributions of
K
0, K
1and
K
2for the M¨uller et al. (2004) model.
each component distribution,K0, K1 andK2.
Finally, the posterior distributions ofS, mandB for this model were very similar to the case of Model (1.2.13).
I conclude by summarizing the findings and mentioning the differences when the two models (the model of M¨uller et al. (2004) and model (2.1.4)) were fitted to the same data. The comments will be about the results with the mix-split step applied to the MCMC algorithms, since it has been shown that this extra step results in better mixing of the chain.
The predictive densities for the correlated distributions F∗
1 andF2∗ were the same, as they are dominated by the large data size. This is more or less true for the predictive densities of the component distributions F0 andF2. On the contrary, the predictive density for F1 was not the
0 5 10 15 x 104 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 ε
Trace plot for ε
0 5 10 15 x 104 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Figure 4.7: Trace plots for
εin Model (2.1.4) with (right) and without (left) the mix-split
step.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 1 2 3 4 5 6 7 8Figure 4.8: Posterior distribution ofεin Model (2.1.4).
same. However, the differences were mostly a matter of the sizes of the clusters for each component distribution.
Regarding the posterior distribution of the weightε,the posterior modes are more or less the same. There are slight differences in the relative sizes of those modes. When the extra mix-split step was used, mixing ofεwas good for both models.
There were some differences in the posterior distributions of the concentration parameters, as well. As mentioned before, direct comparison of those quantities is not straightforward, since in one model there are three of those parameters, whereas in the other one, only two. Apart from that, one can see thatM0 has more or less the same posterior in both models, whereasM1 is much larger in the model of M¨uller et al. (2004) than in Model (2.1.4). A possible reason for this is that in the former case there are two concentration parameters for the idiosyncratic parts,M1 andM2,whereas in the
−6 −4 −2 0 2 4 6 8 0 0.01 0.02 −6 −4 −2 0 2 4 6 8 0 0.02 0.04 0.06 −6 −4 −2 0 2 4 6 8 0 0.02 0.04
Figure 4.9: Predictive densities forF
1(top),F
2(middle) andF
0(bottom) for Model (2.1.4).
latter case, there is onlyM1. So, in one sense,M1 in Model (2.1.4) accounts for bothM1 andM2 of Model (1.2.13), and sinceM2 is much smaller than M1 there, this causes the “common”M1 in Model (2.1.4) to take smaller values thanM1in the other model.
These differences are consistent with the differences in the number of clusters in the component distributions,K0, K1 andK2.We see that K0 is slightly larger in Model (2.1.4),K1is much larger for Model (1.2.13) andK2 is significantly larger for (2.1.4). In both cases K1 is much larger than
K0 andK2, K0 is larger thanK2 for Model (1.2.13), whereas the opposite holds for model (2.1.4). The differences in the posterior medians of the precision parameters and of the number of clusters, as well as the correspondence of the magnitude of those two sets of parameters, can also be seen in Table 4.2.
Model M0 M1 M2 K0 K1 K2
(1.2.13) 0.49 6.46 0.19 5 28 2
(2.1.4) 0.41 1.86 - 3 17 6