Summary of the Results - Parameter Estimation for Mixture Models Given Grouped Data

4.5 Summary of the Results

The previous sections dealt with the estimation of the parameters of the mixture models M1 - M12. Throughout these investigations, the sample size n was 100, the interval width h was 1, and the true parameter values were used as starting values. The latter choice does not reflect practical applications, where the true values are seldom known. However, it shows how well each method performs when started close to the true values. The effect of varying the initial values is investigated in section 4.7.

First, four mixture models with equal mixing proportions were considered (M1 - M4). This is the easiest case, since half of the observations belong to the first component and the other half to the second. Methods containing simplifications, such as APEM, RMEM and UDEM, proved themselves both for highly overlapping components, where it is not obvious that the distribution consists of two subpopulations, and for well-separated components, where two peaks can clearly be identified. They were fast, failed in only a small number of samples, and provided estimates close to the true values. Hence, all of them can be recommended.
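
To make the simulation setting described above concrete, the following Python sketch draws a sample of size n = 100 from a two-component normal mixture and keeps only the counts per interval of width h = 1, which is the only information a grouped-data method receives. The function name `simulate_grouped_mixture` and the parameter values in the usage line are illustrative assumptions; they are not the parameters of M1 - M4.

```python
import math
import random
from collections import Counter


def simulate_grouped_mixture(n, pi1, mu, sigma, h=1.0, seed=0):
    """Draw n values from a two-component normal mixture and return only
    the interval borders and the number of values per interval of width h."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        # Component 1 with probability pi1, otherwise component 2.
        k = 0 if rng.random() < pi1 else 1
        x = rng.gauss(mu[k], sigma[k])
        # Index j of the interval [j*h, (j+1)*h) that contains x.
        counts[math.floor(x / h)] += 1
    lo, hi = min(counts), max(counts)
    # Borders of all intervals from the lowest to the highest occupied one;
    # empty intervals in between get a count of 0.
    borders = [j * h for j in range(lo, hi + 2)]
    freqs = [counts[j] for j in range(lo, hi + 1)]
    return borders, freqs


# Hypothetical parameters (not those of M1 - M4): equal weights, means 0 and 3.
borders, freqs = simulate_grouped_mixture(100, 0.5, (0.0, 3.0), (1.0, 1.0))
print(list(zip(borders[:-1], freqs)))
```

The exact values are discarded on purpose: every method compared in this chapter works from `borders` and `freqs` alone.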

An interesting finding concerns the UDEM algorithm. This method assumes that the observations lie somewhere between the interval borders, and indeed it seems to be irrelevant where exactly the observations are located, as long as the number of observations in each interval is specified. This effect can be observed even more clearly for the RMEM method, which uses only the number of observations in each interval, irrespective of their exact location. This method has the simplest construction and, therefore, the easiest implementation. It is fast and, in addition, provides the best fit for most of the mixture models. Therefore, this method is to be preferred.
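
The observation that only the interval counts matter can be illustrated with a simple EM variant that places all observations of an interval at its midpoint and then runs a count-weighted EM for a two-component normal mixture. This is a hedged sketch under that midpoint assumption; it is not claimed to be the RMEM or UDEM algorithm as defined in the preceding sections, and `midpoint_em` is a hypothetical name. It stops after at most 500 iterations and does not guard against a standard deviation collapsing to zero.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def midpoint_em(borders, freqs, pi1, mu, sigma, max_iter=500, tol=1e-8):
    """Count-weighted EM for a two-component normal mixture in which every
    observation of interval j is placed at the interval midpoint.
    Returns (pi1, [mu1, mu2], [sigma1, sigma2], iterations used)."""
    mids = [(a + b) / 2 for a, b in zip(borders[:-1], borders[1:])]
    n = sum(freqs)
    mu, sigma = list(mu), list(sigma)
    for it in range(1, max_iter + 1):
        # E-step: posterior probability that a midpoint stems from component 1.
        tau = []
        for m in mids:
            f1 = pi1 * normal_pdf(m, mu[0], sigma[0])
            f2 = (1 - pi1) * normal_pdf(m, mu[1], sigma[1])
            tau.append(f1 / (f1 + f2))
        # M-step: the interval counts act as weights.
        w1 = [c * t for c, t in zip(freqs, tau)]
        w2 = [c * (1 - t) for c, t in zip(freqs, tau)]
        s1, s2 = sum(w1), sum(w2)
        new_mu = [sum(w * m for w, m in zip(w1, mids)) / s1,
                  sum(w * m for w, m in zip(w2, mids)) / s2]
        new_sigma = [
            math.sqrt(sum(w * (m - new_mu[0]) ** 2 for w, m in zip(w1, mids)) / s1),
            math.sqrt(sum(w * (m - new_mu[1]) ** 2 for w, m in zip(w2, mids)) / s2),
        ]
        new_pi1 = s1 / n
        # Stop when the parameters no longer change noticeably.
        change = abs(new_pi1 - pi1) + sum(
            abs(a - b) for a, b in zip(new_mu + new_sigma, mu + sigma))
        pi1, mu, sigma = new_pi1, new_mu, new_sigma
        if change < tol:
            return pi1, mu, sigma, it
    return pi1, mu, sigma, max_iter


# Hypothetical grouped sample: 100 observations in 10 unit-width intervals.
borders = list(range(-3, 8))
freqs = [2, 7, 16, 16, 9, 9, 16, 16, 7, 2]
print(midpoint_em(borders, freqs, 0.5, (0.0, 4.0), (1.0, 1.0)))
```

Nothing inside the loop refers to where the observations lay within an interval; only `freqs` and the interval borders enter, which is the property the paragraph above attributes to the count-based methods.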

The more complex algorithms, MIX and EMNR, which contain two and three iteration procedures, respectively, seem to be less suitable. In particular, for highly overlapping mixture models, they provided less accurate fits, and with respect to convergence, both methods failed in a distinctly higher number of samples. However, the results of these two methods depended strongly on the separation of the components: the better separated the components, the better the estimates. The same holds for the runtime; the larger the distance between the component means, the faster the results were obtained. Finally, the separation also influenced convergence, since fewer convergence problems occurred for well-separated mixture distributions. In brief, the MIX and EMNR algorithms cannot be recommended for highly overlapping mixture models, although for well-separated mixtures they provided estimates close to the true values. Compared to the simple methods, the MIX and EMNR algorithms are inferior. A direct comparison of the two shows that the EMNR algorithm provides estimates closer to the true values. The improvement in the runtime of the MIX algorithm came at the cost of accuracy, which is a surprising result, since the aim of combining the EM and NR algorithms was to improve the method. In fact, the runtime did improve, but at the same time the accuracy deteriorated.
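
A natural yardstick for comparing fits on grouped data is the grouped-data log-likelihood, log L(θ) = Σ_j n_j log P_j(θ), where n_j is the count of interval j = (a_j, b_j] and P_j(θ) = Σ_k π_k [Φ((b_j − μ_k)/σ_k) − Φ((a_j − μ_k)/σ_k)] is the mixture probability of that interval. The sketch below computes it for two normal components, assuming the outermost intervals are extended to minus and plus infinity so that the interval probabilities sum to one. The function names are hypothetical and the multinomial constant is omitted.

```python
import math


def normal_cdf(x, mu, sigma):
    # math.erf(+-inf) returns +-1.0, so infinite borders are handled.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def interval_probs(borders, pi1, mu, sigma, open_tails=True):
    """Mixture probability of each interval between consecutive borders."""
    edges = list(borders)
    if open_tails:
        # Let the first and last interval absorb the tails.
        edges[0], edges[-1] = -math.inf, math.inf
    probs = []
    for a, b in zip(edges[:-1], edges[1:]):
        p = sum(w * (normal_cdf(b, m, s) - normal_cdf(a, m, s))
                for w, m, s in ((pi1, mu[0], sigma[0]), (1 - pi1, mu[1], sigma[1])))
        probs.append(p)
    return probs


def grouped_loglik(borders, freqs, pi1, mu, sigma):
    """Sum of n_j * log P_j over intervals, without the multinomial constant."""
    total = 0.0
    for c, p in zip(freqs, interval_probs(borders, pi1, mu, sigma)):
        if c == 0:
            continue  # empty intervals contribute nothing
        if p <= 0.0:
            return -math.inf  # an occupied interval with zero probability
        total += c * math.log(p)
    return total


# Hypothetical grouped sample and two candidate parameter sets.
borders = list(range(-3, 8))
freqs = [2, 7, 16, 16, 9, 9, 16, 16, 7, 2]
print(grouped_loglik(borders, freqs, 0.5, (0.0, 4.0), (1.0, 1.0)))
print(grouped_loglik(borders, freqs, 0.5, (2.0, 2.0), (1.0, 1.0)))
```

Evaluating this function at the estimates returned by different algorithms offers one common scale on which their fits to the same grouped sample can be compared.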

In contrast, the APEM algorithm, which is a simplification of the EMNR and, hence, of the MIX method, yielded parameter estimates that were much closer to the true values than those obtained by the MIX or EMNR algorithm. This finding led us to conclude that approximating the probability that an individual observation falls into an interval causes a smaller deviation than an additional iteration.
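
The size of such an approximation error can be checked numerically. Assuming the approximation replaces the exact interval probability by the interval width times the mixture density at the interval midpoint (a common choice; the exact form used by APEM is not restated in this section), the sketch below reports the largest absolute difference over a grid of intervals of width h. The function names and mixture parameters are hypothetical.

```python
import math

# Hypothetical two-component normal mixture: weights, means, standard deviations.
COMPONENTS = lambda pi1, mu, sigma: ((pi1, mu[0], sigma[0]), (1 - pi1, mu[1], sigma[1]))


def mixture_pdf(x, pi1, mu, sigma):
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in COMPONENTS(pi1, mu, sigma))


def mixture_cdf(x, pi1, mu, sigma):
    return sum(w * 0.5 * (1 + math.erf((x - m) / (s * math.sqrt(2))))
               for w, m, s in COMPONENTS(pi1, mu, sigma))


def max_midpoint_error(h, pi1, mu, sigma, lo=-5.0, hi=10.0):
    """Largest |exact interval probability - h * density at the midpoint|
    over the intervals of width h that cover [lo, hi]."""
    worst = 0.0
    for j in range(round((hi - lo) / h)):
        a = lo + j * h
        b = a + h
        exact = mixture_cdf(b, pi1, mu, sigma) - mixture_cdf(a, pi1, mu, sigma)
        approx = h * mixture_pdf((a + b) / 2, pi1, mu, sigma)
        worst = max(worst, abs(exact - approx))
    return worst


for h in (1.0, 0.5):
    print(h, max_midpoint_error(h, 0.5, (0.0, 3.0), (1.0, 1.0)))
```

For this hypothetical mixture the largest error is already small at h = 1 and shrinks substantially at h = 0.5, in line with the midpoint rule's error being of order h³ per interval for smooth densities.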

The greatest advantage of the MGT algorithm is its reliability in terms of convergence. This method provided a solution for every mixture model considered, independent of the degree of overlap or the balance of the mixing proportions. Besides its excellent convergence properties, it also produced estimates close to the true values. Hence, this method can be recommended, too.

Furthermore, mixture models M5 - M8 were investigated, where the first mixing proportion was set to 0.7. With such unbalanced mixing proportions, decomposing a mixture distribution into its components is more difficult than for models with well-balanced mixing weights. In fact, higher convergence failure rates, worse estimates and larger deviations occurred for almost all applied methods. This effect became stronger the more unbalanced the mixing weights were, as shown for the mixture models M9 - M12, where π₁ = 0.9 and π₂ = 0.1. It has to be kept in mind, however, that all these estimates are based on samples of size n = 100. Accordingly, a second mixing proportion of 0.1 leads to about 10 observations in the second component, from which the second mean and standard deviation had to be estimated. Given this, it is natural that the second standard deviation was underestimated by all methods, independent of the component separation.

To conclude, the results show that grouped observations do not automatically lead to a poorer fit. Depending on the underlying mixture model and the choice of an appropriate method, an adequate fit can be achieved. Nevertheless, it should be noted that these results are based on samples for which a solution was reached within 500 iterations. Samples requiring more iterations and those where the algorithms converged to a singularity or diverged are excluded. This might explain why the results here are better than those obtained by other authors, e.g. Hosmer (1973a), who investigated the accuracy of maximum likelihood estimates in two-component mixtures with highly overlapping components and small sample sizes. His simulation study reinforces the need for well-separated components with Δ ≥ 3 and large sample sizes n > 300. For the other cases, he stated that “[...] the ML estimates may not to be accurate enough to provide useful estimates”.
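
The remark on models M9 - M12, that a mixing proportion of π₂ = 0.1 leaves only about 10 observations for the second component, can be quantified under a simplifying assumption: suppose the component memberships were known, so that the second component is estimated from its own observations only. The size of that group is then binomial with n = 100 and π₂ = 0.1, and the maximum likelihood (divide-by-m) standard deviation of m normal observations is biased downwards. The sketch below computes both quantities; it ignores the extra uncertainty from the unknown memberships, so it illustrates only one contributing factor. The function names are hypothetical.

```python
import math


def expected_small_component_size(n, pi2):
    """Mean and standard deviation of the number of observations from
    component 2, which is Binomial(n, pi2) when memberships are random."""
    return n * pi2, math.sqrt(n * pi2 * (1 - pi2))


def ml_sd_bias_factor(m):
    """E[sigma_hat_ML] / sigma for m >= 2 i.i.d. normal observations, where
    sigma_hat_ML divides the sum of squared deviations by m."""
    # c4(m) = sqrt(2/(m-1)) * Gamma(m/2) / Gamma((m-1)/2) is the bias factor
    # of the divide-by-(m-1) standard deviation.
    c4 = math.sqrt(2.0 / (m - 1)) * math.exp(math.lgamma(m / 2) - math.lgamma((m - 1) / 2))
    # Rescale from the divide-by-(m-1) to the divide-by-m estimator.
    return math.sqrt((m - 1) / m) * c4


mean, sd = expected_small_component_size(100, 0.1)
print(mean, sd, ml_sd_bias_factor(10))
```

For π₂ = 0.1 the expected size of the second component is 10 with a standard deviation of 3, and for m = 10 the expected ML standard deviation is only about 92% of the true value, so some underestimation is to be expected even before the grouping and the unknown memberships are taken into account.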
