CHAPTER 4. STATISTICAL METHODS FOR QUANTIFYING
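\[
\frac{\cdots \left(z_i - \beta^{(j)T} u_i - \delta_k^{(j)}\right)\left(z_i - \beta^{(j)T} u_i - \delta_k^{(j)}\right)^T}{\sum_{i=1}^{n} w_{ik}^{(j)}}, \qquad k = 1, \ldots, K \tag{4.37}
\]
The algorithm iterates until the increase in the incomplete-data log-likelihood between successive iterations falls below some small tolerance $\epsilon$, i.e. $l(\theta^{(j)}) - l(\theta^{(j-1)}) < \epsilon$. Each pixel is then assigned to the component $k$ with the highest posterior probability
\[
P(C_i = k \mid z, \hat{\theta}) = \frac{\hat{p}_k \, N_2\!\left(z_i;\ \hat{\beta}^T u_i + \hat{\delta}_k,\ \hat{\Sigma}_k\right)}{\sum_{k'=1}^{K} \hat{p}_{k'} \, N_2\!\left(z_i;\ \hat{\beta}^T u_i + \hat{\delta}_{k'},\ \hat{\Sigma}_{k'}\right)} \tag{4.38}
\]
Figure 4.19 compares the labeling of Li growth between the linear GMRM and the linear BGMRM after merging components, but prior to performing a spatial correction.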
Comparing the performance of the univariate and bivariate models is difficult (see Section 4.5). The bivariate model shows promise in reducing the number of electrolyte and anode pixels that are misclassified as Li growth, but it may also bias the labeling towards non-growth. The labeling for a Video 2 frame in Figure 4.19 suggests that the bivariate model classified fewer pixels as Li growth than the univariate model. Further analysis of growth curves to compare the models is discussed in Section 4.5.
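As a minimal sketch of the stopping rule and the classification rule in Equation 4.38, the following Python code evaluates the posterior probabilities for a single pixel and assigns it to the most probable component. The function names, the argument layout (`beta` as a list of rows with two columns, one row per covariate in `u`) and the default tolerance `eps` are illustrative assumptions, not the implementation used for the results in this chapter.

```python
import math

def bivariate_normal_pdf(z, mean, cov):
    """Density of N_2(mean, cov) at the 2-vector z; cov is a 2x2 nested list."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    # Quadratic form (z - mean)^T cov^{-1} (z - mean) via the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def posterior_probabilities(z, u, beta, deltas, covs, props):
    """Posterior P(C_i = k | z_i) for one pixel, in the form of Equation 4.38."""
    # Trend beta^T u: beta has one row per covariate and two columns (assumed layout).
    trend = [sum(beta[r][c] * u[r] for r in range(len(u))) for c in range(2)]
    weighted = [
        p * bivariate_normal_pdf(z, [trend[0] + dk[0], trend[1] + dk[1]], cov)
        for p, dk, cov in zip(props, deltas, covs)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]

def classify(z, u, beta, deltas, covs, props):
    """Index of the component with the highest posterior probability."""
    post = posterior_probabilities(z, u, beta, deltas, covs, props)
    return max(range(len(post)), key=post.__getitem__)

def has_converged(loglik_new, loglik_old, eps=1e-6):
    """Stopping rule l(theta^(j)) - l(theta^(j-1)) < eps; eps is an assumed default."""
    return loglik_new - loglik_old < eps
```

For pixels far from every component the densities can underflow to zero, so a practical version would likely work with log-densities; the sketch omits this for brevity.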
4.5 Implementation
4.5.1 Methods for Assessment
Figure 4.19: Comparison of Li growth for the linear GMRM and linear BGMRM models for two 10% sampled images, prior to spatial correction.

We are additionally challenged in this application by the fact that no “true” segmentation is available for any of the images. Assessing the number of necessary components and the quality of cluster assignment in unsupervised problems is very difficult, and comparing the performance of the different trend models and the bivariate models we have discussed is not straightforward. Likelihood-based comparison metrics such as AIC and BIC are often used in finite mixture modeling to select the number of components, and could be used to compare trend models. Additionally, likelihood ratio tests (LRT) have been studied for finite mixture models, but the regularity conditions for asymptotic results of the LRT do not always hold in Gaussian mixture models, due to unidentifiability if components overlap and unboundedness of the likelihood if one or more component variances go to 0 (Aitkin and Rubin, 1985; McLachlan and Rathnayake, 2014). Unfortunately, likelihood-based criteria only assess how well the mixture model fits the observed data, and do not take into account the “goodness” of the clustering. Using BIC to select the number of components, K, will often overestimate K if the family of models considered does not contain the “true” model. This is the case in Gaussian mixture models when one or more of the component densities is skewed.
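For reference, the likelihood-based criteria mentioned above can be sketched in a few lines of Python. The code uses the standard definitions AIC = -2 l + 2d and BIC = -2 l + d log n, where l is the maximized log-likelihood, d the number of free parameters and n the number of observations, so smaller values are preferred. The helper `select_k` and its input format (a dictionary mapping each candidate K to a pair of log-likelihood and parameter count) are hypothetical conveniences introduced for illustration only.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion on the -2 log-likelihood scale."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion on the -2 log-likelihood scale."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def select_k(fits, n_obs):
    """Return the candidate K with the smallest BIC.

    fits maps each K to a (loglik, n_params) pair from a fitted mixture
    (hypothetical input format, assumed for this sketch).
    """
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_obs))
```

As noted in the text, a K chosen this way reflects fit to the observed data only and says nothing about the quality of the resulting clustering.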
With the aim of identifying the number of components desired for clustering, several criteria have been developed which center around the complete likelihood $L_c(\theta)$ used in the EM algorithm (Biernacki et al., 2000; McLachlan and Rathnayake, 2014). Examples of these metrics are the classification likelihood criterion and the integrated classification likelihood (Biernacki et al., 2000). These metrics are similar to AIC and BIC, but penalize model complexity by an entropy function of the classification matrix defined by $C$. They have been shown to work well in selecting a parsimonious number of components when mixing proportions are equal, but tend to overestimate the number of clusters when there is no restriction on the proportions. Assessment of our methods is additionally complicated by the fact that the final cluster assignment is a combination of clusters from the finite mixture model. We also need to take into account how well clusters from the initial segmentation can be combined into a final representation of Li growth. This cannot be done with the metrics discussed above.
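To make the entropy penalty concrete, the sketch below computes the classification likelihood criterion (CLC) and the BIC-type approximation to the integrated classification likelihood (ICL) from a matrix of posterior membership probabilities. It assumes the commonly quoted forms CLC = -2 l + 2 EN and ICL-BIC = -2 l + d log n + 2 EN, with EN = -Σ_i Σ_k w_ik log w_ik and smaller values preferred; these conventions and the function names are assumptions, and the exact definitions should be checked against Biernacki et al. (2000).

```python
import math

def entropy(resp):
    """EN = -sum_i sum_k w_ik log w_ik, treating 0 log 0 as 0.

    resp is a list of rows, one per observation, holding the posterior
    membership probabilities w_ik for each component.
    """
    return -sum(w * math.log(w) for row in resp for w in row if w > 0.0)

def clc(loglik, resp):
    """Classification likelihood criterion (assumed form: -2 l + 2 EN)."""
    return -2.0 * loglik + 2.0 * entropy(resp)

def icl_bic(loglik, n_params, resp):
    """BIC-type approximation to ICL (assumed form: BIC + 2 EN)."""
    n_obs = len(resp)
    return -2.0 * loglik + n_params * math.log(n_obs) + 2.0 * entropy(resp)
```

When every observation is assigned to a single component with probability one, EN is zero and ICL-BIC reduces to BIC; overlapping components increase EN and are penalized accordingly.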
Lastly, these metrics do not allow comparison of clusterings between the univariate and bivariate models, as the two are fit to different datasets. Statistics used to compare predictions to data, such as root mean square error, could be used to compare univariate and bivariate models, but again these statistics cannot assess the goodness of the clustering.
Because of these challenges, developing quantitative methods to compare and assess model performance is an important area of future research. In this chapter, we assess performance based on visual analysis of clustered images and by comparing proportion of growth curves estimated for different models. Estimated proportion of growth for an image at time t in a video is computed as
\[
\hat{r}_t = \frac{1}{n} \sum_{i=1}^{n} L_i \tag{4.39}
\]
where $L_i$ is 1 if pixel $i$ is labeled as growth and 0 otherwise. The nature of the electron microscopy experiments suggests that during deposition we should see a steady increase in the proportion of pixels classified as Li growth, a plateau as discharge begins, and then a steady decrease as Li growth dissipates. Proportion of growth can decrease even during deposition if large Li deposits are pushed off screen (this appears to occur in Video 3), but in general the curves should be relatively concave in time. Again, we do not have “truth” to compare these growth curves to, but the curves can be compared across models to assess whether particular models provide more consistent estimates of proportion of growth from image to image than others.
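Equation 4.39 amounts to averaging the binary growth labels within each frame. A minimal Python sketch is given below; it assumes each frame is supplied as a flat sequence of 0/1 labels, and the function names are illustrative rather than taken from our implementation.

```python
def growth_proportion(labels):
    """Estimated proportion of growth for one frame (Equation 4.39).

    labels is a sequence of indicators L_i: 1 if pixel i is labeled as
    growth, 0 otherwise.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("frame contains no labeled pixels")
    return sum(labels) / len(labels)

def growth_curve(frames):
    """Proportion of growth for each frame of a video, in time order."""
    return [growth_proportion(frame) for frame in frames]
```

Applying `growth_curve` to the labeled frames of a video under each model gives the curves that are compared across models.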
To assess the effect of sparse sampling rates, we compare proportion of growth curves computed on sparsely sampled images to proportion of growth curves computed on the full images. We develop a quantitative measure of relative deviance for a sparsely sampled video sequence as the average absolute deviation of proportion of growth from the 100% labeling.
We define the relative mean absolute deviation (RMAD) for a sampling rate $\rho$ in a given video as
\[
\mathrm{RMAD} = \frac{1}{|T^*|} \sum_{t \in T^*} \left| \hat{r}_{100,t} - \hat{r}_{\rho,t} \right| \tag{4.40}
\]
where $T^*$ is the set of images considered and $\hat{r}_{\rho,t}$ is the estimated proportion of pixels identified as growth, computed as in Equation 4.39, for the image at time $t$ sampled at rate $\rho$.
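Equation 4.40 can be computed directly from two growth curves evaluated on the same set of frames. The sketch below assumes the curves are given as equal-length sequences ordered by time, one from the 100% labeling and one from the sparsely sampled labeling; the function name `rmad` is illustrative.

```python
def rmad(full_curve, sparse_curve):
    """Mean absolute deviation between two growth curves (Equation 4.40).

    full_curve holds the proportions of growth from the 100% labeling and
    sparse_curve those from a sampling rate rho, for the same frames T*.
    """
    if len(full_curve) != len(sparse_curve):
        raise ValueError("curves must cover the same set of frames")
    if not full_curve:
        raise ValueError("no frames supplied")
    total = sum(abs(f - s) for f, s in zip(full_curve, sparse_curve))
    return total / len(full_curve)
```

A value of zero means the sparse labeling reproduces the full-image growth curve exactly on the frames considered.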