Inference about Li and the Effect of the Choice of K

CHAPTER 5. BAYESIAN LINEARLY CONSTRAINED GAUSSIAN

5.4 Illustration on Simulated Data

5.4.3 Inference about Li and the Effect of the Choice of K

In this section, we simulate from model (5.23) to explore the effect of over-fitting the Bayesian LC-GMRM on the ability of the methodology to correctly identify Li and the

additional computational costs of fitting a higher-order model than necessary. We simulate six images from model (5.23) for model specifications given in Table 5.4, where the trend coefficients are simulated with β = (12, 3) for all simulations. Table 5.4 also includes the simulated proportion of Li growth (r_t), where an Li labeling is determined for each simulation using the methods of Section 5.4.1. We refer to the simulations as Simulation 1 - 6, respectively. The combinations of parameter values for each simulation are chosen to illustrate increasing amounts of Li within an image, K increasing with Li, and various degrees of overlapping/redundancy between mixture components. Figure 5.6 shows de-trended (z_i − u_iβ|C_i = k ∼ N (δ_k, σ²_k)) density plots (top) and the profile of simulated images with grayscale value (z_i) plotted against x_i (bottom) and pixels colored by “true”

Li labeling (blue). Simulation 1 illustrates an image where no Li growth is present (i.e.

emulating the experiment before any charging of the anode has occurred) with K = 2 components, while Simulations 2 - 5 utilize K = 3 components to simulate increasing Li deposition under the third component. Lastly, Simulation 6 illustrates a situation where a significant amount of Li is present, and the distribution of Li grayscale values is given by a mixture of two components which are significantly overlapping.

Table 5.4: True parameter values for six simulated images from model (5.23). “True” proportions of growth (r_t) obtained using simulated Li labels, are also recorded.

r_t p δ σ

Sim 1 0.000 (0.3, 0.7) (40, 120) (9, 10)

Sim 2 0.036 (0.3, 0.65, 0.05) (40, 120, 150) (9, 10, 12) Sim 3 0.110 (0.3, 0.55, 0.15) (40, 120, 150) (9, 10, 15) Sim 4 0.149 (0.3, 0.5, 0.2) (40, 120, 150) (9, 10, 18) Sim 5 0.314 (0.3, 0.35, 0.35) (40, 120, 165) (9, 10, 25) Sim 6 0.375 (0.3, 0.3, 0.08, 0.32) (40, 120, 150, 185) (9, 10, 10, 30)

We first compare the proportion of Li growth estimated utilizing a K = 5 over-fitted model with a sparse Dirichlet(1/10) prior, to the model with K set to the true number of components (K₀) and a Dirichlet(2) prior. The Dirichlet(2) prior is specified for the K₀ model so that no shrinkage is imposed on the component weights. For each simulation

Figure 5.6: Plots of de-trended mixture densities zi− uiβ|Ci = k ∼ N (δk, σ_k²) (top) and grayscale values plotted again the x-coordinate (bottom) for Simulations 1-6, simulated at parameters values in Table 5.4.

Simulated Li labelings are shown in blue in the grayscale by x plots (bottom), with the amount of Li increasing for each simulated image.

and both model specifications, we simulate two chains of 10,000 iterations using the Gibbs sampler in Section 5.2.2, with 5000 iterations discarded for burnin. The chains are thinned to every 5th iteration to obtain 2000 Li labelings using the algorithm in Section 5.3.4.

The estimated proportions of Li growth for each simulated image are given in Table 5.5 for both the over-fitted, K = 5 models and “true” K₀ models. For all simulations, the proportion of Li growth is reasonably estimated, with the true proportion falling within the 95% uncertainty intervals for both specifications of K. Mislabeling rates are given in Table 5.6, and show less than 1% of pixels are mislabeled in each simulation.

Table 5.5: Estimated proportions of Li growth with 95% empirical quantile intervals from the over-fitted model (K = 5), compared to model estimated with the true number of components, for Simulations 1-6.

K = 5 K = K₀

r_t rˆ_t (95% CI) rˆ_t (95% CI) Sim 1 0.000 0.000 (0.000, 0.000) 0.000 (0.000, 0.000) Sim 2 0.036 0.040 (0.026, 0.074) 0.046 (0.029, 0.083) Sim 3 0.110 0.117 (0.097, 0.151) 0.123 (0.100, 0.166) Sim 4 0.149 0.152 (0.119, 0.203) 0.155 (0.119, 0.206) Sim 5 0.314 0.307 (0.291, 0.328) 0.307 (0.291, 0.328) Sim 6 0.375 0.379 (0.346, 0.407) 0.373 (0.337, 0.403)

Table 5.6: Mislabeling rates of Li for the over-fitted model with K = 5, and the model with K = K₀, for Simulations 1 - 6. Rates are computed as the proportion of non-matching Li between simulated truth and final Li labelings for each model.

Sim 1 Sim 2 Sim 3 Sim 4 Sim 5 Sim 6 K = K₀ 0.000 0.004 0.010 0.001 0.008 0.003 K = 5 0.000 0.000 0.004 0.002 0.008 0.005

Under the over-fitted model, the sparse Dirichet(1/10) prior results in unnecessary com-ponents that tend to empty in the posterior. Figure 5.7shows trace plots of 5000 iterations (after burnin) of p, δ, and σ for Simulation 2. Components k = 1, 4, 5 center around the true parameter values, while the k = 2, 3 components tend to empty. Large draws of δ₂ and δ₃ occur when a draw of the component labelings allocates no observations to that component. However, each component has non-zero probability of being allocated observa-tions at each iteration, even if p_k is very small. This is visible in the traceplot of δ_k where small jumps are made when the component is allocated a small number of observations.

Therefore, if order estimation is desirable a cutoff must be imposed to determine if a com-ponent has non-negligible weight (e.g. see (Gelman et al., 2014)). Order estimation is not particularly of interest for this application, as the components are combined to obtain Li labelings for inference, but for the purpose of discussing the benefits of the Dirichlet prior, we estimate the order of the LC - GMRM for Simulations 1 - 6 here. K₀ is estimated under the over-fitted model from j = 1, . . . , M = 2000 MCMC samples as

where is a small valued cutoff chosen to determine whether a component is important. For this simulation, we set = 0.001 as this suggests a component weight is non-negligible at iteration j if it is allocated more than one observation. For each simulation, we found the order estimated as the true value of K, with the exception of Simulation 6, where ˆK₀ = 3 components were estimated. This is not particularly surprising as the 4th component of the mixture under Simulation 6, has small weight (p4 = 0.08) and overlaps significantly

with other components (Figure5.6). This does illustrate a potential downside of the sparse Dirichlet prior however, as small probability components may be pushed towards emptying.

We found that this tends to occur only if the component is not well separated from other mixture components, and the proportion of growth for Simulation 6 under the over-fitted model is still reasonably well estimated.

Figure 5.7: Trace plots of 5000 samples of p, δ, and σ², after burn-in, simulated from the posterior distri-bution of model (5.23).

Additionally, we illustrate the importance of the sparse Dirichlet prior for over-fitted model by comparing the K = 5 model to one estimated under a Dirichlet(2) prior for Simulations 1 and 4. Here, α = 2 > d/2 = 1 and thus the components tend, asymptotically, to be non-empty in the posterior. Figure 5.8 shows trace plot of p for 10000 iterations (including burnin) under the Dirichlet(1/10) and Dirichlet(2) models for Simulations 1 and 4. Under the α = 2 prior, components have not emptied, which introduces additional correlation between component parameters seen in Figure 5.8 (left). Under the α = 0.1 prior, the redundant components have mixture weights tending to 0, and mixing in the iterations is improved while still exhibiting jumping between minor modes in traceplots for Simulation 4 (bottom right).

Figure 5.8: Trace plots of p sampled from model (5.23) for Simulation 1 (top) and Simulation 4 (bottom) under a sparse Dirichlet prior, α = 1/2K = 0.1 (left) and under a “merging” prior with α = 2 (right). The horizontal lines give the true values of non-zero p for both simulations.

In the above, we have shown that the over-fitted K = 5, LC-GMRM performs com-parably to models fit with the true number of components, suggesting that the over-fitted model can be utilized to reduce the importance of choosing K for analyzing electrochemical (S)TEM images. However, an important aspect of such analyses is computation time, as it is often desirable to analyze images in near-real time. If computation time increases signif-icantly as components are added to the model, an over-fitted model may not necessarily be more computationally efficient than choosing K based on model comparison. Fortunately, we found that computation time is most affected by the number of observations considered in an image n, and increases only marginally as the number of components is increased.

The Gibbs sampler in Section 5.2.2 is coded in C++ and interfaced with R using the Rcpp package (Eddelbuettel and Fran¸cois, 2011). We compare computation time to obtain

MCMC samples from the joint posterior for images simulated with parameters specified in Simulation 1 for n ∈ {200, 1000, 5000, 50000} and model order K ∈ {2, 3, 5}. Average computation times (in seconds) to obtain 1000 samples on a 2.6 Ghz Intel Core i5 processor are given in Table5.7. Computation time increases significantly as the number of considered observations increases, but it increases only marginally as components are added to the model. This suggests that the over-fitted model does not dramatically increase computation time over a K₀ model.

Table 5.7: Average real-time computation (on 2014 MacBook Pro with 2.6 GHz Intel Core i5 processor) to simulate 1000 iterations from model (5.23) for the number of observations, n ∈ {200, 1000, 5000, 50000} and the number of components, K ∈ {2, 3, 5}. Observations are simulated from a K = 2 component model with parameters as in Simulation 1.

n = 200 n = 1000 n = 5000 n = 50000

K = 2 0.33 1.63 7.19 74.99

K = 3 0.35 1.78 7.75 81.24

K = 5 0.38 1.87 8.75 92.79

These simulations illustrate how the over-fitted Bayesian LC-GMRM can be utilized to reduce the importance of the choice of K on inference of (S)TEM images, without dramat-ically increasing computation time. The true number of components necessary to provide a reasonable segmentation of an image is rarely known apriori, making the properties of the over-fitted model particularly desirable. If K is set to a large enough value, such that it can be reasonably assumed no image in a sequence requires more than K components, a universal K can be used to analyze all images without apparent adverse affect under a sparse Dirichlet prior.

In document Methods for analysis and uncertainty quantification for processes recorded through sequences of images (Page 174-180)