4.2 Results
4.2.2 Number of components
Determining the best number of components is a difficult question in the present example. As mentioned in the introductory NMF chapter 2, we assume the number of components K to be given, since usual NMF algorithms require it as input. In real-world applications, of course, we do not know in advance how many underlying components a data set contains.
Sometimes, the eigenvalue spectrum of the covariance or correlation coefficient matrix can hint at the dimensionality of the data space. Figure 4.5 plots the eigenvalues of the data covariance matrix and of the correlation coefficient matrix.
If the data are normalized such that the variance of each variable is 1, each observed variable contributes one unit of variance to the total variance. According to Kaiser’s criterion [Kai58], each principal component which explains at least as much variance as one observed variable contains relevant information. In figure 4.5 (center plot), there are seven eigenvalues greater than 1, so Kaiser’s criterion suggests K = 7 as the actual dimensionality.
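Kaiser's criterion can be sketched in a few lines. The following is a minimal illustration on synthetic data, not the wafer data of this chapter; the function name kaiser_count and the synthetic matrix X are assumptions made for the example only.

```python
import numpy as np

def kaiser_count(X):
    """Count the eigenvalues of the correlation matrix that exceed 1.

    X is an (N, M) array with one row per observation and one column
    per variable. Returns the count and the eigenvalues in descending order.
    """
    # rowvar=False tells NumPy that the columns are the variables.
    R = np.corrcoef(X, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(R)[::-1]
    return int(np.sum(eigvals > 1.0)), eigvals

# Synthetic stand-in (assumption): two non-negative sources plus small noise.
rng = np.random.default_rng(0)
N, M, K_true = 500, 6, 2
X = rng.random((N, K_true)) @ rng.random((K_true, M))
X += 0.01 * rng.standard_normal((N, M))

k, eigvals = kaiser_count(X)
print("eigenvalues:", np.round(eigvals, 3))
print("Kaiser's criterion suggests K =", k)
```

Because the trace of a correlation matrix equals the number of variables M, the eigenvalues always sum to M, which is why the threshold 1 corresponds to the variance of one observed variable.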
In contrast, visual inspection of the eigenvalue spectrum of the data covariance matrix (figure 4.5, left) may suggest 3, 7, 8, or 11 important components in the data (indicated by a large decrease between adjacent eigenvalues).
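The search for large decreases between adjacent eigenvalues can be mimicked numerically. The sketch below ranks the absolute drops in the covariance spectrum of synthetic data; the helper largest_gaps and the choice of absolute differences are assumptions for illustration, and on a logarithmic axis ratios of adjacent eigenvalues may be closer to what the eye picks up.

```python
import numpy as np

def largest_gaps(X, top=3):
    """Rank candidate dimensionalities by the drop after each eigenvalue.

    Returns a list of (K, drop) pairs, where drop is the difference between
    the K-th and (K+1)-th largest eigenvalue of the covariance matrix,
    together with the eigenvalues in descending order.
    """
    C = np.cov(X, rowvar=False)            # columns are the variables
    eigvals = np.linalg.eigvalsh(C)[::-1]  # descending order
    drops = eigvals[:-1] - eigvals[1:]     # drops[i]: gap after eigenvalue i+1
    order = np.argsort(drops)[::-1][:top]  # indices of the largest drops
    return [(int(i) + 1, float(drops[i])) for i in order], eigvals

# Synthetic stand-in (assumption): two non-negative sources plus small noise.
rng = np.random.default_rng(1)
N, M = 500, 6
X = rng.random((N, 2)) @ rng.random((2, M))
X += 0.01 * rng.standard_normal((N, M))

candidates, eigvals = largest_gaps(X, top=3)
for K, drop in candidates:
    print(f"candidate K = {K}, drop = {drop:.4f}")
```

As in the text, several candidates can appear with comparable drops, so such a ranking supports the inspection but does not settle the choice.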
Obviously, these rules of thumb for estimating the dimensionality from the PCA eigenvalue spectrum do not lead to a clear statement on the actual number of components in the case considered here.
Anticipating the result of a later chapter, a variational Bayes procedure suggests that K = 11 components give the best explanation for this dataset (see the maximum of the variable Bq in figure 4.5, right). We will explain the theory behind the computation of this Bayesian criterion in detail in paragraph (8.2.2).
4.3 Summary
This section demonstrated the performance of the NMF technique on wafer test data aggregated into relative BIN counts per wafer. Each column of the data matrix X corresponds to one BIN category, while each row holds the data of one wafer. The entry Xij approximates the probability that a chip on wafer i fails in BIN category j. We assumed a linear non-negative superposition of K individual failure causes, ignoring all nonlinear effects that may be present in such wafer test data. The NMF methodology was shown to extract consistent components from a data set comprising N = 2800 wafers and M = 19 BIN categories. For different numbers of components K, largely the same components are extracted, which was verified by visual inspection of the estimated basis components Hk∗ and of scatterplots of the weights W∗k.
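The aggregation and the superposition model X ≈ WH can be illustrated as follows. This is a hedged sketch on random labels, not the analysis performed in this chapter: the convention that BIN index 0 denotes a passing chip is an assumption, and the plain multiplicative-update NMF for the squared error is swapped in for simplicity in place of the volume-constrained algorithm used here.

```python
import numpy as np

def relative_bin_counts(wafer_labels, n_fail_bins):
    """Aggregate per-chip BIN labels into relative BIN counts per wafer.

    wafer_labels: list of 1-D integer arrays, one per wafer. Label 0 means
    pass (assumed convention), labels 1..n_fail_bins are fail categories.
    Returns X with X[i, j] = fraction of chips on wafer i in fail BIN j+1.
    """
    X = np.zeros((len(wafer_labels), n_fail_bins))
    for i, labels in enumerate(wafer_labels):
        counts = np.bincount(labels, minlength=n_fail_bins + 1)
        X[i] = counts[1:] / labels.size  # drop the pass bin, normalize
    return X

def nmf_multiplicative(X, K, n_iter=200, seed=0, eps=1e-12):
    """Plain multiplicative-update NMF minimizing the squared error.

    Not the volume-constrained variant; a simple stand-in for illustration.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape
    W = rng.random((N, K)) + eps  # weights, one column per component
    H = rng.random((K, M)) + eps  # basis components, one row per component
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in (assumption): 20 wafers, 100 chips each, 5 fail BINs.
rng = np.random.default_rng(2)
N, M, K = 20, 5, 2
wafer_labels = [rng.integers(0, M + 1, size=100) for _ in range(N)]

X = relative_bin_counts(wafer_labels, M)
W, H = nmf_multiplicative(X, K)
print("relative reconstruction error:",
      np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```

Each row of H plays the role of a basis component Hk∗ over the BIN categories, and each column of W holds the weights W∗k of that component across the wafers.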
The benefit of the NMF methodology for the overall failure analysis is that it offers an alternative data representation, separated into potential causes and their individual contributions.
We do not discuss the further analysis steps which are necessary to detect the actual root causes in the processing line. As sketched in section (1.2), other data analysis tools can be used to match the NMF findings with other historical data of the investigated wafers.
The determination of the actual number of hidden components remains an open issue. A preview of a later chapter, which addresses the problem of model order selection, was given.
In this chapter, test data was analyzed on wafer level, i.e. fail probabilities in different BIN categories were approximated by the fraction of chips per wafer carrying the respective BIN labels, and a linear non-negative superposition model was assumed. In that case, usual NMF techniques can be applied. Due to the developments in chapter 3, the uniqueness ambiguity for a fixed number of components does not occur since we used a volume-constrained NMF algorithm.
While this chapter concerned the direct application of NMF to suitably aggregated data on wafer level, the next chapter will follow a different approach. In chapter 5, we will construct a non-negative superposition model for binary test data on chip level and develop a new extension of NMF suited to this problem.
Chapter 5
NMF extension for Binary Test Data
In this chapter, a new approach to the analysis of wafermaps is introduced. In contrast to the preceding chapter 4, where test data on wafer level was approximated by a standard NMF model, a new method for binary data on chip level is developed here. The method is called binNMF and aims to decompose binary wafer test data into elementary root causes and their individual contributions. The data is assumed to be generated by a superposition of several simultaneously acting sources or elementary causes which are not directly observable. Throughout this chapter, we will use the term superposition as a synonym for the action of several underlying sources and their joint effect on the outcome of the wafer test procedure.
The superposition process is modeled based on a minimum of assumptions, and its inversion allows the underlying source characteristics to be identified.
5.1 Binary Data Sets
Binary test data arise, for example, in the final functionality tests of microchip fabrication. Irrespective of which physical quantity is actually measured, the information transmitted for every single die is a binary pass or fail variable. Hence a chip is labeled fail as soon as it fails one single test, while it is labeled pass only if it passes all single tests. Such large binary data sets, which are also collected in many practical applications beyond the semiconductor industry, may be considered to be generated by diverse hidden underlying processes. Here, a multiple cause model is assumed for data generation, and the problem is related to NMF methodology. Not surprisingly, parallels exist between the proposed method and existing techniques from various disciplines such as statistics, machine learning, neural networks and bioinformatics, which overlap strongly and use basically similar procedures for different target settings.
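The labeling rule for a single die can be written down directly. The small array below is a made-up example, and the additional index of the first failed test is shown only for illustration; as stated above, the transmitted information is just the binary pass or fail variable.

```python
import numpy as np

# Rows are chips, columns are single tests; True means the test was passed.
# The values are hypothetical.
single_tests = np.array([
    [True,  True,  True],   # passes every single test
    [True,  False, True],   # fails the second test
    [False, False, True],   # fails the first test already
])

# A chip is labeled pass only if it passes all single tests.
chip_pass = single_tests.all(axis=1)

# Index of the first failed test, or -1 for passing chips (illustrative only).
first_fail = np.where(chip_pass, -1, np.argmax(~single_tests, axis=1))

print(chip_pass.tolist())   # [True, False, False]
print(first_fail.tolist())  # [-1, 1, 0]
```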