
Figure 3.10: Non-negative ICA. Left: mixed signals; center: whitened signals; right: signals rotated into the non-negative orthant.

3.5.2 Non-negative ICA

Simultaneously with the development of NMF, Plumbley [Plu01], [Plu02], [Plu03] worked on a non-negative version of Independent Component Analysis (ICA) [Com94]. He stated that the non-negativity of the estimated sources together with the pre-whitening of the observed data is sufficient to recover the underlying non-negative sources uniquely.

While usual ICA algorithms determine the correct rotation after a whitening step by some non-Gaussianity measure (e.g. the kurtosis), non-negative ICA utilizes the non-negativity constraint on the sources to determine this rotation. Thus, second-order decorrelation (instead of statistical independence) together with non-negativity constraints is sufficient to solve the non-negative ICA problem. As depicted in Figure 3.10, non-negative ICA first decorrelates the data and then rotates until the data fits into the non-negative orthant. In contrast, determinant constrained NMF does not need decorrelated features and directly discovers the basis vectors. Moreover, NMF is robust to additive noise (see e.g. [LCP+_08]).
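The two steps described above (whitening, then rotating into the non-negative orthant) can be illustrated with a minimal two-dimensional sketch. It is not Plumbley's algorithm: the rotation is found by a plain grid search over angles, and the cost (mean squared negative part), the function names, the mixing matrix and the uniform test sources are all assumptions made for illustration. The sketch also assumes that whitening uses the covariance matrix but leaves the mean in the data, since centering would discard the non-negativity information.

```python
import numpy as np

def whiten_keep_mean(X):
    """Whiten the rows of X (signals x samples) using the covariance matrix,
    without subtracting the mean from the data itself (assumption)."""
    C = np.cov(X)                                  # 2x2 sample covariance
    eigval, eigvec = np.linalg.eigh(C)             # symmetric eigendecomposition
    V = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T  # whitening matrix C^(-1/2)
    return V @ X

def rotation(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def negativity_cost(Y):
    """Mean squared negative part; zero if Y lies in the non-negative orthant."""
    return np.mean(np.minimum(Y, 0.0) ** 2)

def nonneg_ica_2d(X, n_angles=3600):
    """Illustrative grid search for the rotation that makes the whitened
    data as non-negative as possible (hypothetical helper)."""
    Z = whiten_keep_mean(X)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    costs = [negativity_cost(rotation(t) @ Z) for t in angles]
    return rotation(angles[int(np.argmin(costs))]) @ Z

# Hypothetical test data: two independent non-negative sources, linearly mixed.
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=(2, 5000))
A = np.array([[1.0, 0.5], [0.3, 1.0]])   # assumed mixing matrix
Y = nonneg_ica_2d(A @ S)

# Cross-correlation between true sources (rows) and estimates (columns).
cross_corr = np.corrcoef(np.vstack([S, Y]))[:2, 2:]
```

Each true source should correlate strongly with exactly one estimated signal, up to permutation and scaling. In higher dimensions a grid search is no longer practical and a gradient-based search over orthogonal matrices would be needed.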

3.6 Conclusion

A determinant criterion was introduced to constrain the possible solutions of an exact NMF problem. Geometrically, this criterion means a minimum volume constraint on the subspace spanned by the basis vectors and emphasizes unique best solutions for a given problem. An easy-to-implement algorithm called detNMF, which directly incorporates the determinant criterion, was used in illustrative toy examples which represent two extreme data distributions. In these extremal settings, the detNMF algorithm was contrasted with a sparse NMF variant to demonstrate that sparseness constraints can be a misleading restriction, while the determinant criterion is a more general approach. Moreover, the determinant criterion provides a very concrete explanation of why a cascade of consecutive decompositions usually improves the performance of any NMF algorithm.

Chapter 4

NMF application to BIN data

4.1 Data aggregation

As explained in the introductory section about wafer fabrication (1.2.1), chips are usually tested in different BIN categories.

Figure 4.1: For the present investigation, each wafer is represented by (normalized) BIN counts. The example wafer (left) contains 24 chips, each labeled by one of the BINs 0, 1, 2, 3. Here, BIN 0 means "die works fine", while BIN l (l = 1, 2, 3) codes the event "chip failed in test category l". The diagram on the right displays the same wafer in terms of BIN counts, which can be further normalized by the total number of chips per wafer. If one ignores the pass BIN (BIN 0 here), the normalized BIN counts are an approximation of the fail probability in the respective BIN category.

We represent the i’th wafer by the M-dimensional row vector

$$X_{i*} = (X_{i1}, \ldots, X_{iM}) \geq 0 \qquad (4.1)$$

whose entry Xij contains the number of chips carrying BIN label j, divided by the total number of chips on the wafer. If the BIN number represents a test category in which the chip can fail (usually any BIN category except the pass BIN), the matrix entry Xij can be interpreted as an approximate probability that a chip on wafer i fails in category j.
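The construction of one such row vector can be sketched in a few lines of Python. The function name, the example wafer and the choice to drop the pass BIN from the vector are illustrative assumptions, not taken from the original data.

```python
import numpy as np

def normalized_bin_counts(chip_labels, n_bins, pass_bin=0):
    """Fraction of chips per fail BIN for a single wafer.

    chip_labels: one integer BIN label per chip.
    n_bins:      total number of BIN categories, including the pass BIN.
    pass_bin:    label of the pass BIN, which is dropped here (assumption).
    """
    labels = np.asarray(chip_labels)
    counts = np.bincount(labels, minlength=n_bins)   # chips per BIN label
    fractions = counts / labels.size                 # divide by chips per wafer
    return np.delete(fractions, pass_bin)            # keep fail BINs only

# Hypothetical wafer with 24 chips: 15 pass, 4 in BIN 1, 3 in BIN 2, 2 in BIN 3.
wafer = [0] * 15 + [1] * 4 + [2] * 3 + [3] * 2
x_row = normalized_bin_counts(wafer, n_bins=4)
```

Stacking such rows for all wafers yields the data matrix X, with one row per wafer and one column per fail BIN.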

We assume there is a set of K ∈ {2, 3, 4, ...} underlying sources which are responsible for the failed chips. Each source has an associated M-dimensional vector, its typical fingerprint

$$H_{k*} = (H_{k1}, \ldots, H_{kM}) \geq 0 \qquad (4.2)$$

whose entry Hkj ≥ 0 expresses the probability that a chip fails in category j due to source k.

We further assume that the fingerprint Hk∗ of each source remains the same, irrespective of the intensity of that source. The intensity of the k’th source on wafer i is assumed to be represented by the non-negative scalar Wik ≥ 0, where Wik = 0 means that source k does not contribute to observation i and a value Wik > 0 is a relative measure of the intensity.

In summary, we assume that an observation can be represented as a linear combination

$$X_{i*} \approx \sum_{k=1}^{K} W_{ik} H_{k*} \qquad (4.3)$$

of non-negative weights Wik and basis components Hk∗.
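As a small numerical illustration of this linear model, the following sketch builds a data matrix from hypothetical fingerprints and intensities. All numbers are invented for illustration, and the noise-free case is shown, so the approximation holds with equality here.

```python
import numpy as np

# Hypothetical fingerprints: K = 2 sources, M = 3 fail BIN categories.
# H[k, j] is the probability that a chip fails in category j due to source k.
H = np.array([[0.10, 0.02, 0.00],
              [0.00, 0.03, 0.08]])

# Hypothetical intensities for N = 3 wafers.
# W[i, k] = 0 means source k does not contribute to wafer i.
W = np.array([[1.0, 0.0],
              [0.0, 0.5],
              [0.5, 1.0]])

# All rows at once: row i of X is the weighted sum of the rows of H.
X = W @ H

# The same for a single wafer, written as the explicit sum over sources.
i = 2
row_i = sum(W[i, k] * H[k] for k in range(H.shape[0]))
```

Here the third wafer is affected by both sources, so its row is 0.5 times the first fingerprint plus 1.0 times the second, i.e. (0.05, 0.04, 0.08). NMF addresses the inverse problem: estimating non-negative W and H from X alone.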

Data matrix entry Xij ≥ 0 contains the number of chips carrying BIN label j on wafer i, divided by the total number of chips on the wafer.

As a first approximation, the linear model given by eq. (4.3) holds true.

However, there are some nonlinear effects in the wafer test data, which we neglect here. One of these nonlinearities is induced by the fact that each chip carries the label of only one BIN category, even though the chip could potentially fail in several different BIN categories. For example, if test A is performed before test B, the apparent probability that a chip on wafer i fails in test B decreases if many chips already fail in test A, since we associate the approximate fail probability with the BIN count divided by the total number of chips.

Here, we assume that such nonlinear effects are small and interpret them as noise which is absorbed in the approximation sign (≈) in eq. (4.3).

In general, the linear approximation is quite accurate if the assumed fail probabilities are small.
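The size of the test-ordering effect can be estimated with a short sketch. It assumes two tests A and B with independent failures, where a chip that fails A receives BIN A and is never counted in BIN B; the function names and probabilities are hypothetical.

```python
def observed_bin_fractions(p_a, p_b):
    """Expected BIN fractions when test A runs before test B and each chip
    keeps only the label of the first test it fails.
    Assumes failures in A and B are independent (illustrative assumption)."""
    frac_a = p_a                   # every chip failing A is labeled BIN A
    frac_b = (1.0 - p_a) * p_b     # only chips that passed A can get BIN B
    return frac_a, frac_b

def relative_underestimate(p_a, p_b):
    """Relative amount by which the BIN B fraction understates p_b."""
    _, frac_b = observed_bin_fractions(p_a, p_b)
    return (p_b - frac_b) / p_b

small = relative_underestimate(p_a=0.01, p_b=0.1)   # few chips fail test A
large = relative_underestimate(p_a=0.30, p_b=0.1)   # many chips fail test A
```

Under these assumptions the relative underestimate equals the fail probability of the earlier test: about 1 % when 1 % of the chips fail test A, but 30 % when 30 % fail, which is consistent with the linear approximation being accurate only for small fail probabilities.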

In document Extensions of non-negative matrix factorization and their application to the analysis of wafer test data (Page 47-50)