4.4 Searching Uncertain Data Using GMMs

4.4.3 Non-axis Parallel GMM Approximation

To accelerate the computation of our similarity measure when retrieving the objects most similar to a query object, we propose an approximation technique. It is needed because our similarity measure is very expensive: it uses the entire covariance matrix, including the correlations between different features. The technique approximates each weighted non-axis-parallel Gaussian component $G_i$ of the nGMM, which is given by the three parameters $w_i$, $\mu_i$, and $\Sigma_i$. The goal is to replace each non-axis-parallel Gaussian component with an axis-parallel Gaussian component with the parameters $\psi_i$, $\mu_i$, and $\chi_i = \phi_i D_i$. Here, $\psi_i$ and $\phi_i$ are scalar values and $D_i$ is a diagonal matrix, so $\chi_i$ is diagonal and the resulting Gaussian is axis-parallel. $\psi_i$ is the new weighting factor, and $\phi_i$ together with $D_i$ forms the new axis-parallel covariance matrix $\chi_i$. In other words, the conservative approximation of a non-axis-parallel Gaussian curve is an axis-parallel Gaussian curve.

The aim is to conservatively enclose the original non-axis-parallel Gaussian $G_i$ in the new axis-parallel Gaussian, which can be achieved by a specific […] to determine the axis-parallel representation of a non-axis-parallel Gaussian. We start with the non-axis-parallel Gaussian curve, which is described by the three original model parameters $w_i$, $\mu_i$, and $\Sigma_i$:

$$w_i \cdot \mathcal{N}(x;\mu_i,\Sigma_i) = \frac{w_i}{\sqrt{(2\pi)^d\,|\Sigma_i|}} \exp\left(-\frac{(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i)}{2}\right) \tag{4.6}$$

The term in the exponent, without the factor $-\tfrac{1}{2}$, is the squared Mahalanobis distance ($MD$), also called the generalized squared interpoint distance [Mah36, DMJRM00], between $x$ and $\mu_i$ of the same distribution with the covariance matrix $\Sigma_i$:

$$MD(x;\mu_i,\Sigma_i)^2 = (x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i). \tag{4.7}$$

The Mahalanobis distance accounts for correlations in the data when measuring the similarity between an unknown and a known variable, because it weights the difference vector by the inverse of the covariance matrix.
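As a minimal sketch of equations 4.6 and 4.7, assuming NumPy is available, the following computes the squared Mahalanobis distance and the weighted Gaussian density. The function names `mahalanobis_sq` and `weighted_gaussian` and the example covariance matrix are hypothetical illustrations, not part of the original method.

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu), eq. (4.7)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve cov * y = diff instead of forming the explicit inverse.
    return float(diff @ np.linalg.solve(cov, diff))

def weighted_gaussian(x, w, mu, cov):
    """Weighted Gaussian density w * N(x; mu, cov), eq. (4.6)."""
    d = len(mu)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return w / norm * np.exp(-0.5 * mahalanobis_sq(x, mu, cov))

# Hypothetical 2-D component with unit variances and correlation 0.9.
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
print(mahalanobis_sq([1.0, -1.0], mu, sigma))  # against the correlation
print(mahalanobis_sq([1.0, 1.0], mu, sigma))   # along the correlation
```

Both points have the same squared Euclidean distance 2 from $\mu_i$, yet the point lying against the correlation gets a squared Mahalanobis distance of about 20, while the point along the correlation gets about 1.05.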

Note that we intend to replace the covariance matrix $\Sigma_i$ by an axis-parallel matrix $\chi_i$. Hence, we have to find a distance function $MD(x;\mu_i,\chi_i)^2$ for the axis-parallel matrix $\chi_i$ that satisfies the following property:

$$MD(x;\mu_i,\chi_i)^2 \le MD(x;\mu_i,\Sigma_i)^2 \quad \forall\, x,\mu_i \in \mathbb{R}^d. \tag{4.8}$$

Since the exponential function is monotonic (because of the factor $-\tfrac{1}{2}$ in the exponent, the Gaussian curve decreases steeply and monotonically with growing distance), a smaller distance yields a larger density value. Hence, $MD(x;\mu_i,\chi_i)^2$ must be a true lower bound of the original squared Mahalanobis distance, i.e., it must be smaller than or equal to $MD(x;\mu_i,\Sigma_i)^2$ for every $x$.

The covariance matrix $\Sigma_i$ contains the covariances on the off-diagonal and the variances $\sigma_{i1}^2,\dots,\sigma_{id}^2$ on the diagonal. To obtain an axis-parallel matrix representation $\chi_i$, we need a diagonal matrix. Therefore, we fill $D_i$, which is part of the axis-parallel matrix $\chi_i$, with the variances of the original covariance matrix $\Sigma_i$:

$$D_i = \operatorname{diag}(\sigma_{i1}^2,\dots,\sigma_{id}^2). \tag{4.9}$$

[Figure 4.3. Panels a) and b), data space: the original Gaussian ellipsoid $(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i) = 1$, the axis-parallel Gaussian ellipsoid with $D_i^{-1}$, $(x-\mu_i)^T D_i^{-1}(x-\mu_i) = 1$, and the axis-parallel Gaussian ellipsoid with $\chi_i = \phi_i D_i$, $(x-\mu_i)^T(\phi_i D_i)^{-1}(x-\mu_i) = 1$. Panel c), transformed space after substituting $z = D_i^{-1/2}(x-\mu_i)$: the sphere $z^T\phi_i^{-1}z = 1$ with radius $\phi_i^{1/2}$ and the ellipsoid $z^T D_i^{1/2}\Sigma_i^{-1}D_i^{1/2}z = 1$.]

Figure 4.3: Illustration of the approximation using $\phi_i$ and $D_i^{-1}$. a) Without the scaling factor $\phi_i$, the resulting axis-parallel ellipsoid (red ellipsoid) is not a conservative approximation of the original Gaussian ellipsoid (black ellipsoid). b) Including the scaling factor in the Mahalanobis distance leads to the desired conservative approximation (blue ellipsoid). c) To obtain the scalar value $\phi_i$, we transform both ellipsoids by substituting $z = D_i^{-1/2}(x-\mu_i)$.

If we simply inserted the diagonal matrix $D_i$ into the Mahalanobis distance $MD(x;\mu_i,D_i)^2$, we would not obtain a lower bound of $MD(x;\mu_i,\Sigma_i)^2$, as shown in Figure 4.3 a. Hence, we need to multiply the matrix $D_i$ by a scalar value $\phi_i$, also called the scaling factor, to obtain the final axis-parallel matrix $\chi_i$ illustrated in Figure 4.3 b.
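The failure of $D_i$ alone, and its repair by the scaling factor, can be checked numerically. The sketch below assumes NumPy, a hypothetical 2-D covariance matrix with unit variances and correlation 0.9, and the value $\phi_i = 1.9$, which is the largest eigenvalue of this particular matrix's correlation matrix (the quantity the section derives for $\phi_i$ further below). It counts random points at which inequality 4.8 is violated.

```python
import numpy as np

def md_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    diff = np.asarray(x, dtype=float) - mu
    return float(diff @ np.linalg.solve(cov, diff))

mu = np.zeros(2)
sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])       # unit variances, correlation 0.9
d_mat = np.diag(np.diag(sigma))      # D_i: keeps only the variances
phi = 1.9                            # assumed scaling factor for this example
chi = phi * d_mat                    # chi_i = phi_i * D_i

rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 2))
# Points where D_i alone gives a LARGER distance than Sigma_i (bound violated).
violations_d = sum(md_sq(x, mu, d_mat) > md_sq(x, mu, sigma) for x in xs)
# Points where chi_i gives a larger distance (expected: none).
violations_chi = sum(md_sq(x, mu, chi) > md_sq(x, mu, sigma) + 1e-9 for x in xs)
print(violations_d, violations_chi)
```

Along the direction of the positive correlation, $D_i$ overestimates the distance, which corresponds to the red ellipsoid of Figure 4.3 a cutting into the original one; with $\chi_i$ no violations remain.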

To determine this scaling factor, we use a transformation that leads to a spherical representation of the axis-parallel ellipsoid. An ellipse centered at the origin can be described by the equation $x^2/a^2 + y^2/b^2 = 1$, with $a$ and $b$ being positive real numbers. In the case of a sphere, $a = b$; since the denominator of this equation is the squared radius of the sphere, the radius is obtained by taking the square root of the denominator.

Now, to obtain a spherical representation of the ellipsoid, we transform the data space. In other words, we multiply the centered data $(x-\mu_i)$ by the inverse square root of the diagonal matrix $D_i$, which yields the spherical view of the axis-parallel ellipsoid depicted in Figure 4.3 c. To transform the ellipsoid, we first have to compute $\sqrt{D_i^{-1}}$. Since $D_i$ is a diagonal matrix, its inverse and square root can be calculated separately for each element $\sigma_{i1}^2,\dots,\sigma_{id}^2$:

$$\sqrt{D_i^{-1}} = \operatorname{diag}\left(\sqrt{\frac{1}{\sigma_{i1}^2}},\dots,\sqrt{\frac{1}{\sigma_{id}^2}}\right). \tag{4.10}$$

Now, to perform the transformation, we substitute $z = \sqrt{D_i^{-1}}\,(x-\mu_i)$, which can be rewritten as $(x-\mu_i) = \sqrt{D_i}\,z$. Hence, with $\chi_i = \phi_i D_i$, the axis-parallel ellipsoid

$$(x-\mu_i)^T\chi_i^{-1}(x-\mu_i) = 1 \tag{4.11}$$

becomes

$$(\sqrt{D_i}\,z)^T(\phi_i D_i)^{-1}(\sqrt{D_i}\,z) = \frac{z^T z}{\phi_i} = 1, \tag{4.12}$$

while the non-axis-parallel ellipsoid becomes

$$(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i) = z^T\sqrt{D_i}\,\Sigma_i^{-1}\sqrt{D_i}\,z = 1. \tag{4.13}$$
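A quick numerical check of the substitution, assuming NumPy, a randomly generated symmetric positive definite $\Sigma_i$, and an arbitrary test value $\phi_i = 2.5$ (none of these values come from the original): it builds $\sqrt{D_i^{-1}}$ element-wise as in equation 4.10, transforms one point, and confirms that both sides of equations 4.12 and 4.13 agree.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 3-D covariance matrix (symmetric positive definite).
a = rng.normal(size=(3, 3))
sigma = a @ a.T + 3.0 * np.eye(3)
mu = rng.normal(size=3)
phi = 2.5  # arbitrary positive scaling factor, only for this check

d_diag = np.diag(sigma)                       # sigma_{i1}^2, ..., sigma_{id}^2
sqrt_d = np.diag(np.sqrt(d_diag))             # sqrt(D_i)
sqrt_d_inv = np.diag(1.0 / np.sqrt(d_diag))   # sqrt(D_i^{-1}), eq. (4.10)
chi = phi * np.diag(d_diag)                   # chi_i = phi_i * D_i

x = rng.normal(size=3)
z = sqrt_d_inv @ (x - mu)                     # z = sqrt(D_i^{-1}) (x - mu_i)

# Eq. (4.12): (x - mu)^T chi^{-1} (x - mu) == z^T z / phi
lhs_412 = (x - mu) @ np.linalg.solve(chi, x - mu)
rhs_412 = z @ z / phi
# Eq. (4.13): (x - mu)^T Sigma^{-1} (x - mu) == z^T sqrt(D) Sigma^{-1} sqrt(D) z
lhs_413 = (x - mu) @ np.linalg.solve(sigma, x - mu)
rhs_413 = z @ sqrt_d @ np.linalg.inv(sigma) @ sqrt_d @ z
print(np.isclose(lhs_412, rhs_412), np.isclose(lhs_413, rhs_413))
```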

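Since the radius of the sphere is the square root of the denominator, it is $\sqrt{\phi_i}$. For the sphere to enclose the inner ellipsoid of equation 4.13 tightly, this radius must equal the largest semi-axis of the inner ellipsoid. The semi-axes of the inner ellipsoid are the inverse square roots of the eigenvalues of the inverse correlation matrix $C_i^{-1}$, so the largest semi-axis belongs to the smallest eigenvalue, and the squared radius $\phi_i$ is the inverse of this smallest eigenvalue.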

The correlation matrix $C_i$ of the Gaussian $G_i$, with the elements $corr_{r,s}$ ($1 \le r \le d$, $1 \le s \le d$), is obtained by dividing each element $cov_{r,s}$ of the covariance matrix by the square root of the product of the corresponding variances $cov_{r,r} = \sigma_r^2$ and $cov_{s,s} = \sigma_s^2$:

$$corr_{r,s} = \frac{cov_{r,s}}{\sqrt{cov_{r,r}\,cov_{s,s}}} = \frac{cov_{r,s}}{\sqrt{\sigma_r^2\,\sigma_s^2}}.$$

Since the matrix $D_i$ contains the diagonal elements of the covariance matrix, $D_i = \operatorname{diag}(\sigma_{i1}^2,\dots,\sigma_{id}^2)$, we can also obtain the correlation matrix by

$$C_i = \sqrt{D_i^{-1}}\,\Sigma_i\,\sqrt{D_i^{-1}}. \tag{4.14}$$
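The two ways of computing the correlation matrix can be compared directly. This sketch assumes NumPy and a hypothetical $3 \times 3$ covariance matrix; `correlation_elementwise` follows the element formula above and `correlation_matrix_form` follows equation 4.14 (both names are illustrative).

```python
import numpy as np

def correlation_elementwise(sigma):
    """corr_{r,s} = cov_{r,s} / sqrt(cov_{r,r} * cov_{s,s})."""
    d = sigma.shape[0]
    corr = np.empty_like(sigma, dtype=float)
    for r in range(d):
        for s in range(d):
            corr[r, s] = sigma[r, s] / np.sqrt(sigma[r, r] * sigma[s, s])
    return corr

def correlation_matrix_form(sigma):
    """C_i = sqrt(D_i^{-1}) Sigma_i sqrt(D_i^{-1}), eq. (4.14)."""
    sqrt_d_inv = np.diag(1.0 / np.sqrt(np.diag(sigma)))
    return sqrt_d_inv @ sigma @ sqrt_d_inv

# Hypothetical covariance matrix.
sigma = np.array([[4.0, 1.2, 0.0],
                  [1.2, 1.0, -0.3],
                  [0.0, -0.3, 9.0]])
print(correlation_matrix_form(sigma))
```

The matrix form avoids the explicit double loop and makes the link to the transformed ellipsoid of equation 4.13 visible.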

Looking at equation 4.13, we recognize the inverse of the correlation matrix:

$$C_i^{-1} = \sqrt{D_i}\,\Sigma_i^{-1}\sqrt{D_i}. \tag{4.15}$$

The aim is to determine the largest semi-axis of the inner ellipsoid, i.e., of the transformed ellipsoid of the original non-axis-parallel Gaussian distribution. To obtain its square, we calculate the inverse of the smallest eigenvalue of the inverse correlation matrix $C_i^{-1}$. The eigenvalues $\Lambda'$ and the eigenvectors $V'$ are obtained by the eigenvalue decomposition of $C_i^{-1}$:

$$C_i^{-1} = \sqrt{D_i}\,\Sigma_i^{-1}\sqrt{D_i} = V'\Lambda'V'^T. \tag{4.16}$$

Note that the eigenvalue matrix is a diagonal matrix, $\Lambda' = \operatorname{diag}(\lambda'_1,\dots,\lambda'_d)$; therefore, the inverse of the smallest eigenvalue is simply obtained by

$$\phi_i = \left(\min(\lambda'_1,\dots,\lambda'_d)\right)^{-1}. \tag{4.17}$$
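Equations 4.15 to 4.17 translate into a few lines. This is a sketch assuming NumPy; `scaling_factor_from_inverse` is a hypothetical helper, and `np.linalg.eigvalsh` is used because $C_i^{-1}$ is symmetric (it returns the eigenvalues in ascending order). The snippet also confirms the geometric argument: the largest semi-axis $1/\sqrt{\lambda'_{\min}}$ equals $\sqrt{\phi_i}$.

```python
import numpy as np

def scaling_factor_from_inverse(sigma):
    """phi_i = 1 / smallest eigenvalue of C_i^{-1}, eqs. (4.15)-(4.17)."""
    sqrt_d = np.diag(np.sqrt(np.diag(sigma)))         # sqrt(D_i)
    c_inv = sqrt_d @ np.linalg.inv(sigma) @ sqrt_d    # C_i^{-1}, eq. (4.15)
    eigvals = np.linalg.eigvalsh(c_inv)               # ascending order
    return 1.0 / eigvals[0], eigvals

# Hypothetical covariance matrix: unit variances, correlation 0.9.
sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
phi, eigvals = scaling_factor_from_inverse(sigma)
semi_axes = 1.0 / np.sqrt(eigvals)   # semi-axes of z^T C_i^{-1} z = 1
print(phi, semi_axes.max(), np.sqrt(phi))
```

For this matrix, $C_i^{-1}$ has the eigenvalues $1/1.9$ and $10$, so $\phi_i = 1.9$.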

Instead of the inverse correlation matrix $C_i^{-1}$, we can equivalently determine $\phi_i$ by the eigenvalue decomposition of the correlation matrix $C_i$ itself,

$$C_i = \sqrt{D_i^{-1}}\,\Sigma_i\,\sqrt{D_i^{-1}} = V\Lambda V^T, \tag{4.18}$$

but instead of taking the inverse of the smallest eigenvalue, we now take the largest eigenvalue of the set of eigenvalues $\Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_d)$. This works because the eigenvalues of $C_i^{-1}$ are the reciprocals of the eigenvalues of $C_i$, leading to

$$\phi_i = \max(\lambda_1,\dots,\lambda_d) \tag{4.19}$$

and subsequently

$$\chi_i = \phi_i D_i = \max(\lambda_1,\dots,\lambda_d)\cdot D_i. \tag{4.20}$$
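The equivalence of the two routes to $\phi_i$ can be illustrated with a sketch, assuming NumPy and a randomly generated $4 \times 4$ covariance matrix. The helper `approx_sigma` is a hypothetical Python rendering of the approxSigma idea via equations 4.19 and 4.20; the second computation repeats equation 4.17 for comparison.

```python
import numpy as np

def approx_sigma(sigma):
    """chi_i = phi_i * D_i with phi_i = largest eigenvalue of C_i, eqs. (4.19)-(4.20)."""
    var = np.diag(sigma)                        # variances (diagonal of D_i)
    sqrt_d_inv = np.diag(1.0 / np.sqrt(var))    # sqrt(D_i^{-1})
    corr = sqrt_d_inv @ sigma @ sqrt_d_inv      # correlation matrix C_i
    phi = np.linalg.eigvalsh(corr)[-1]          # largest eigenvalue (ascending order)
    return phi * np.diag(var), phi

rng = np.random.default_rng(2)
a = rng.normal(size=(4, 4))
sigma = a @ a.T + 0.5 * np.eye(4)               # hypothetical SPD covariance
chi, phi = approx_sigma(sigma)

# Same phi via the inverse correlation matrix, eq. (4.17).
sqrt_d = np.diag(np.sqrt(np.diag(sigma)))
phi_inv = 1.0 / np.linalg.eigvalsh(sqrt_d @ np.linalg.inv(sigma) @ sqrt_d)[0]
print(phi, phi_inv)
```

Working with $C_i$ avoids inverting $\Sigma_i$. Because the diagonal of $C_i$ consists of ones, its eigenvalues sum to $d$, so $\phi_i \ge 1$ always holds.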

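Now that we have obtained the new axis-parallel matrix $\chi_i$, we only have to determine the new weighting factor to obtain a complete Gaussian component with the three parameters $\psi_i$, $\mu_i$, and $\chi_i$. To identify the new scalar value $\psi_i$, we restructure the original Gaussian component as follows, where the inequality follows from equation 4.8:

$$\begin{aligned}
w_i\cdot\mathcal{N}(x;\mu_i,\Sigma_i) &= \frac{w_i}{\sqrt{(2\pi)^d\,|\Sigma_i|}}\exp\left(-\frac{(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i)}{2}\right)\\
&\le \frac{w_i}{\sqrt{(2\pi)^d\,|\Sigma_i|}}\exp\left(-\frac{(x-\mu_i)^T\chi_i^{-1}(x-\mu_i)}{2}\right)\\
&= w_i\,\frac{\sqrt{|\chi_i|}}{\sqrt{|\Sigma_i|}}\,\frac{1}{\sqrt{(2\pi)^d\,|\chi_i|}}\exp\left(-\frac{(x-\mu_i)^T\chi_i^{-1}(x-\mu_i)}{2}\right)\\
&= w_i\sqrt{\frac{|\chi_i|}{|\Sigma_i|}}\;\mathcal{N}(x;\mu_i,\chi_i)
\end{aligned} \tag{4.21}$$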

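Taken together, the new model component consists of the weighting factor $\psi_i$, which can be written as

$$\psi_i = \mathrm{approxW}(w_i,\chi_i,\Sigma_i) = w_i\sqrt{\frac{|\chi_i|}{|\Sigma_i|}}, \tag{4.22}$$

and the new matrix

$$\chi_i = \phi_i D_i = \mathrm{approxSigma}(\Sigma_i) = \phi_i\cdot\operatorname{diag}(\sigma_{i1}^2,\dots,\sigma_{id}^2). \tag{4.23}$$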
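Putting the pieces together, the following sketch (assuming NumPy; `gaussian` and `approx_component` are hypothetical helpers, and the component parameters are randomly generated) computes $\psi_i$ and $\chi_i$ and verifies the inequality of equation 4.21 at random points, i.e., that the axis-parallel component is a conservative upper bound of the original weighted component.

```python
import numpy as np

def gaussian(x, mu, cov):
    """Multivariate normal density N(x; mu, cov)."""
    d = len(mu)
    diff = x - mu
    md2 = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * md2) / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))

def approx_component(w, mu, sigma):
    """Return (psi_i, mu_i, chi_i): chi_i per eq. (4.23), psi_i = w_i sqrt(|chi_i| / |Sigma_i|)."""
    var = np.diag(sigma)
    sqrt_d_inv = np.diag(1.0 / np.sqrt(var))
    phi = np.linalg.eigvalsh(sqrt_d_inv @ sigma @ sqrt_d_inv)[-1]
    chi = phi * np.diag(var)
    psi = w * np.sqrt(np.linalg.det(chi) / np.linalg.det(sigma))
    return psi, mu, chi

rng = np.random.default_rng(3)
a = rng.normal(size=(3, 3))
sigma = a @ a.T + 0.2 * np.eye(3)   # hypothetical correlated covariance
mu = rng.normal(size=3)
w = 0.4
psi, mu, chi = approx_component(w, mu, sigma)

# Check w * N(x; mu, Sigma) <= psi * N(x; mu, chi) at random points
# (small relative tolerance for floating-point rounding).
xs = mu + rng.normal(scale=3.0, size=(2000, 3))
ok = all(w * gaussian(x, mu, sigma) <= psi * gaussian(x, mu, chi) * (1 + 1e-9)
         for x in xs)
print(psi, ok)
```

Note that $\psi_i \ge w_i$: since $|\Sigma_i| = |D_i|\,|C_i|$ and $|C_i|$ is the product of eigenvalues that are all at most $\phi_i$, the determinant ratio is at least one.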

Note that the new weighting factors of the resulting axis-parallel nGMM approximation no longer sum to one. However, the axis-parallel conservative approximation of the original PDF does not need to be a PDF itself: it is only used as an upper bound to avoid computing exact probabilities.
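The growth of the weights can be seen on a small example. This sketch assumes NumPy and a hypothetical two-component nGMM in which one component is strongly correlated and the other is already axis-parallel; `approx_weight_and_cov` is an illustrative helper, not the author's code.

```python
import numpy as np

def approx_weight_and_cov(w, sigma):
    """Hypothetical helper: (psi_i, chi_i) for one component, eqs. (4.19) and (4.23)."""
    var = np.diag(sigma)
    sqrt_d_inv = np.diag(1.0 / np.sqrt(var))
    phi = np.linalg.eigvalsh(sqrt_d_inv @ sigma @ sqrt_d_inv)[-1]
    chi = phi * np.diag(var)
    psi = w * np.sqrt(np.linalg.det(chi) / np.linalg.det(sigma))
    return psi, chi

# Hypothetical nGMM: weights sum to one, first component correlated.
weights = [0.6, 0.4]
covs = [np.array([[1.0, 0.9], [0.9, 1.0]]),
        np.array([[2.0, 0.0], [0.0, 0.5]])]

approx = [approx_weight_and_cov(w, s) for w, s in zip(weights, covs)]
psis = [psi for psi, _ in approx]
print(psis, sum(psis))
```

The already axis-parallel component is left unchanged ($C_i$ is the identity, so $\phi_i = 1$ and $\psi_i = w_i$), whereas the correlated one is inflated by the factor $\sqrt{19} \approx 4.36$, so the approximated weights sum to roughly 3.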

In document Haegler, Katrin (2011): Similarity Search in Medical Data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 112-118)