Mathematical description of the EKLT

A Morphoacoustic Database

4.2 Shape Parameterisation

4.2.1 Theory and Techniques

4.2.1.2 Mathematical description of the EKLT

The application of the KLT to exploratory data analysis is commonly referred to as principal component analysis (PCA). PCA is an orthogonal linear transformation which describes a dataset with a new set of optimised bases. Each basis for the dataset is obtained in turn to account for the maximum possible residual variance present after having computed and subtracted the previous bases, with the constraint that it is orthogonal to each of them. The theoretical and practical aspects of PCA are discussed at length in the multivariate statistics literature (for example, Jolliffe, 2002; Johnson and Wichern, 2002; Everitt and Dunn, 2001). The principal components of a set of n observations of m variables, all within $\mathbb{R}$, described by a matrix X (of dimensions $m \times n$), can be obtained by a singular value decomposition (SVD), expressing X as

$$X = W \Sigma V^{T} \tag{4.1}$$

where $V^{T}$ is the transpose of V, a unitary matrix over $\mathbb{R}$ of dimensions $n \times n$. V contains a set of orthonormal “input” basis vectors for X. $\Sigma$ contains the singular values which scale the normalised input vectors contained in V. W is a unitary matrix over $\mathbb{R}$ of dimensions $m \times m$ and contains a set of “output” basis vectors for X, the principal components. The original data matrix X is then rotated (with the matrix operator $W^{T}$), giving the matrix Y, as follows:

$$Y = W^{T} X \tag{4.2}$$

Y is a matrix of column vectors, where each vector is the projection of the corresponding original data vector from X onto the basis vectors (principal components) contained in the columns of W. The elements of the column vectors contained in Y are referred to as weights for the principal components contained in W. The number of principal components required to reconstruct the original observations depends on the redundancy in the original data and the target accuracy of the reconstruction.
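As an illustration of Equations 4.1 and 4.2, the following is a minimal NumPy sketch, not the implementation used in this work. It follows the equations literally (no mean subtraction is applied, since none is stated above). The helper names `pca_svd` and `reconstruct` and the synthetic rank-2 data are assumptions made purely for the example.

```python
import numpy as np

def pca_svd(X):
    """Return (W, s, Y) for a data matrix X of shape (m variables, n observations)."""
    # Thin SVD: X = W @ diag(s) @ Vt, with singular values in descending order.
    W, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Weights: projection of each observation (column of X) onto the columns of W.
    Y = W.T @ X
    return W, s, Y

def reconstruct(W, Y, n_components):
    """Approximate X using only the first n_components principal components."""
    return W[:, :n_components] @ Y[:n_components, :]

rng = np.random.default_rng(0)
# Hypothetical, highly redundant data: 6 variables, 40 observations, rank 2.
X = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 40))
W, s, Y = pca_svd(X)
X_hat = reconstruct(W, Y, 2)

# Relative reconstruction error when keeping 2 of the 6 components.
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Because the synthetic data has only two underlying degrees of freedom, two components are enough to reconstruct it to within floating-point error; real data with less redundancy would need more components for the same accuracy.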

Prior to applying the EKLT, the original data, consisting of S slices each described by a sequence of T xy-coordinates, is conditioned to improve PCA performance. The highly (though clearly not completely) symmetrical nature of the human head results in a high degree of correlation between the left and right halves of each slice. The original slices are therefore split into left and right half-slice observations, and the right half-slices are flipped to be aligned with the left half-slice observations (see step 2 in Figure 4.5). The point order of the right half-slices has to be reversed for the similarity between the right and left half-slices to be exploited (the start points are indicated by black squares on this and subsequent diagrams). This process halves the number of variables for each observation and doubles the number of observations. Another performance improvement can be achieved by exploiting the similarity between the top and bottom quarter slices, treating each as a separate observation (step 3 in Figure 4.5). The bottom quarter slices are flipped to be aligned with the top ones and, again, the point order is reversed so that the starting point for all slices is the interaural axis (see Figure 4.7).

Figure 4.5: Slice conditioning for first EKLT stage

As all the observations must be described by the same number of variables for PCA to be applied, and noting that the lengths of contour portions OA and OB (see Figure 4.7) will generally be slightly different, the sampling interval between O and A is adjusted so that T/4 samples lie between them (as a full slice contains T samples). The same process is applied to obtain T/4 equally spaced samples along the contour portions OB, PA and PB. This step presents the significant advantage of improving the alignment of pinna features from one slice to the next and across subjects. The re-sampling ensures that the interaural axis occurs at the same sample number on each slice.
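The flipping and re-sampling steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used to build the database: contour portions are taken to be polylines of xy-coordinates, the mirror axis is assumed to pass through the coordinate origin, and the T/4 samples are assumed to include both end points of each portion. The function names `mirror_and_reverse` and `resample_contour` are hypothetical.

```python
import numpy as np

def mirror_and_reverse(points, axis=0):
    """Reflect a polyline about the line where coordinate `axis` is zero
    (assumed mirror axis) and reverse its point order."""
    mirrored = np.array(points, dtype=float)
    mirrored[:, axis] *= -1.0
    return mirrored[::-1]

def resample_contour(points, n_samples):
    """Resample a polyline (k x 2 array) to n_samples points equally spaced
    in arc length, keeping the first and last points (an assumption)."""
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)  # segment lengths
    arc = np.concatenate(([0.0], np.cumsum(seg)))          # cumulative arc length
    target = np.linspace(0.0, arc[-1], n_samples)
    x = np.interp(target, arc, points[:, 0])
    y = np.interp(target, arc, points[:, 1])
    return np.column_stack((x, y))

T = 64  # hypothetical number of samples in a full slice
# Two dummy contour portions of different lengths, both starting at O = (0, 0).
OA = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])
OB = np.array([[0.0, 0.0], [1.2, 0.0], [1.2, -2.5]])
qa = resample_contour(OA, T // 4)
# Flip the bottom portion about the horizontal axis, then restore O as the start point.
qb = resample_contour(mirror_and_reverse(OB, axis=1)[::-1], T // 4)
```

After this step both portions contain the same number of samples and start at the same point, so they can be stacked as observations with an identical number of variables.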

Figure 4.6: Slices at 0 and $\pi$ rad rotation are different though they represent the same data (panels: 0 rad intersection plane rotation; $\pi$ rad intersection plane rotation)

To illustrate another benefit of the data conditioning process, Figure 4.6 shows (dummy) slices for an intersection plane at 0 rad and $\pi$ rad rotation. Although they describe exactly the same data, these are very different as full slice PCA observations. However, once slices are split into quarter slice observations, which are then aligned, the similarity between slices around 0 rad and $\pi$ rad is exploited and further energy is concentrated into lower order PCA components.

Figure 4.7: The four quarter slices resulting from each original slice are aligned so as to exploit their similarity during PCA (the black square shows the starting point in each case). For each quarter slice comprised of T/4 points, the x and y coordinates are concatenated to form a first stage EKLT observation containing T/2 variables.

For the first stage of the EKLT, each observation in a data matrix $X_1$ is assembled by concatenating the x and y coordinates for a given quarter slice (see Figure 4.7). Each quarter slice contains T/4 points, each with x and y coordinates, giving a total of 2(T/4) = T/2 variables per quarter slice observation. If there are N subjects and S full slices per subject, there are 4SN quarter slice observations. $X_1$ therefore has the dimensions $T/2 \times 4SN$.
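The assembly of the first stage data matrix can be sketched as below. The array layout `quarters[n, s, q]`, the ordering of observations by (subject, slice, quarter), and the choice to place all x coordinates before all y coordinates are assumptions made for the example; random numbers stand in for real conditioned slices.

```python
import numpy as np

N, S, T = 3, 8, 64  # hypothetical: subjects, full slices per subject, points per slice
rng = np.random.default_rng(1)
# Placeholder for conditioned data: quarters[n, s, q] is a (T/4, 2) array of xy-coordinates.
quarters = rng.normal(size=(N, S, 4, T // 4, 2))

def assemble_first_stage(quarters):
    """Build the first stage data matrix with one column per quarter-slice observation."""
    n_subj, n_slices, n_quarters, n_points, _ = quarters.shape
    # For each quarter slice, concatenate its x coordinates followed by its y coordinates.
    obs = np.concatenate((quarters[..., 0], quarters[..., 1]), axis=-1)
    # Flatten (subject, slice, quarter) into a single observation index, then
    # transpose so that variables run down rows and observations along columns.
    return obs.reshape(n_subj * n_slices * n_quarters, 2 * n_points).T

X1 = assemble_first_stage(quarters)  # shape (T/2, 4SN)
```

With these example values, T/2 = 32 and 4SN = 96, so the matrix has 32 rows and 96 columns.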

The PCA of $X_1$ gives the weights matrix $Y_1$ according to

$$Y_1 = W_1^{T} X_1 \tag{4.3}$$

where the columns of $W_1$ contain the principal components of $X_1$. $Y_1$ contains the results from the first EKLT stage. As for the EFT, the second stage of the EKLT performs a transform on first stage parameter (in this case, the PCA component weight) variations across slices. The objective of the process is to express these weight variations as concisely as possible.

Each second stage EKLT observation should describe the variation of a given first stage PCA component weight, across slices, for the four quarter slices, in order to allow a full-head reconstruction. Alternatively, a separate second stage EKLT observation can be created for the left and right side of the head. Each observation would contain the cross-slice variations of a given first stage PCA component weight for the corresponding (left or right) top and bottom quarter slices. This doubles the number of second stage EKLT observations and halves the number of variables for each of them, improving PCA performance.

The parameterisation of half-heads presents another, significant advantage looking ahead to the extraction of morphoacoustic mappings. It seems reasonable to suggest that the HRTF for a given ear can be calculated accurately by approximating the opposite side of the head to a perfect symmetrical reflection of the side on which the ear lies about the median plane. If this is the case, information describing the shape of the opposite side of the head is irrelevant to the estimation of a monaural HRTF for the ear in question. Discarding this information would not, however, be an option should whole heads be parameterised using the EKLT. This could hinder the extraction of mappings between morphology and monaural HRTFs and, as a consequence, half-head parameterisation is preferred.

In order to achieve half-head parameterisation, $Y_1$ is re-arranged to form the second stage observations matrix $X_2$. Since there are 2N half-head observations per component, if the $C_1$ most significant first stage EKLT principal components are sufficient to reconstruct $X_1$ with satisfactory accuracy, there will be $2NC_1$ second stage EKLT observations in all. Each observation describes the variation, across slices, of the weight for a given first stage EKLT principal component for the top and bottom quarter slices which comprise a given half-head. This equates to a total of 2S variables per observation. $X_2$ will therefore have dimensions $(2S) \times (2NC_1)$. Figure 4.8 gives a visual representation of the rearrangement. Performing a PCA on $X_2$ gives the weights matrix $Y_2$ according to

$$Y_2 = W_2^{T} X_2 \tag{4.4}$$

where the columns of $W_2$ contain the principal components of $X_2$. Assuming the first $C_2$ components contained in $W_2$ are sufficient to reconstruct $X_2$ with satisfactory accuracy, then the entire shape data for a given half-head can be reconstructed with $C_1 C_2$ parameters.
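The re-arrangement into second stage observations and the resulting parameter count can be sketched as follows. This is an illustrative sketch only: the first stage observations are assumed to be ordered by (subject, slice, quarter), the quarter index is assumed to encode (side, top/bottom) as `q = 2 * side + tb`, and the function name `rearrange_second_stage`, the values of N, S, T, C1 and C2, and the random data are all hypothetical.

```python
import numpy as np

def rearrange_second_stage(Y1, N, S, C1):
    """Re-arrange first stage weights into a (2S) x (2*N*C1) second stage data matrix."""
    # Keep the C1 most significant component weights and expose the assumed
    # column ordering: (component, subject, slice, side, top/bottom).
    Yk = Y1[:C1].reshape(C1, N, S, 2, 2)
    # Variables: (top/bottom, slice). Observations: (subject, side, component).
    Yk = Yk.transpose(4, 2, 1, 3, 0)
    return Yk.reshape(2 * S, 2 * N * C1)

N, S, T, C1, C2 = 3, 8, 64, 5, 4
rng = np.random.default_rng(2)
X1 = rng.normal(size=(T // 2, 4 * S * N))      # stand-in first stage data matrix

# First stage: weights of the first stage principal components.
W1, _, _ = np.linalg.svd(X1, full_matrices=False)
Y1 = W1.T @ X1

# Second stage: PCA of the cross-slice weight variations for each half-head.
X2 = rearrange_second_stage(Y1, N, S, C1)      # shape (2S, 2*N*C1)
W2, _, _ = np.linalg.svd(X2, full_matrices=False)
Y2 = W2.T @ X2

# Parameters for one half-head (subject 0, side 0): C2 weights for each of
# the C1 first stage components, i.e. C1 * C2 values in total.
half_head_params = Y2[:C2, :C1]
```

In this layout the C1 columns belonging to a given half-head are contiguous, so selecting its parameters is a simple slice; a different observation ordering would work equally well provided it is applied consistently at reconstruction.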

In document Human sound localisation cues and their relation to morphology (Page 161-168)