3.3 Characteristic features and Classification
3.3.2 More on Principal Component Analysis
The basic idea of principal component analysis (PCA) is to express a set of multi-dimensional data in terms of a few uncorrelated variables which are linear combinations of the original variables. The two main objectives are: firstly, data reduction, in which the original variables are replaced by a smaller number of variables, called principal components, that represent most of the total variance in the data; and secondly, interpretation, since the principal components often reveal relationships that could not be observed in the original data (Johnson & Wichern 1982).
The first principal component is a linear combination of the original variables that defines an axis with the maximum possible variance. Each successive principal component is uncorrelated with the preceding principal components and represents the largest remaining variance. The effect is that most of the variance in the system can be represented in just a few principal components. The principal components depend entirely on the covariance matrix of the system and do not require the system to follow a multivariate normal distribution. To find the principal components of a given multivariate system with n variables denoted
CHAPTER 3. THEORETICAL BACKGROUND OF THE ANALYSIS 60
as $x = [x_1, x_2, \ldots, x_n]$, the first step is to find a linear function $\alpha_1' x$, in terms of the elements of $x$, that has maximum variance and takes the form
$$y_1 = \alpha_{11} x_1 + \alpha_{12} x_2 + \alpha_{13} x_3 + \cdots + \alpha_{1n} x_n. \tag{3.20}$$
The next step is to find a second linear function $\alpha_2' x$ with maximum variance that is uncorrelated with the first linear function, and so on (Jolliffe 1986).
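The maximisation described above can be written out explicitly. The following is only a sketch: it writes $\Sigma$ for the covariance matrix of $x$ and assumes the usual normalisation constraint $\alpha_1' \alpha_1 = 1$, which is not stated in the text but is standard, since without it the variance could be made arbitrarily large by rescaling $\alpha_1$. The multiplier $\lambda$ is introduced here purely for the derivation.

```latex
\begin{aligned}
\operatorname{Var}(\alpha_1' x) &= \alpha_1' \Sigma \alpha_1, \\
\frac{\partial}{\partial \alpha_1}
  \left[ \alpha_1' \Sigma \alpha_1 - \lambda \left( \alpha_1' \alpha_1 - 1 \right) \right] = 0
  \quad &\Longrightarrow \quad \Sigma \alpha_1 = \lambda \alpha_1, \\
\operatorname{Var}(\alpha_1' x) &= \alpha_1' \Sigma \alpha_1 = \lambda \, \alpha_1' \alpha_1 = \lambda .
\end{aligned}
```

Under this assumption the maximising $\alpha_1$ is an eigenvector of $\Sigma$, and the variance attained equals the corresponding eigenvalue, so the largest eigenvalue gives the first linear function.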
These linear functions can be obtained by using the covariance matrix $\Sigma$ to obtain the eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_n, e_n)$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. The set of n principal components is denoted by $y = [y_1, y_2, \ldots, y_n]$, where $y_i = e_i' x$. Note that the variance $\operatorname{Var}(y_i) = \lambda_i$, where the $\lambda_i$ are the eigenvalues of the covariance matrix $\Sigma$ (Johnson & Wichern 1982). In practice the origin is translated so that the mean along each principal component axis is zero. The direction (+ or -) of each principal component is arbitrary and depends on the algorithm used in the software implementing the PCA. Useful information about relationships between the original variables or the structure of the data is often revealed by studying the eigenvectors (loadings) of the principal components.
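The procedure above can be illustrated with a short numerical sketch. It is not the implementation used in this study: the data are synthetic, the mixing matrix and all variable names are hypothetical, and NumPy is assumed as the numerical library. It computes the eigenvalue-eigenvector pairs of the sample covariance matrix, projects the mean-centred data onto the eigenvectors, and exposes the quantities needed to check that the component variances equal the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 200 observations of n = 3 correlated variables.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.1]])
X = rng.normal(size=(200, 3)) @ mixing.T

# Sample covariance matrix, with variables in columns.
sigma = np.cov(X, rowvar=False)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending
# order, so reorder to put the largest-variance component first.
eigvals, eigvecs = np.linalg.eigh(sigma)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Translate the origin to the mean, then project: y_i = e_i' x.
Y = (X - X.mean(axis=0)) @ eigvecs

# Variance of each principal component and the covariance between components.
score_var = Y.var(axis=0, ddof=1)
score_cov = np.cov(Y, rowvar=False)
```

Here `score_var` should agree with `eigvals`, and the off-diagonal entries of `score_cov` should be zero to within rounding error, reflecting the fact that the components are uncorrelated. The signs of the columns of `eigvecs` are arbitrary, as noted above.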
Since most of the variance for a given system is generally contained in the first few principal components, the data points in the system can be well described with a reduced number of variables. The number of principal components needed to capture a given proportion of the variance can be determined from the eigenvalues, which represent the variance of each principal component. It is important to note, however, that the higher-variance PCs do not necessarily give the best separation of classes. By omitting lower-variance PCs we may be discarding information that is potentially valuable in the classification process. To reduce this risk we should retain enough PCs that a high percentage of the variance is preserved.
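As an illustration of how the eigenvalues determine the number of components to retain, the sketch below finds the smallest number of components whose cumulative variance reaches a chosen proportion. The function name `components_for_variance`, the example eigenvalues and the 90% threshold are all hypothetical and are not taken from this study.

```python
import numpy as np

def components_for_variance(eigenvalues, target):
    """Return the smallest k whose k largest eigenvalues hold at least
    `target` (a fraction between 0 and 1) of the total variance,
    together with the cumulative proportions."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    # First index at which the cumulative proportion reaches the target.
    k = int(np.searchsorted(cumulative, target)) + 1
    # Guard against rounding when the target is (almost) exactly 1.
    return min(k, lam.size), cumulative

# Hypothetical eigenvalues, for illustration only.
example_eigenvalues = [6.0, 2.5, 1.0, 0.3, 0.2]
k, cumulative = components_for_variance(example_eigenvalues, 0.90)
```

With these made-up eigenvalues the first two components hold 85% of the variance and the first three hold 95%, so three components are needed to reach the 90% threshold. A rule of this kind says nothing about class separation, which is the caveat raised in the paragraph above.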
Geometrically, the principal components represent a rotation of axes. The technique enables the data to be represented with a reduced number of variables but preserves the spatial location of the data. In the context of our experiment, the MDS trajectory paths representing the timbre of each sound remain unaltered in n-dimensional space, but the axes are rotated around the points in order to assign most of the variance between data points to the first few axes (principal components).
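The geometric claim can be checked numerically. The sketch below uses synthetic four-dimensional data (an assumption made purely for illustration) and keeps all four components, verifying that the eigenvector matrix is orthogonal and that distances between points are unchanged by the change of axes. Strictly speaking, an orthogonal matrix may include a reflection as well as a rotation, and distances are preserved exactly only when every component is kept; discarding components gives an approximation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (illustrative only): 50 points in 4 dimensions with unequal spread.
X = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix, reordered so the largest variance comes first.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvecs = eigvecs[:, ::-1]

# Coordinates of the same points expressed on the principal component axes.
Y = Xc @ eigvecs

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

is_orthogonal = np.allclose(eigvecs.T @ eigvecs, np.eye(4))
distances_preserved = np.allclose(pairwise_distances(Xc), pairwise_distances(Y))
```

Both `is_orthogonal` and `distances_preserved` should evaluate to true, which is the sense in which the points stay fixed while the axes turn around them.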
Since principal components depend upon the scaling of the original variables, it is not uncommon to use the correlation matrix in preference to the covariance matrix. This
effectively re-scales all variables to have a variance of unity, thereby giving equal weighting to all variables. However, this is not appropriate in our study, because the variables represent the set of all possible frequencies that may be present in a musical tone. The mix of frequencies present in a tone and the relative power of each frequency are the key factors which determine the timbre of a musical tone. Using the correlation matrix instead of the covariance matrix would effectively equalize the power of all harmonics present and, in so doing, seriously impair the ability of the trajectory paths to represent differences in timbre. An unwanted side effect would be that low-level noise contained in some variables would be amplified to the same power level as the harmonics.
It should be noted that principal components spread the data along each PC on the basis of the whole data set and do not necessarily separate the data on a between-class basis. Since this thesis is concerned with discriminating between classes, it will be prudent to consider the techniques of linear discriminant analysis, in which the data are projected onto one or more linear discriminant axes for the purpose of dividing the whole data set on the basis of class.
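The effect of using the correlation matrix in place of the covariance matrix can be seen in a small numerical sketch. The data here are hypothetical and are not drawn from the recordings analysed in this thesis: three variables stand in for harmonics of different power and a fourth for a low-level noise component. The sketch compares the share of the total variance that each variable contributes under the two matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
n_frames = 500

# Hypothetical spectral data: three harmonics with different power
# (standard deviations 10, 4 and 1) plus one low-level noise variable (0.01).
scales = np.array([10.0, 4.0, 1.0, 0.01])
X = rng.normal(size=(n_frames, 4)) * scales

cov = np.cov(X, rowvar=False)
corr = np.corrcoef(X, rowvar=False)

# Share of the total variance contributed by each original variable.
# The trace of either matrix equals the sum of its eigenvalues, i.e. the
# total variance that the principal components distribute between them.
share_cov = np.diag(cov) / np.trace(cov)
share_corr = np.diag(corr) / np.trace(corr)
```

Under the covariance matrix the noise variable contributes a negligible share of the total variance, whereas under the correlation matrix every variable, including the noise, contributes exactly one quarter. This is the equalization of power described above.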