2.3 Research on timbre: 1960 onwards

2.3.5 Multi-dimensional Scaling using Physical Features

In 1966, Plomp and his colleagues investigated the relationship between the timbre of Dutch vowel sounds and the frequency spectrum. They studied 15 vowel sounds spoken by 10 speakers, giving 150 observations. The average spectrum for each observation was measured in n successive frequency passbands, so that each instance of a vowel sound could be represented as a point in an n-dimensional spectral space. Using a technique that was essentially principal component analysis, they were able to represent 84% of the variance in four dimensions and at the same time achieve good separation of the vowel sounds. They

CHAPTER 2. FEATURES DEFINING TIMBRE 33

found that the first two dimensions corresponded to the formant frequencies of the vowels. In follow-up work, Pols et al. (1969) carried out experiments to investigate the correlation between the perceptual and physical (spectral) spaces for 11 Dutch vowel sounds. An n-dimensional spectral space was obtained for the vowel sounds by the method used in their earlier work and then reduced to three dimensions by means of principal component analysis. A three-dimensional perceptual space was obtained using a triadic comparison procedure and the multi-dimensional scaling techniques described previously. When the perceptual and physical representations were compared, a very strong correlation was found between each pair of corresponding axes. The close correspondence between the perceptual space and the physical space was a finding that had important implications for future work in the study of timbre.
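The data reduction step used in these vowel studies can be sketched in a few lines. The example below is only an illustration under stated assumptions: the 150 observations in 8 bands are synthetic (generated from two hidden factors plus noise), not the Dutch vowel data, and the helper names (`centre`, `covariance`, `leading_eigenpair`, `pca`) are hypothetical. It uses power iteration with deflation as one simple way to obtain the leading principal components and reports the fraction of variance they explain.

```python
import math
import random

def centre(rows):
    """Subtract the column means so each band level has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix (d x d) of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(mat, iters=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    d = len(mat)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v' M v gives the eigenvalue estimate.
    eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v

def pca(rows, k):
    """Return the top-k eigenvalues, their components, and the total variance."""
    cov = covariance(centre(rows))
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    eigvals, comps = [], []
    for _ in range(k):
        lam, v = leading_eigenpair(cov)
        eigvals.append(lam)
        comps.append(v)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return eigvals, comps, total

# Synthetic stand-in data: 150 "spectra" in 8 bands driven by two factors.
rng = random.Random(0)
low_bands = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
high_bands = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
spectra = []
for _ in range(150):
    a, b = rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.5)
    spectra.append([a * lo + b * hi + rng.gauss(0.0, 0.2)
                    for lo, hi in zip(low_bands, high_bands)])

eigvals, comps, total = pca(spectra, 2)
explained = sum(eigvals) / total
print(f"variance explained by 2 components: {explained:.1%}")
```

Because the synthetic data has only two underlying factors, two components capture almost all of the variance; with real band spectra the number of components needed would have to be read off the cumulative explained variance, as in the 84% figure quoted above.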

In work similar to the above, Plomp (1970, 1976) investigated the relationship between timbre and average spectrum for musical tones. He applied the triadic comparison procedure together with multi-dimensional scaling techniques to create a timbre (perceptual) space and used filtering techniques to create a spectral (physical) space for a set of tones. In both cases each tone was represented by a single point in space. To compare the timbre space with the spectral space he used a canonical matching process developed by Cliff (1966) in which the first three factors (dimensions) were compared. A very strong correlation was found between the timbral and spectral spaces along factors I, II and III. He concluded that there was excellent agreement between the timbre space and the spectral space, indicating that differences in timbre can be predicted from differences in frequency spectrum. This investigation was a precursor to work in the 1990s on multi-dimensional scaling analysis of musical instrument spectra, for example the work by Hourdin & Charbonneau (1997).

In another similar investigation, Poli & Prandoni (1997) borrowed techniques from the speech processing community in order to analyse timbre and algorithmically define a timbre space. Their goal was to produce a ‘sonological’ model that included the analysis methods most appropriate to the sound characteristics to be studied and an effective data reduction process. Experiments were conducted with 21 instrument sounds generated with a synthesiser. Time-varying aspects were highlighted by using short overlapping data windows. The analysis involved the use of mel-frequency cepstral coefficients, a widely used technique in speech analysis. Two approaches were used for data reduction and production of a timbre space. The first was a non-linear projection of the n-dimensional space onto a timbre space of reduced dimensions using Kohonen neural networks. The second was a linear projection onto a space of lower dimensions using principal component analysis. (Note that their use of PCA differed from Plomp (1966) in that spectral data was collected from multiple windows whereas Plomp used the average spectrum.) In both transformations Euclidean distance was retained. The authors found that the mel-frequency cepstral coefficients were well suited to representing musical timbre, and that principal component analysis was the most effective technique for creating a timbre space. The first axis was related to the spectral energy distribution (brightness), the second was correlated with spectral energy across the whole frequency band for musical sounds, and the third was correlated with energy in a narrow region of the spectrum around 700 Hz. They concluded that the third principal component was a differentiating factor in the quality of musical timbre. Borrowing from the terminology of audio amplifiers, they referred to this characteristic as presence. They assert that these qualities of brightness and presence can easily be modified by instrument makers, whereas temporal qualities found in the attack stage are fundamentally tied to the instrument structure. These temporal qualities are relatively constant within instrument classes and therefore offer key clues in instrument recognition. The authors conclude that the steady state portion of a musical tone is the key determinant of timbral quality whereas the attack phase is important in instrument recognition.
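The front end described for the Poli & Prandoni study (short overlapping windows, then mel-frequency cepstral coefficients, then Euclidean distances between the resulting vectors) can be sketched as below. This is a minimal illustration, not the authors' implementation: the sample rate, window length, hop size, number of filters and number of coefficients are all assumed values, the two test tones are synthetic, and the mel formula used is one common convention. A naive DFT keeps the example free of external libraries and is only practical for short frames.

```python
import math

def hz_to_mel(f):
    """One common mel-scale convention (assumed here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frames(signal, size, hop):
    """Split a signal into overlapping Hann-windowed frames."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]
    return [[signal[s + n] * win[n] for n in range(size)]
            for s in range(0, len(signal) - size + 1, hop)]

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        out.append(re * re + im * im)
    return out

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centres equally spaced on the mel scale."""
    top = hz_to_mel(sr / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = sr / n_fft
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, bank, n_coeffs):
    """Log mel-band energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    m = len(log_e)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / m) for j, e in enumerate(log_e))
            for c in range(n_coeffs)]

def tone(f0, amps, sr=8000, n=1024):
    """Synthetic harmonic tone; amps[h] is the amplitude of harmonic h + 1."""
    return [sum(a * math.sin(2 * math.pi * f0 * (h + 1) * t / sr)
                for h, a in enumerate(amps)) for t in range(n)]

def distance(a, b):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

sr, size, hop = 8000, 256, 128          # assumed analysis settings
bank = mel_filterbank(12, size, sr)
dull = [mfcc(f, bank, 6) for f in frames(tone(220.0, [1.0, 0.3, 0.1]), size, hop)]
bright = [mfcc(f, bank, 6)
          for f in frames(tone(220.0, [1.0, 0.8, 0.6, 0.5, 0.4]), size, hop)]
print(len(dull), "frames per tone")
print("distance between middle frames:", round(distance(dull[3], bright[3]), 3))
```

Each tone becomes a sequence of coefficient vectors, one per window, rather than a single averaged spectrum. A data reduction step such as principal component analysis would then be applied to the pooled vectors from all tones.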

In an investigation focusing on both spectral and temporal aspects of timbre, Hourdin & Charbonneau (1997) take the general principles of multi-dimensional scaling (MDS) and apply them to a physical description of musical tones. The starting point is a series of short data windows analysed with a heterodyne filter to yield spectral-temporal data over the duration of each tone. This physical data is in contrast to the more subjective and perceptually based dissimilarity ratings used by Grey (1977). A further point of difference is that the standard MDS studies do not directly take account of temporal features. To represent the variance of the data in just a few variables the authors used factorial analysis of correspondences (FAC) in a similar way to principal component analysis. Each data point was then plotted in the geometric space created with the new variables. This enabled a dynamic representation of each tone via a closed discrete curve which started and finished with silence. To show that the shape of the curve was related to timbre they examined the effect of changes in duration and pitch on the shape of the curve by synthesising modified tones. It was not possible to examine the effect of changes in intensity since the filtering process normalised intensity. Results showed that changes in duration did not affect the trajectory for that instrument. However, by comparing tones at pitch C4 with tones at C3 artificially raised to C4, they showed that pitch did have a significant effect on the trajectory. They concluded that the trajectory path for each tone incorporated both spectral and temporal features of a tone and gave a good representation of the timbre of that instrument: it represented a tone signature. The trajectory paths for a pair of tones therefore relate to the task of distinguishing between the timbre of those two tones.

The authors attempted to interpret each of the first three principal components in terms of physical features in order to compare with the investigations by Grey (1977). The first principal component was thought to correspond to the energy level of the signal, a quantity not relevant to Grey’s study. The second principal component was thought to correspond to bandwidth, similar to the first axis in Grey’s study. The third principal component was thought to correspond to the balance of energy between the lower harmonics and the higher harmonics, an interpretation that did not accord with Grey’s second axis. They concluded that spectral width was the most important feature of timbre. The interpretations of the first two principal components accord reasonably well with those of Plomp, Pols & van de Geer (1966).
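The idea of a tone as a closed curve in a low-dimensional space can be illustrated with a toy model. Everything in the sketch below is an assumption made for illustration: `frame_spectrum` is a hypothetical rule for how harmonic amplitudes evolve within a tone, and `AXES` are two hand-picked projection axes, not axes obtained by factorial analysis of correspondences. The sketch shows only the duration case: if a longer tone follows the same spectral evolution more slowly, its frames land on the same curve. It does not model the pitch effect reported above.

```python
import math

def frame_spectrum(phase, n_harmonics=6):
    """Hypothetical harmonic amplitudes at a position 0..1 within the tone.

    The level rises from silence and returns to it; upper harmonics grow
    faster than the fundamental, so louder frames are also brighter.
    """
    level = math.sin(math.pi * phase)
    return [level ** (h + 1) / (h + 1) for h in range(n_harmonics)]

def project(spectrum, axes):
    """Coordinates of one frame on a set of fixed axes (dot products)."""
    return tuple(sum(a * s for a, s in zip(axis, spectrum)) for axis in axes)

def trajectory(n_frames, axes):
    """One point per analysis frame, in time order."""
    return [project(frame_spectrum(i / (n_frames - 1)), axes)
            for i in range(n_frames)]

# Two illustrative axes: overall level, and upper-versus-lower harmonic balance.
AXES = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0],
]

short_tone = trajectory(9, AXES)    # 9 analysis frames
long_tone = trajectory(17, AXES)    # same tone, roughly twice the duration

# Every second point of the longer tone coincides with a point of the shorter one.
print(long_tone[::2] == short_tone)
# The curve is closed: it starts and ends (numerically) at silence.
print(short_tone[0], short_tone[-1])
```

In this model the longer tone simply places more points on the same path, which is one way to picture why a change in duration need not alter the trajectory.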

In an extension of the work of Hourdin & Charbonneau (1997), Kaminskyj (1999) set out to use physical data together with data reduction methods to produce a geometric space that would adequately represent the timbre of musical tones in just a few dimensions. As in the Hourdin study, he took a series of short data windows but used the constant Q transform (Brown 1990) to extract the spectral-temporal data. To achieve an initial reduction in data Kaminskyj chose to use only the frequency bins corresponding to, or adjacent to, the first 20 harmonics. A further reduction in dimensionality was then achieved by applying principal component analysis. The first PC was interpreted as corresponding to energy level and bandwidth; the second as the sum of the energy in the odd harmonics; and the third seemed to correspond to energy in the fundamental and odd harmonics. His intention was to use the three-dimensional trajectories from MDS as a feature in a system for instrument identification.
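The initial data reduction described here, keeping only the constant Q bins at or next to the first 20 harmonics, might look like the following sketch. The analysis settings are assumptions chosen for illustration (lowest bin at C2, 24 bins per octave, 168 bins, a C4 tone) and are not taken from Kaminskyj's study; the function names are hypothetical. The only property relied on is that constant Q bin centres are geometrically spaced, so the nearest bin to a frequency can be found from a base-2 logarithm.

```python
import math

def q_factor(bins_per_octave):
    """Ratio of centre frequency to bandwidth for geometrically spaced bins."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

def cq_centre_frequencies(f_min, bins_per_octave, n_bins):
    """Centre frequencies f_k = f_min * 2^(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def nearest_bin(freq, f_min, bins_per_octave):
    """Index of the bin whose centre is closest to freq on a log scale."""
    return round(bins_per_octave * math.log2(freq / f_min))

def harmonic_bins(f0, f_min, bins_per_octave, n_bins, n_harmonics=20, neighbours=1):
    """Bins at, or adjacent to, the first n_harmonics harmonics of f0."""
    keep = set()
    for h in range(1, n_harmonics + 1):
        centre = nearest_bin(h * f0, f_min, bins_per_octave)
        for k in range(centre - neighbours, centre + neighbours + 1):
            if 0 <= k < n_bins:          # ignore bins outside the analysis range
                keep.add(k)
    return sorted(keep)

# Assumed settings: lowest bin at C2, quarter-tone spacing, seven octaves.
F_MIN, BINS_PER_OCTAVE, N_BINS = 65.406, 24, 168
selected = harmonic_bins(261.626, F_MIN, BINS_PER_OCTAVE, N_BINS)  # a C4 tone

centres = cq_centre_frequencies(F_MIN, BINS_PER_OCTAVE, N_BINS)
print(f"Q = {q_factor(BINS_PER_OCTAVE):.1f}")
print(f"kept {len(selected)} of {N_BINS} bins")
print(f"bin nearest the fundamental: {centres[selected[1]]:.1f} Hz")
```

Because the upper harmonics fall less than two bins apart at this resolution, their neighbourhoods overlap and fewer than 60 distinct bins are kept. Principal component analysis would then be applied to the selected bins only.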

In document COMPUTER RECOGNITION OF MUSICAL INSTRUMENTS: AN EXAMINATION OF WITHIN CLASS CLASSIFICATION (Page 44-48)