Soundscape classification and analysis

The individual sounds that make up a soundscape are generally grouped into categories according to the type of source that produces them [23]. The most common terms used in this reductionist approach are “natural”, “human” and “artificial” sounds. Listeners have used these terms spontaneously in previous studies [38, 39], and researchers have also identified them [35]; examples are shown in Figure 2.2. For these terms to represent the source accurately, contextual and visual information may be necessary, because some sounds have similar acoustic properties yet are perceived differently (e.g. a motorway and a waterfall).

Figure 2.2: Sound source categorisation examples (natural: e.g. birdsong, water; human: e.g. voices, footsteps; artificial: e.g. traffic, air-con)

Another approach to sound source classification used discourse analysis of participants’ responses to perceived low- and high-frequency sounds [40]. This identified two types of category: recognisable sound sources were categorised as source events, while indistinguishable sources were categorised as background sound.

One approach to automatic pattern recognition of audio signals represents a signal as the long-term statistical distribution of its feature vectors [41]. This method attempts to uncover the perceptual salience of sound events from their statistical typicality within a soundscape recording. The output of the technique is a set of “classes”, each containing soundscapes of a similar (human-defined) type, e.g. avenue, park or urban street. It has been named the “bag-of-frames” (BOF) approach, by analogy with the “bag-of-words” (BOW) representation traditionally used in textual data mining, which treats text as a distribution of word occurrences without retaining their organisation or context within phrases [42]. The signal to be analysed is cut into short overlapping frames (typically 50 ms long with 50% overlap), and a feature vector is computed for each frame. The features generally consist of a generic, all-purpose spectral representation such as Mel-Frequency Cepstral Coefficients (MFCCs) [43]. These feature vectors are fed to a classifier based on Gaussian Mixture Models [44], which model the distribution of the features belonging to each class. The distributions for each class can then be used to compute decision boundaries between classes. Once a number of classes have been established by the model, a new soundscape recording is classified by computing its feature vectors, finding the most probable class for each of them, and taking the most represented class over the whole recording. This approach has proved effective for soundscape classification: a classification precision of 91% has been reported on a dataset of 80 three-second sound extracts from 10 everyday soundscape classes (street, factory, football game, etc.) [45]. The technique is intended only to simulate, very crudely, the outcome of human cognitive modelling of a sound environment.
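The classification pipeline described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation used in [41] or [45]. Feature extraction (e.g. MFCCs) is assumed to have been done elsewhere. Each class is modelled by a single diagonal-covariance Gaussian, i.e. a one-component GMM, in place of a full multi-component GMM trained with expectation-maximisation. All function and class names (`frame_bounds`, `DiagonalGaussian`, `train_models`, `classify_recording`) and the toy feature values are hypothetical.

```python
import math
from collections import Counter


def frame_bounds(n_samples, sample_rate, frame_ms=50.0, overlap=0.5):
    """Return (start, end) sample indices of overlapping analysis frames."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [(start, start + frame_len)
            for start in range(0, n_samples - frame_len + 1, hop)]


class DiagonalGaussian:
    """Single-component, diagonal-covariance stand-in for a per-class GMM."""

    def __init__(self, vectors, var_floor=1e-6):
        n, dim = len(vectors), len(vectors[0])
        self.mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
        # Floor the variances so a constant feature cannot cause division by zero.
        self.var = [max(var_floor, sum((v[d] - self.mean[d]) ** 2 for v in vectors) / n)
                    for d in range(dim)]

    def log_likelihood(self, x):
        """Log density of feature vector x under this class model."""
        return sum(-0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
                   for xi, mu, var in zip(x, self.mean, self.var))


def train_models(features_by_class):
    """features_by_class: {class label: [frame feature vector, ...]}."""
    return {label: DiagonalGaussian(vecs) for label, vecs in features_by_class.items()}


def classify_recording(models, frame_features):
    """Give each frame its most probable class, then return the majority class."""
    votes = Counter(max(models, key=lambda label: models[label].log_likelihood(f))
                    for f in frame_features)
    return votes.most_common(1)[0][0]


# Usage with made-up 2-D "features" (a real BOF system would use MFCC vectors).
models = train_models({
    "park":   [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1]],
    "street": [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]],
})
print(len(frame_bounds(16000, 16000)))  # 1 s at 16 kHz: 800-sample frames, hop 400 -> 39
print(classify_recording(models, [[5.0, 5.1], [4.9, 5.0], [0.0, 0.0]]))  # street
```

The frame-level vote mirrors the "most represented class" rule in the text. Replacing `DiagonalGaussian` with a multi-component mixture would change only the per-frame likelihood, not the voting step.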

A study of eight different streets in a Japanese city used 11 semantic differential scales to rate the soundscape of each location [46], and the ratings were analysed using cluster analysis. This revealed three types of soundscape, comprising 1) large amounts of vehicle and human activity, 2) mixtures of human, transport and natural elements, and 3) mostly natural elements with little vehicle or human activity. The study showed that the relationships between source types can give rise to different soundscape categories when these are derived from responses to quantitative, semantic-scale-based questionnaires.
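The clustering algorithm used in [46] is not specified here, so the following is only an illustration of the general idea. It uses average-linkage agglomerative clustering with Euclidean distance as an assumed choice. Each street is represented by a hypothetical vector of mean semantic differential ratings: three invented scales rather than the study's eleven, with values made up for the example. Clusters are merged until a chosen number of soundscape types remains. The name `average_linkage` is illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two rating profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Euclidean distance until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_dists = [euclidean(profiles[i], profiles[j])
                              for i in clusters[a] for j in clusters[b]]
                d = sum(pair_dists) / len(pair_dists)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return [sorted(c) for c in clusters]


# Hypothetical mean ratings of six streets on three 7-point scales
# (invented for illustration; not data from the cited study).
profiles = [
    [6.1, 5.8, 2.0], [5.9, 6.0, 2.2],   # streets 0-1
    [4.0, 4.1, 4.0], [4.2, 3.9, 4.1],   # streets 2-3
    [1.9, 2.1, 6.0], [2.0, 2.2, 5.8],   # streets 4-5
]
print(average_linkage(profiles, 3))  # [[0, 1], [2, 3], [4, 5]]
```

Other linkage rules (e.g. Ward's method) or k-means could equally be used. The number of clusters, here fixed at three, is normally chosen by inspecting the merge distances.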

A number of researchers have used factor analysis to characterise soundscapes based on semantic differential scales. In one particular study [6], eighteen seven-point bipolar rating scales were used to evaluate how people perceived the soundscapes of a selection of urban open spaces. Some of these scales were based on previous research on urban soundscapes and product sound quality [47, 48, 49], while others were devised specifically for the soundscapes under investigation. Passing members of the public (N = 491) were interviewed at two urban locations across all seasons and at different times of day. Varimax-rotated principal component analysis (PCA) was used to extract the orthogonal factors underlying the eighteen adjective scales, and four factors were extracted:

- The first factor accounted for 26% of the total variance and was mainly associated with relaxation. It included the scales comfort-discomfort, quiet-noisy, pleasant-unpleasant, natural-artificial, like-dislike and gentle-harsh.
- The second factor accounted for 12% and was associated with communication. It included social-unsocial, meaningful-meaningless, calming-agitating and smooth-rough.
- The third factor accounted for 8% and was mostly associated with spatiality. It included varied-simple, echoed-deadly and far-close.
- The fourth factor accounted for 7% and was principally related to soundscape dynamics. It included hard-soft and fast-slow.
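The rotation step of a Varimax-rotated PCA can be sketched as follows. This is a hypothetical, minimal implementation of raw Varimax. It maximises Kaiser's criterion by successive pairwise planar rotations of an already-extracted loading matrix. It omits the Kaiser row normalisation that many statistical packages apply by default. The names `varimax` and `varimax_criterion` and the example loadings are illustrative assumptions, not details of study [6].

```python
import math


def varimax_criterion(loadings):
    """Raw Varimax criterion: sum over factors of the variance of squared loadings."""
    p, k = len(loadings), len(loadings[0])
    total = 0.0
    for j in range(k):
        sq = [row[j] ** 2 for row in loadings]
        mean_sq = sum(sq) / p
        total += sum(s * s for s in sq) / p - mean_sq ** 2
    return total


def varimax(loadings, max_sweeps=100, tol=1e-10):
    """Raw Varimax (no Kaiser normalisation) via pairwise planar rotations."""
    L = [list(row) for row in loadings]
    p, k = len(L), len(L[0])
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [r[a] ** 2 - r[b] ** 2 for r in L]
                v = [2 * r[a] * r[b] for r in L]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximises the criterion for this pair of columns.
                phi = math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
                largest_angle = max(largest_angle, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for r in L:
                    r[a], r[b] = r[a] * c + r[b] * s, -r[a] * s + r[b] * c
        if largest_angle < tol:  # no pair needs a noticeable rotation any more
            break
    return L


# Hypothetical unrotated loadings: two clean factors hidden by a 30-degree rotation.
cos30, sin30 = math.cos(math.pi / 6), math.sin(math.pi / 6)
unrotated = [[cos30, sin30], [cos30, sin30], [-sin30, cos30], [-sin30, cos30]]
rotated = varimax(unrotated)
print([[round(x, 3) for x in row] for row in rotated])
```

In this example each row ends up with one loading of magnitude 1 and the other approximately 0, which is the "simple structure" Varimax seeks. Because every pairwise rotation is orthogonal, each scale's communality (its row sum of squared loadings) is unchanged. Only the distribution of variance across the factors changes.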

Kang states that these four factors cover the main considerations in the acoustic design of urban public spaces: function (relaxation and communication), space and time. However, together they account for only 53% of the total variance. This is notably lower than in sound quality and environmental noise evaluations [49, 50, 51]. Kang suggests that the low value may be due to the complexity and diversity of soundscapes, and to listeners not fully understanding the terminology presented to describe them.
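As a worked check, the four reported percentages sum as shown below. The second line assumes that the PCA was run on the correlation matrix of the p = 18 standardised scales. This assumption is usual for rating-scale data but is not stated here. Under it, each component's share of variance is the sum of its squared loadings a_ij divided by 18. Because the percentages are rounded, the total is approximate.

```latex
\begin{aligned}
26\% + 12\% + 8\% + 7\% &= 53\%, \qquad 100\% - 53\% = 47\% \text{ unexplained},\\
\text{share}_j = \frac{1}{p}\sum_{i=1}^{p} a_{ij}^{2}
  \;\Rightarrow\; \sum_{j=1}^{4}\sum_{i=1}^{18} a_{ij}^{2} &\approx 0.53 \times 18 = 9.54 .
\end{aligned}
```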

It is apparent from the existing literature that soundscapes are considered to be complex entities made up of a large number of interrelated variables. Principal Component Analysis performs a linear transformation that maps data from a high-dimensional space to a lower-dimensional one. The main drawback of PCA stems from its core assumption that the relationships among variables are linear. The complex interactions between the numerous variables associated with soundscapes may require something more advanced than a linear model to explain. Nonlinear PCA (NPCA) is a viable alternative to the traditional approach because it is better suited to data measured on ordinal scales, which may exhibit nonlinear relationships; classical PCA is more suited to continuous data [52, 53]. This improved suitability for ordinal data was investigated in [54], which showed a slight improvement over PCA when a larger sample size was used.
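The linearity assumption can be illustrated with a small pure-Python PCA sketch. It finds the first principal component by power iteration on the covariance matrix. The function name and the toy data sets are hypothetical. The sketch does not implement NPCA, whose optimal-scaling procedure for ordinal variables is considerably more involved. It contrasts points on a straight line with points on a parabola. Both lie on a one-dimensional curve, but a single linear component describes only the first completely.

```python
import math


def first_principal_component(data, iters=500):
    """Power iteration on the sample covariance matrix of row-vector data.

    Returns the unit direction w, the fraction of total variance it explains,
    and each observation's score (its projection onto w).
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    w = [1.0] * d  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    eigenvalue = sum(w[a] * cov[a][b] * w[b] for a in range(d) for b in range(d))
    explained = eigenvalue / sum(cov[i][i] for i in range(d))
    scores = [sum(r[j] * w[j] for j in range(d)) for r in centred]
    return w, explained, scores


# Points on a straight line: one linear component captures all the variance.
_, line_share, _ = first_principal_component([[0, 0], [1, 2], [2, 4], [3, 6]])
# Points on a parabola: the data are one-dimensional, but a single linear
# component captures only part of the variance.
_, curve_share, _ = first_principal_component([[x, x * x] for x in (-2, -1, 0, 1, 2)])
print(round(line_share, 3), round(curve_share, 3))  # 1.0 0.583
```

The parabola case shows why a purely linear decomposition can understate how compactly related variables are structured when their relationship is nonlinear.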

In document Application of mobile and internet technologies for the investigation of human relationships with soundscapes (Page 39-43)