
3.9 Subjective response analysis

3.9.4 Textual responses

A potentially troublesome set of variables to categorise and quantify is the open-response questions asking the participant to identify a positive and a negative sound source within the environment. Synonymy, where different words describe the same thing [139], is a major hurdle in this analysis. Another issue is polysemy, where a single word may have multiple but related meanings [140].

... responses from 77 participants [38], whereby psycholinguistic analysis of spontaneous verbal descriptions was conducted to identify semantic categories of environmental sounds and relevant sound quality criteria for urban soundscapes. This technique, however, is not viable for this project’s projected large-scale data set, as it is performed manually and would be too large a job for a single person.

An alternative method of analysis for these responses is to employ Latent Semantic Analysis (LSA) [141]. This technique is an established method for automatically inferring the contextual similarity of words from a large collection of text descriptors. It is primarily utilised when dealing with large bodies of text to extract their semantic structure. The technique can be used alongside dynamic clustering to group descriptors based on their conceptual similarity using a Singular Value Decomposition model [142].

Before this method can be implemented, similar terms for source descriptors must be identified. This process spots multiple uses of source identifiers, taking into account pluralisation and the harder-to-identify misspellings of these words. Pluralisations can be compensated for computationally using simple string comparisons. The detection of valid but misspelt sources requires a more in-depth process of extraction. Due to the method of entry on mobile devices, there are two main types of erroneous entry that need to be compensated for: predictive text faults, where the wrong word is entered because the user allows the phone’s dictionary to wrongly assume the desired word, and straightforward spelling mistakes, where the user enters the desired word but spells it incorrectly. Misspellings will be harder to catch and will require comparing them with a dictionary database. Each of these techniques will also require a list of potential source choices to compare against, made up of the existing set of non-erroneous entries gathered from the project.
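The plural and misspelling handling described above could be sketched as follows. This is a minimal illustration in Python using only the standard library, not the implementation used in the project. The list `KNOWN_SOURCES`, the function names and the similarity cutoff of 0.8 are all assumptions introduced here. String similarity of this kind addresses plurals and simple misspellings only; predictive text faults, which produce valid but unintended words, would not be caught by it.

```python
import difflib

# Hypothetical list of known sources (assumption); in practice this would be
# built from the non-erroneous entries already gathered.
KNOWN_SOURCES = ["bird", "traffic", "car", "voice", "wind", "water", "music"]


def singularise(word):
    """Strip common English plural endings using simple string rules."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalise_source(entry, known=KNOWN_SOURCES, cutoff=0.8):
    """Map an entry to a known source, or return None if nothing is close."""
    word = entry.strip().lower()
    if word in known:
        return word
    singular = singularise(word)
    if singular in known:
        return singular
    # Fall back to fuzzy matching for misspellings (cutoff is an assumption).
    matches = difflib.get_close_matches(singular, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Entries that return `None` here would still need manual review, which is consistent with the small set of unclassifiable terms expected from any automated pass.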

... around single words and removing any special characters. A dictionary is then built up from the words, and entries such as “nothing”, “nowt” and blank entries, which signify that the participant has not identified a positive/negative source, are replaced with “none” for consistency. The responses are then converted to a term matrix using the Text to Matrix Generator [143] in MATLAB. This generates a matrix of weightings for each of the dictionary entries, based on their prevalence within the response set. The algorithm also disregards common terms such as “the” and “a”.
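The preprocessing and term matrix steps can be illustrated with a short sketch. It is written in Python rather than MATLAB and does not reproduce the Text to Matrix Generator. The stop-word list, the function names and the tf-idf style weighting are assumptions chosen for illustration, since the exact weighting scheme is not stated here.

```python
import math
import re
from collections import Counter

NONE_TERMS = {"", "nothing", "nowt"}           # entries treated as "none"
STOP_WORDS = {"the", "a", "an", "of", "and"}   # illustrative list (assumption)


def clean(response):
    """Lower-case, replace special characters and split into single words."""
    text = re.sub(r"[^\w\s]", " ", response.lower())
    if text.strip() in NONE_TERMS:
        return ["none"]
    return [w for w in text.split() if w not in STOP_WORDS]


def term_matrix(responses):
    """Return (dictionary, rows): one row of weights per dictionary term,
    one column per response. The weighting is tf-idf style (assumption)."""
    docs = [clean(r) for r in responses]
    dictionary = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Number of responses in which each term appears at least once.
    doc_freq = Counter(w for doc in docs for w in set(doc))
    rows = []
    for term in dictionary:
        idf = math.log(n_docs / doc_freq[term]) + 1.0
        rows.append([doc.count(term) * idf for doc in docs])
    return dictionary, rows
```

Calling `term_matrix(["The birds", "nowt", "", "Birds and traffic!"])` gives a three-term dictionary in which the two empty-style responses share the single term "none".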

The term matrix is then fed into a clustering process based on a principal direction divisive partitioning (PDDP) clustering algorithm using k-means [143]. This algorithm determines the distances between clusters by computing the Euclidean distance from every entry within a particular cluster to every other point in every other cluster. The number of clusters is progressively reduced as each entry is assigned a “scatter” value that denotes its distance from the cluster’s centroid. In this instance, the process served to categorise around 90% of all entries into a number of clusters. These clusters were then manually identified as one of: miscellaneous sounds, human sounds, natural sounds or artificial sounds. There were also a small number of clusters containing terms that the algorithm could not classify because they were too obscure, in a foreign language or misspelt. These had to be classified manually.
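For orientation, the core splitting step of PDDP can be sketched in a few lines. This is a generic, plain-PDDP illustration in pure Python, not the Text to Matrix Generator implementation and without the k-means refinement mentioned above. Each point is assumed to be one response vector (a column of the term matrix). Scatter is computed here per cluster, as the sum of squared Euclidean distances from the centroid, which is an assumption about the definition. All function names are hypothetical.

```python
import math
import random


def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def scatter(points):
    """Sum of squared Euclidean distances from the cluster centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def principal_direction(points, iterations=200):
    """Leading principal direction of the centred data, by power iteration."""
    c = centroid(points)
    centred = [[x - m for x, m in zip(p, c)] for p in points]
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    v = [rng.uniform(-1.0, 1.0) for _ in c]
    for _ in range(iterations):
        # w = A^T (A v), where A is the centred data matrix
        proj = [sum(a * b for a, b in zip(row, v)) for row in centred]
        w = [sum(s * row[i] for s, row in zip(proj, centred))
             for i in range(len(c))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return c, v


def pddp_split(points):
    """Split one cluster by the sign of the projection onto the direction."""
    c, v = principal_direction(points)
    left, right = [], []
    for p in points:
        score = sum((x - m) * d for x, m, d in zip(p, c, v))
        (left if score <= 0 else right).append(p)
    return left, right


def pddp(points, n_clusters):
    """Repeatedly split the cluster with the largest scatter."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        target = max(clusters, key=scatter)
        if scatter(target) == 0.0:
            break  # nothing left to separate
        clusters.remove(target)
        clusters.extend(pddp_split(target))
    return clusters
```

With two well-separated groups of points, for example three near the origin and three near (10, 10), `pddp(points, 2)` returns those two groups as separate clusters.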

Once each sound source is classified, investigations can be made into soundscape response with respect to the type of positive/negative sounds identified. This will serve to determine the influence these source types have on appreciation and on a person’s perception of the soundscape. The prominence rating associated with each sound source will also help to uncover the strength of this influence as a function of the source’s perceived prominence within the soundscape. This broad categorisation may over-generalise certain sound types, but will allow the project’s findings to be compared with those from past research.

Using a more standard implementation of the LSA technique, the open responses to the question “Why did you record this soundscape?” will also be analysed, providing an insight into the reasons why the user chose a particular soundscape to record. This information (captured at the online upload stage and on iOS devices) is limited to 8000 characters, with the question presented to the participant as “Why did you choose to record this soundscape?”.
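For reference, the standard LSA formulation is usually written as a truncated singular value decomposition of the term matrix. The symbols below (X, k, U, Σ, V and the reduced response vectors) are notation introduced here for illustration and are not taken from the project.

```latex
% X: m-by-n term matrix (m dictionary terms, n responses)
X = U \Sigma V^{\mathsf{T}}, \qquad
X_k = U_k \Sigma_k V_k^{\mathsf{T}}, \quad k \ll \min(m, n)

% Response j is represented in the reduced space by
\hat{d}_j = \Sigma_k V_k^{\mathsf{T}} e_j

% and two responses are compared by cosine similarity
\operatorname{sim}(i, j) =
  \frac{\hat{d}_i \cdot \hat{d}_j}{\lVert \hat{d}_i \rVert \, \lVert \hat{d}_j \rVert}
```

Keeping only the k largest singular values is what allows responses that use different but related words to end up close together in the reduced space.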