
4 Morpheme: a multidimensional sketching interface for the control of corpus-based

4.4 Issues in the first version of Morpheme

The first implementation of Morpheme was more a proof of concept than a complete application. After discussion and informal trials of the software and the user interface with the supervisory team and colleagues, a number of issues were identified which drove the design and implementation of a second version. These issues fall into two types: those related to the design and those related to the implementation. Both are discussed below.

4.4.1 Design Issues

The mapping in the first version of Morpheme was devised using a very idiosyncratic approach. As I continued exploring the literature on cross-modal mapping, I discovered a plethora of studies providing empirical evidence that there is consistency in the audio and visual feature dimensions that humans perceive as good correlates. It soon became obvious that the mapping in Morpheme should be based on these empirical findings rather than on a newly devised theoretical framework and heuristic method. Further, it was also realised that the empirical methods used to study cross-modal correspondences and similarity between structural features can be extremely useful for the evaluation of multimodal interfaces for musical interaction, amongst other human-computer interface applications.

A second issue relates to the division of the canvas into five sections. This division allowed multiple sounds to be layered, which could be considered a desirable feature. However, this feature came at the expense of usability. The main usability issue was that the user had no way of determining the boundaries of each section and, consequently, which parts of the sketch would be analysed together. Informal trials showed that the division of the canvas is confusing for the user, as it neither relates to the way we would naturally perceive a canvas nor follows any convention derived from mainstream audio interfaces or music notation. Two options were considered in order to solve this problem: (i) perform feature extraction on the entire sketch but at a blob level, and for each blob use a different audio-unit selection and synthesis module, and (ii) perform feature extraction on the entire sketch and use one audio-unit selection and synthesis module. The second option was considered the most appropriate, given that the first could be very demanding in terms of computational resources due to the potentially large number of simultaneous queries to the database for audio-unit selection. The first option was also rejected because it was deemed too complicated to work at an object level. First, there are discrepancies between what a computer recognises as a blob and what humans recognise as objects; this mismatch between the user’s and the computer’s object recognition abilities could lead to user confusion. Second, imagine a scenario where seven blobs are aligned on the vertical axis: image analysis would have to be performed separately for each blob, resulting in seven targets for the selection of audio-units, and seven sound synthesis modules would be required in order to play back the retrieved units.
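The gap between computed blobs and perceived objects can be illustrated with a minimal sketch. It assumes that blobs are found by 4-connected component labelling on a binary canvas, which is a common technique but is not necessarily the blob detection used in Morpheme; the function name `count_blobs` and the example grids are hypothetical.

```python
def count_blobs(grid):
    """Count 4-connected groups of non-zero cells in a binary grid."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and (r, c) not in seen:
                blobs += 1  # a new, unvisited blob starts here
                seen.add((r, c))
                stack = [(r, c)]
                while stack:  # flood fill the whole blob
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and grid[ny][nx] and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return blobs


# Two crossing strokes: a viewer may see two lines, the labelling finds one blob.
cross = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

# One dotted stroke: a viewer may see one line, the labelling finds three blobs.
dotted = [[1, 0, 1, 0, 1]]

# Seven blobs stacked on the vertical axis, separated by empty rows.
stacked = [[1] if i % 2 == 0 else [0] for i in range(13)]

print(count_blobs(cross), count_blobs(dotted), count_blobs(stacked))
```

Under option (i), the stacked example would require one target query and one synthesis module per blob, that is, seven of each, whereas option (ii) issues a single query for the whole sketch.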

4.4.2 Implementation Issues

Because the present version of Morpheme uses a series of five retrieval algorithms and synthesis modules, together with a set of feature extraction algorithms (one for each area of the canvas), running the system is extremely expensive from a computational point of view. This issue has to be addressed, as real-time interaction with the system is the main aim. A number of modifications were identified that could improve the computational efficiency of the system, including: (i) reducing the number of retrieval, feature extraction and sound synthesis algorithms, and (ii) changing the brush methods used for sketching, as they are implemented in JavaScript, which is slower than Cycling '74 Jitter's native objects/methods, which are developed in the C programming language. Further, in the second version of Morpheme, it will be necessary to implement a mapping that associates the distance between the target and selected feature vectors with the sound synthesis parameters. For example, when the target loudness is -20 dB and the selected audio-unit is at -3 dB, the amplitude of the selected audio-unit should be decreased by 17 dB in order to match the target.
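The loudness example can be written out as a short sketch. It assumes that loudness is expressed as an amplitude level in dB, so that a gain in dB converts to a linear amplitude factor via 10^(gain/20); the function names are hypothetical and do not come from Morpheme or CataRT.

```python
def gain_to_match_db(target_db: float, selected_db: float) -> float:
    """Gain (in dB) to apply to the selected unit so that it matches the target."""
    return target_db - selected_db


def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear scaling factor."""
    return 10.0 ** (gain_db / 20.0)


# Target at -20 dB, selected unit at -3 dB: attenuate by 17 dB.
gain_db = gain_to_match_db(-20.0, -3.0)
amplitude_scale = db_to_linear(gain_db)  # roughly 0.14 of the original amplitude
print(gain_db, amplitude_scale)
```

The same difference-based mapping could in principle be applied to other descriptors for which a synthesis parameter exists, although which descriptors are compensated in this way is a design decision left open here.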

For the selection of audio units from the corpus, CataRT uses the k-Nearest Neighbour (k-NN) algorithm. k-NN works by finding the shortest distance between the feature vector of the target (e.g. visual features extracted from the canvas) and the feature values of the units stored in the database (for more information on the algorithm, see Schwarz, 2004). A common phenomenon when an audio corpus consists of a relatively small number of audio units is that the distribution of the audio descriptions in the feature space is concentrated, forming dense clusters in some areas while the rest of the feature space is relatively empty. One problem that can be observed is that the requested target feature values might lie well outside the main clusters of the feature space, so that the target does not match any of the audio units in the corpus. In such a case, the k-NN algorithm will select the nearest unit that can be found in the feature space. Moreover, if the target feature vectors in a series of queries are all well outside the main clusters, the selection algorithm will stay on the periphery of the clusters and will not access their interior. This results in the following problems: (i) although the corpus might consist of many audio-units, it might be difficult to access the clusters, and (ii) two very different target queries (i.e. two brush strokes in the context of Morpheme), which both request feature vectors that are well outside the main clusters of the corpus, might retrieve the very same audio-unit. In the new version of Morpheme, a method should therefore be devised to address this problem and improve the exploration of the audio corpus.
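Problem (ii) can be reproduced with a toy example. The sketch below assumes k = 1, plain Euclidean distance and a hypothetical two-dimensional corpus forming a single dense cluster; it is an illustration of the phenomenon only and is not CataRT's actual selection code.

```python
import math

# Hypothetical corpus: five units clustered around (0.5, 0.5) in a
# normalised two-dimensional feature space.
corpus = [(0.48, 0.50), (0.50, 0.52), (0.52, 0.49), (0.51, 0.47), (0.55, 0.55)]


def nearest_unit(target, units):
    """Return the index of the unit closest to the target (1-NN, Euclidean)."""
    return min(range(len(units)), key=lambda i: math.dist(target, units[i]))


# Two clearly different targets, both far outside the cluster.
stroke_a = (0.95, 0.90)
stroke_b = (0.80, 0.99)

# Both queries land on the same peripheral unit (index 4).
print(nearest_unit(stroke_a, corpus), nearest_unit(stroke_b, corpus))

# A target inside the cluster is needed to reach the interior units.
print(nearest_unit((0.49, 0.50), corpus))
```

As long as the targets remain in the empty upper-right region of the space, only the peripheral unit at index 4 is ever returned, which corresponds to problem (i): the remaining units are present in the corpus but are effectively unreachable.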

Finally, the mapping between audio and visual feature dimensions in the new version of Morpheme will be informed by empirically validated audio-visual correspondences. Consequently, the extraction of visual features will also need to be revised, as the algorithms used must be able to describe the required visual features.