User Feedback - Automatic sound synthesizer programming: techniques andapplications

6.3 Analysis

6.3.2 User Feedback

Case Study: the BBC Experiment

Improvising with (or against) a computer algorithm is significantly different from improvising with another human being. It is hard to guess what the computer will generate. The results are often surprising and can lead the improviser into very unusual territory (tonally and harmonically in the case of

the BBC experiment.) Human logic and algorithmic/computer logic feel very

different. Finn Peters, 2010 [95].

The musician who has improvised with the system the most is saxophonist, flautist and composer Finn Peters. Finn has a varied background, ranging from his work with contem-

porary classical group Nosferatu, through extensive jazz experience2and more mainstream

work with the likes of Sam Rivers [90,27]. The above quote was provided by Finn in re-

sponse to the author’s request for feedback on a duet recorded for BBC Radio 3’s ‘jazz on 3’ programme in 2009, featuring Finn on flute and unattended algorithm. The duet is

available on the CD included with this thesis (appendix A).

For this session, the timbral output of the improviser was deliberately constrained to mild variations of a toned percussion-like sound, as Finn wanted to play a fairly melodic piece. The algorithm operator (the author) started the program then left the machine completely alone for the duration of the piece. Finn initially threaded a gentle mixture of long notes, deliberately leading the algorithm to a similar output. This continued for a while until the algorithm began to reprocess the notes in a sufficiently interesting way such that it demanded that Finn responded to it. Then the line between Finn’s input and the machine’s began to blur. The piece has an interesting and unpredictable ebb and flow until it reaches a natural end point, when the author quickly stopped the algorithm. This

session was listed as one of the best live sessions recorded for that programme in 2009 [28].

This success is rather surprising - a completely automated program was able to compete with jazz musicians from around the world and to produce output of sufficient quality to be placed on this list. The author feels this can be considered a great success. Linking back to Bown and Lexer’s work where they discuss the idea that a skilled improviser can guide a much simpler algorithm through its state space, how much of this success should

be attributed to Finn and how much to the algorithm design [11]? Finn states in the

quote that [the algorithm] ‘can lead the improviser into very unusual territory (tonally and harmonically...)’, which begs the question - who was leading whom? Whilst the human player clearly carefully selected notes they knew would be likely to produce good results, the re-arrangement of those notes is a joint effort. This is Young’s unimagined property of live algorithms, where they might produce “a living computer music, an unimagined

music.” [145].

His band, which now includes the author and associated algorithmic improvisers was rated as the BBC best jazz band 2007

The Laptop Performer’s Perspective

From the perspective of the laptop operator, the use of live algorithms is an interesting development from the more usual modes of laptop performance such as the manipulation of effects, samples and synthesizers. In this recording session, the author deliberately pressed the start button for the algorithm in an exaggerated fashion and stepped away from the machine, to emphasise its autonomy. As an approach to live performance, this might seem a bit limiting; so how might an operator interact more effectively with a live algorithm? Live coding techniques allow the laptop musician to engage in a native way with a live al-

gorithm, by altering its processes at the logical, code level [22,82,89]. An example might

be the deliberate adjustment of the weighting used to vary the algorithm’s behaviour or the on the fly adjustment of genetic algorithm parameters. Indeed, the algorithm itself might be completely re-programmed on the fly. A typical group performance scenario for the author would be to improvise by running several different algorithms simultaneously, where each is fed from a different member of the group. The laptop performer can then choose which algorithm(s) is heard based on which is currently displaying the most interesting behaviour. Interesting behaviour might be for example, highly dynamic behaviour, producing output which effectively adds to the current mood or highly contrasting output which might lead the overall improvisation in a new direction. Indeed, the performer might allow all algorithms to run simultaneously, feeding back into each other, after Lewis’

‘co-dominance’ method of improvisation [71].

Further Insights

Finn has provided more insights informally. He reports a ‘chicken and egg’ sensation where over time, the line between what he has played and what the software has played becomes blurred. He is responding to it responding to him responding to it ... Finn is clearly able to distinguish between, to respond to and even to induce the different modes of the improviser. For example, at the beginning of an improvisation he might play long, tonally proximal notes to initialise the memory and induce slow, polyphonic playing, then later on he might play rapid, varied notes to trigger the same on the part of the algorithm. Indeed, Finn reports that part of the challenge and enjoyment when playing with such systems is to try to figure out what the behaviour actually is. This connects directly to Young’s opaque and intimacy attributes - algorithm design is a subtle blend of opacity and intimacy, where the human improviser feels they are sharing a space with the algorithm but it is not simply parroting their movements in that space; instead it has an element of

mystery or unpredictability.

In terms of the specifics of the sound synthesis and timbre matching, Finn found that a completely free timbre search was quite challenging to play against, preferring using an FM synthesis algorithm which had the frequency ratios quantised to increase the harmonicity of the tones. Finn did not request this specific technical adjustment, that was the author’s interpretation of how to fix this problem. It is possible that this situation would be less problematic if the system were optimised for faster evolution. If tone matching could be achieved in near real time, the musician would have more control when guiding the tonality of the overall piece. The limitation of a requirement for harmonicity is not such a problem as a large range of harmonic timbres are available. Perhaps the problem is that it is more challenging to work expressively in series of timbres than series of notes and this situation is exasperated if the timbre matching is too slow.

The system has inspired Finn to embark on compositional projects where algorithms have been much more involved. The ‘Music of the Mind’ project looked at different ways of mapping brain waves into the musical domain as well as ways of modelling musical behaviours. The project included the improviser described here as well as subsequent designs using Markov chains, where the note sequences were analysed in a Markovian fashion to elicit their statistical interval and rhythmic relationships. On the ‘Music of the Mind’ album, the improviser can be heard towards the end of the track ‘Sleep Music II’, half way through the track ‘Oxygen’ and, possibly the best example, at the end of the track ‘Brain Solo’, where it improvises a melody using an evolved saxophone-like sound played using an FM synthesizer. A Markov chain driven rhythm synthesizer can be heard

during the track ‘Brain Solo’ [96].

6.4 Conclusion

A creative application of genetic algorithm driven tone matching has been presented. The resulting system is an autonomous improviser which is designed to play alongside human musicians. The four parts constituting the improviser have been described in detail, namely, the synthesis engine, the tone matching algorithm, the listener and the improviser. The demands placed on the GA design by the moving target problem have been discussed and the mutation rate has been highlighted as the key algorithmic parameter here. The system has been evaluated mainly on its artistic merit, since it includes components which have been technically assessed in previous chapters. The properties of the system such as its variable polyphony, octave ranges and note speed provide it with a wide melodic

and rhythmic scope which affords an intriguing behaviour for human musicians. It can carry out intimate playing and less predictable playing, with its use of short and long term memories. The ability to change the synthesis engine allows it to be easily adapted for different scenarios. The system has proved to be a successful ‘musician’, being invited to

perform alongside leading musicians from the UK jazz scene 3.

Chapter 7

Conclusion

In this final chapter, the thesis is drawn to a close. The solutions to problems and answers to questions posed in the introduction are provided from the work in the thesis where

possible; unanswered questions are also highlighted. The contributions made by this

thesis are listed and references are made to the specific points in thesis at which they were made. Directions for future work are outlined, dealing with the work from each chapter in turn. Finally, there is a summary.

7.1 Research Questions

In this section, the problems and questions posed in the introduction are revisited; solutions and answers are derived from the results reported in the thesis.

Problem 1: Numerical Representations of Timbre

An analysis of the ascending auditory pathway suggests distinct properties of an ideal numerical representation of timbre. It should be multidimensional, aware of non-linear features of human hearing and capable of differentiating typical variations such as bright- ness and nasality. If this measure is to be used for efficient timbral comparisons, it should also be orthogonal within itself, such that it is compact. Lastly, it should be able to change gradually with variations in timbre. It has been shown that the MFCC fits with these criteria: it is multidimensional, perceptually weighted (with its use of the Mel filter bank and logarithmic power readings), capable of measuring typical timbral attributes (as

evidenced by the similarity of the MDS plots shown in figures 2.1and2.9, orthogonal (as

a result of the DCT stage) and finally its values vary gradually as the timbre changes

Terasawa et al [121]). The successful application of the MFCC to speech recognition and timbre classification problems supports its computational viability. Open source MFCC extractors are available and their implementations, whilst they vary in certain details, gen- erally produce similar output. A novel, reference implementation has been produced for ease of programmatic integration and as a means to fully understand the computational requirements.

Problem 2: Effective Sound Synthesizer Programming

A user trial was carried out which aimed to compare the synthesizer programming abilities

and behaviour of human and machine (chapter4). In the first regard, the comparison of

abilities, the genetic algorithm tone matching technique used in the SynthBot software proved to be a superior synthesizer programmer than the expert users involved in the trial, at least as measured by the MFCC error metric. The human programmers were able to program the subtractive synthesizer better than the FM synthesizer, achieving scores which were within 50% of the machine in the former and an order of magnitude worse than the machine in the latter case. Perhaps this supports the widely held opinion that analog-style synthesizers are more intuitive to program than FM synthesizers. The humans were able to program sounds which were reminiscent of the target sounds in most cases, evidenced by the author listening to the sounds, but this perceptual similarity has not been experimentally validated. An examination of the programming behaviour of the humans showed that they tend to work with a single parameter at a time, exploring its full range before moving to another. It seems that this strategy will become ineffective once the number of parameters and the complexity of the synthesizer is too high. Also, this will not work well for parameters which are highly interdependent as a parameter thought fully explored might effectively change its function whilst another parameter is being explored. It is worth noting that not all musicians who use synthesizers will have anywhere near the skill of those involved in the trial and that even within this skilled group, there was a wide variation in programming success. The best scores, especially with the subtractive synthesizer, were impressive but the average scores were much less so. In conclusion, human programmers seem to be able to program synthesizers with greater precision than expected and are able to find near optimal settings in some cases, given a synthesizer that is regarded as ‘intuitive’. It is not clear that this programming success extends to cover more complex synthesizers however and not all synthesizer users are likely to have such a skill level. There is clearly a need for more effective ways to

program synthesizers if the average user is to escape from the presets.

Problem 3: Mapping and Describing Sound Synthesis Space

A detailed description of the effect of parametric variations upon the output of various

sound synthesis algorithms has been provided in chapter3. The concept of a timbral error

surface was introduced wherein the effect of parametric variation is presented as movement

in timbre space away from a reference point (subsection3.2.1). These error surfaces are a

way of mapping sound synthesis space which can provide an insight into the best method to use to search this space algorithmically. They also suggest appropriate resolutions for sampling sound synthesis space and illustrate the timbral behaviour of different types of parameter, e.g. quantised and continuous parameters.

The SoundExplorer tool is proffered as a system which provides its user with a dimen- sionally reduced map showing the timbre space of a sound synthesis algorithm or a set

of audio files (section 2.5). Users can interactively explore the map, by playing back the

audio files or synthesizing the sounds in real time with parametric interpolation from one position to the next. Whilst the tool has not been formally trialled, e.g. using a ‘find a sound’ task, the initial reaction by users is that it is very exciting to be able to see all of the sounds a synthesizer can make in a single window. It also provides a way of quickly viewing the intersection between the sound space of different synthesizers or synthesizers and real instruments and so on. With error surfaces and SoundExplorer, it is possible to examine the details of the surface of sound synthesis space and to present an overview of the complete space.

Problem 4: Searching Sound Synthesis Space or Automated Synthesizer Programming

An extensive comparison of 4 methods through which sound synthesis space might be

automatically explored was presented in chapter 3. The 4 methods were feed forward

neural networks, genetic algorithms, hill climbing and a data driven approach. The aim was to search for sounds which were known to reside in the space and for real instrument sounds, where their residence in the space was an unknown. It was shown that the genetic algorithm carried out the highest quality search but the hill climbing and data driven approaches were also quite effective. The neural network showed very poor performance, failing to learn a mapping between MFCC feature frames and synthesizer parameter settings. This is unfortunate as the neural network is the only one of these methods that could

operate in real time, compared to the 60 seconds or more required by the other methods to match a timbre. A hybrid data driven/ genetic algorithm approach is presented and shown to have superior performance than either of its individual parts. It is felt that with optimisation of the data driven approach, this could provide much faster tone matching. Fixed architecture FM synthesis, modelled after the Yamaha DX7, offered the best tone matching performance over the real instrument data set. The poor ability of the genetic algorithm and other approaches to program the variable architecture FM synthesis algorithm was quite disappointing as it was felt that this synthesis algorithm should offer the best range of sounds and is also perhaps the hardest for a human to program. Further, in previous work by the author which partly inspired this thesis, a variable architecture synthesis algorithm proved itself capable of generating a huge variety of sounds when placed

in an interactive GA context [136]. The topic in general warrants much further study,

both in terms of optimisation performance and synthesis architectures. The ultimate goal is to produce a real time tone matcher that can work with any sound synthesis algorithm.

This possibility is discussed further in section 7.3.

Problem 5: Creative Applications of the Above

Four creative applications of sound synthesis space search have been described. Sound- Explorer is a synthesizer programming interface that does away with the usual ideas of categories and presets and instead presents the user with an intriguing image showing a

large range of timbres that the synthesizer can make (section2.5). The user can zoom into

the image to explore heavily populated parts of the space and they can combine different sound sets, e.g. two synthesizers or a synthesizer and a set of samples from a sample library. The system provides a form of parametric interpolation which allows the user to hear the sound of the space between sounds. Formal user trials would inform the future development of this project, but as it stands it is an interesting tool to interact with, which provides a means to aggressively explore sound synthesis space. SynthBot aims to allow its users to escape from the presets and to specify the sound they want directly, by example,

in the form of an audio file (chapter 4). User trials pitting SynthBot against human pro-

grammers showed that SynthBot is a very effective automated synthesizer programmer; as a studio tool, it is the equivalent or better of having an expert human synthesizer programmer at your disposal. The process of creating a commercial implementation of this system is ongoing. The Evolving Drum Machine attempts to re-invigorate the venerable drum machine by providing a means of generating timbral transitions for the percussion

sounds (chapter 5). This is an attempt to take the idea of complete timbral control from electroacoustic music and bring it to the drum machine, a fixed timbred stalwart of ‘pop- ular’ electronic music. The system was trialled with two experienced drum machine users and the response was very positive. They engaged conceptually with the idea of evolving timbres but did state the importance of retaining some sort of editorial control. The final

application was the timbre matching improviser, presented in chapter 6. This system has

been used extensively for live recordings and has appeared on national radio, where it

In document Automatic sound synthesizer programming: techniques and applications (Page 151-180)