One of the propositions of this thesis is the use of evolutionary computing to address challenges in intelligent music production, so a brief review of the literature on this topic is required. Evolutionary computing (EC) is a broad term for a family of algorithms and analysis methods often used in global optimisation problems. The name reflects their use of a population of multiple candidate solutions that changes over time, evolving towards the optimal point in the solution space. Generally, this is achieved using a meta-heuristic derived from biological processes, on the observation that living systems have evolved towards optimal solutions to specific problems, such as adapting their collective behaviour in order to survive in new landscapes. This contrasts with more traditional optimisation strategies, which iterate a single solution over the solution space. These include gradient-based or “hill-climbing” methods, which require the solution space to be smooth and differentiable. EC is commonly used for problems that are non-deterministic or non-linear, where the solution space may not be smooth and differentiable, in fields such as logistics, scheduling, engineering and design.
2.4.1 Genetic algorithm
A great variety of biologically-inspired meta-heuristics is used in optimisation. Perhaps the best known is the genetic algorithm (GA). The contemporary understanding of what constitutes a genetic algorithm owes much to the works of Holland [83] and Goldberg et al. [84], among other authors.
A genetic algorithm begins with an initial population of candidate solutions, which evolve towards an optimal solution in a manner akin to genetic evolution using Darwinian principles, particularly “survival of the fittest”. Each solution is represented as a chromosome, an ordered list of genes. For a binary GA, each gene takes a value from the allele set {0,1}. For example, the chromosome [0,1,1,0] contains four genes. The dimensions of the problem to be solved are represented within this chromosome: if x and y coordinates in a fixed range can each be represented as a 4-bit binary string, then the 8-bit chromosome [1,1,1,1,0,0,0,0] represents a maximal value of x and a minimal value of y. Each solution in the population is rated according to its quality by a fitness function. Individuals with fit solutions are chosen more frequently to “mate” with other fit solutions, thus producing offspring in the next generation of solutions. Done correctly, this allows the average fitness of the population to increase, converging on the optimal solution. The basic form of a genetic algorithm can be described as follows, and is illustrated in Figure 2.10.
1. Initialise population
2. Representation of population as chromosomes
3. Evaluation of population ‘fitness’
4. Selection of fittest individuals for reproduction
5. Reproduction by genetic crossover and mutation
2.4. EVOLUTIONARY COMPUTING 29
[Flowchart nodes: Initialisation, Population, Parent selection, Parents, Recombination, Mutation, Offspring, Survivor selection, Termination]
Figure 2.10: Flowchart of a basic genetic algorithm
For brevity, selection, crossover and mutation are explained further in Chapters 7 and 8, within the context of the specific problem at hand.
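Purely as an illustration of the five steps listed above, the following minimal Python sketch evolves the 8-bit (x, y) chromosome from the earlier example. The toy objective (maximise x and minimise y), tournament selection, one-point crossover, bit-flip mutation and elitism are all assumptions chosen for brevity; they are not necessarily the operators used in later chapters.

```python
import random

BITS = 4  # bits per coordinate, as in the 8-bit example chromosome


def decode(chromosome):
    """Split an 8-bit chromosome into two 4-bit integers (x, y)."""
    x = int("".join(map(str, chromosome[:BITS])), 2)
    y = int("".join(map(str, chromosome[BITS:])), 2)
    return x, y


def fitness(chromosome):
    """Toy objective (an assumption): largest when x is maximal and y minimal."""
    x, y = decode(chromosome)
    return x - y


def tournament(population, rng, k=2):
    """Selection: return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate=0.05):
    """Bit-flip mutation: flip each gene with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def run_ga(pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: initialise a population of random binary chromosomes
    population = [[rng.randint(0, 1) for _ in range(2 * BITS)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Step 3: evaluate fitness; the best individual is kept (elitism)
        best = max(population, key=fitness)
        # Steps 4-5: select parents, then recombine and mutate
        offspring = [mutate(crossover(tournament(population, rng),
                                      tournament(population, rng), rng), rng)
                     for _ in range(pop_size - 1)]
        population = [best] + offspring
    return max(population, key=fitness)


best = run_ga()
print(best, decode(best), fitness(best))
```

Because the best individual is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next; without elitism only the average fitness would be expected to rise.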
2.4.2 Interactive Evolutionary Computation (IEC)
In problems that are highly subjective, EC methods are particularly suitable. IEC is a form of EC in which the fitness evaluation is based not on a clearly defined formula but on the subjective response of a user. IEC has been applied to various subjective problems, such as fashion design [85], logo design [86] and sound design (see Takagi [87] for a detailed overview of applications). Notably, these examples are all design problems in which aesthetics are important. In such applications, there may not be a clearly defined optimal solution that is considered suitable for a range of users. Neither is the fitness landscape clearly defined. The fitness function depends greatly on what is asked of the user conducting the evaluation, and on their understanding of the question posed and of the problem domain. For example, in the case of fashion design, users may be asked to rate the fitness of presented candidate solutions (outfits) where the target is a series of descriptions such as “warm, smart, casual, autumnal” etc. IEC is useful here since, when attempting to solve such a problem,
“...we cannot use the gradient information of our mental psychological space...” [87].
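The idea that a subjective rating takes the place of a formula can be sketched as follows. This is a minimal illustration, not a description of any cited system: the function names, the real-valued encoding, the truncation selection and the `simulated_user` stand-in (which hides a hypothetical preference vector) are all assumptions. In a real IEC system, `rate` would present the candidate to a person and wait for their score.

```python
import random


def simulated_user(candidate, preference=(0.8, 0.2, 0.5)):
    """Stand-in for a human rating (assumption): 10 is ideal, 0 is poor."""
    dist = sum((c - p) ** 2 for c, p in zip(candidate, preference)) ** 0.5
    return max(0.0, 10.0 - 10.0 * dist)


def evolve_interactively(rate, pop_size=6, generations=5, n_params=3, seed=0):
    """Evolve parameter vectors in [0, 1] using only the scores from `rate`."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_params)]
                  for _ in range(pop_size)]
    for generation in range(generations):
        # One judgement per candidate: this is the costly, subjective step.
        ratings = [rate(c) for c in population]
        ranked = [c for _, c in sorted(zip(ratings, population),
                                       key=lambda pair: pair[0], reverse=True)]
        if generation == generations - 1:
            break
        parents = ranked[: pop_size // 2]  # keep the better-rated half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover plus small Gaussian mutation, clipped to [0, 1]
            children.append([min(1.0, max(0.0, rng.choice(genes) + rng.gauss(0, 0.05)))
                             for genes in zip(a, b)])
        population = parents + children
    return ranked[0]  # highest-rated candidate of the final generation


favourite = evolve_interactively(simulated_user)
print(favourite, simulated_user(favourite))
```

Note that the algorithm never inspects how the rating was produced; it needs only a number per candidate, which is why no gradient information is required. With the defaults above the user is asked for 30 judgements in total.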
2.4.3 Specific challenges of IEC
In IEC, the system generates solutions in the problem's parameter space while the user evaluates the fitness of each solution in some psychological space, which may be unique to each user. The mapping between these two spaces may not be well-defined.
Since in IEC a user must evaluate the fitness of each solution, evaluation can become a time-consuming activity, with potential for high levels of cognitive demand and eventual fatigue. This is especially a problem in audio, where each individual solution may take tens of seconds to evaluate, unlike image evaluation, where a number of solutions can be compared side-by-side. In parallel with the emergence of IEC, hybrid methods have been developed in which a relatively small number of solutions is evaluated by the user and the remaining solutions are evaluated by extrapolation. This reduces the burden on the user for problem types where large populations are helpful. Approaches to reducing the user burden include clustering of solutions [88] and alternating user-evaluated generations with computer-evaluated generations.
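One way to picture such a hybrid scheme is sketched below: the user rates only a few spread-out representatives and every other candidate inherits the score of its nearest rated representative. This is an illustrative assumption, not the clustering method of [88]; the farthest-point selection, the nearest-neighbour extrapolation and all function names are hypothetical.

```python
import math


def pick_representatives(population, k):
    """Greedy farthest-point selection of k spread-out candidates."""
    reps = [population[0]]
    while len(reps) < k:
        # Add the candidate whose nearest representative is farthest away.
        reps.append(max(population,
                        key=lambda c: min(math.dist(c, r) for r in reps)))
    return reps


def hybrid_fitness(population, rate, k=3):
    """Ask `rate` for only k judgements; estimate the rest by extrapolation."""
    reps = pick_representatives(population, k)
    rated = [(r, rate(r)) for r in reps]  # the only user evaluations
    estimates = []
    for c in population:
        # Copy the score of the nearest user-rated representative.
        _, score = min(rated, key=lambda pair: math.dist(c, pair[0]))
        estimates.append(score)
    return estimates


# Hypothetical 2-D candidates and a stand-in rating function
candidates = [(0, 0), (0.1, 0), (1, 1), (0.9, 1), (0, 1)]
print(hybrid_fitness(candidates, rate=lambda c: c[0] + c[1]))
```

Here five candidates receive a fitness value from three judgements; the saving grows with population size, at the cost of estimation error for candidates far from any representative.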
Figure 2.11: Psychological distance between target in our psychological spaces and actual system output become the fitness axis of a feature parameter space where EC searches for the global optimum in an IEC system. Image taken from Takagi [87].
2.4.4 Suitability of EC to IMP problems
This thesis argues that EC is well-suited to IMP problems. The argument rests on the following three observations.
Non-linearities: due to the perceptual nature of audio evaluation, the solution space may not be smooth and differentiable, making optimisation methods such as gradient descent difficult or impossible to apply. Additionally, as each user may have a different goal in mind, there may not exist a single global optimum. Each user may perceive a “personal global optimum” rather than every user agreeing on a “universal global optimum”.
Large number of parameters: often there are a large number of parameters where the relationships between them are not well-understood. Furthering the understanding of these relationships helps construct more efficient search spaces. It is also important to establish the mapping between system parameters and perceptual factors.
Fitness functions: the definition of a “good” mix, or at least a desired mix, can be complex but is ultimately subjective. What is required is a numerical value for fitness. Quantities to be minimised include the distance to a desired target which is known in advance, or quantities thought to degrade audio quality such as inter-channel masking [66, 89]. However, if perceptual targets are being sought, such as “warmth” or “clarity”, explicit subjective ratings can be used as a fitness function in place of a numerical approximation.
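As a concrete instance of the "distance to a desired target" fitness mentioned above, the sketch below scores a set of track gains by the Euclidean distance between the resulting mix and a known target mix. Comparing raw samples is an assumption made to keep the example short; published systems typically compare derived features rather than waveforms. The function names and the toy signals are hypothetical.

```python
import math


def mix(tracks, gains):
    """Weighted sum of equal-length tracks, each a list of samples."""
    return [sum(g * t[n] for g, t in zip(gains, tracks))
            for n in range(len(tracks[0]))]


def distance_to_target(gains, tracks, target):
    """Fitness to be minimised: Euclidean distance from the mix to the target."""
    return math.dist(mix(tracks, gains), target)


# Two toy tracks and a target produced with known gains
tracks = [[1.0, 0.0, -1.0, 0.0],
          [0.0, 1.0, 0.0, -1.0]]
target = mix(tracks, [0.5, 0.25])

print(distance_to_target([0.5, 0.25], tracks, target))  # gains that reproduce the target
print(distance_to_target([1.0, 1.0], tracks, target))   # a poorer candidate
```

A function of this kind can be handed directly to an optimiser, whereas a perceptual target such as “warmth” has no such formula and the rating must instead come from a listener.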
A synthesis of these three observations leads to the use of Interactive Evolutionary Computation. If “quality” is the variable to be optimised, one must appreciate that quality can be considered as specific to a single product, good or service [7]. Recall the framework for quality proposed by Reeves and Bednar [1], repeated below. While definition #3 could possibly lead to an objective fitness function, the other perspectives suggest subjective evaluation, furthering the case for using IEC.
1. Quality as excellence or superiority
2. Quality as value
3. Quality as conforming to specifications
4. Quality as meeting or exceeding customer expectations
Many of the works in Table 2.3 were aimed at live-sound applications, i.e., real-time processing of incoming audio streams without prior knowledge, analysis of extracted features, heuristics used to guide optimisation etc. An EC-based approach may be more suited to studio environments, where processing is often applied after audio has been recorded, where there exists the time and the possibility to compare various processing decisions before arriving at the final settings. Here, there is no longer a need to analyse “live” audio as the entire audio track is known. Importantly, multiple audio tracks are known as are the relationships between them. This scenario increasingly allows for cross-adaptive effects and the temporal variation of parameters.
2.4.5 Previous work on EC in IMP
Many of the earliest applications of EC to this area are in subjects that may not be considered intelligent music production in the modern context, but do relate to audio/acoustic engineering, such as filter optimisation in non-musical applications [90, 91], acoustic designs [92, 93] and binaural hearing [94, 95]. Synthesis and/or sound design is perhaps the area that has made most use of EC-based techniques, where the parameter space of a synthesis engine is searched for optimal sounds [96–101].
Many of these prior works are based on matching a sound or mix to a target, using the distance from the target as a fitness function to be minimised. Of course, this target must be known in advance. Heise et al. [73] compared four techniques (including genetic algorithm and particle swarm optimisation) in the task of adjusting the parameters of a reverberation plug-in to best match a given room impulse response. Kolasinski [102] was concerned with matching a mix to a target by adjusting track gains, using the Euclidean distance between spectral histograms as a similarity measure to be minimised using GA. Barchiesi and Reiss [74] also attempted matching to a given target mix, by optimising track gains and track EQ filters using least-squares. This paper was critical of Kolasinski [102] and of GA in general for this application, stating
“... for the purpose of this application, the results are quite poor as the number of tracks increases and the algorithm is computationally expensive”.
These performance issues may not have been due to high dimensionality per se, but rather to the choice of an inefficient solution space. Chapter 4 shows that optimisation of track gains and EQ filters benefits from carefully designed solution spaces, in which each possible configuration exists only once. Additionally, computational expense is less of a problem now than in 2009. There are many more papers on various “matching to a target” applications [103–106]. What about when there is no target audio available? In place of explicit target audio there may still exist a target in some other domain, such as a perceptual target (“Make the mix sound bright/warm...etc”). Reed [49], while not using EC, does emphasise that IMP applications should be “assistants” rather than replacements for the human operator. This philosophy has been echoed by others [50–52] and is applied in this thesis.
2.5. SUMMARY OF LITERATURE REVIEW 32