A Computer Model for the Schillinger System of Musical Composition
Matthew Rankin
A thesis submitted in partial fulfillment of the degree of
Bachelor of Science (Honours) at
The Department of Computer Science
Australian National University
Except where otherwise indicated, this thesis is my own original work.
Matthew Rankin 28 August 2012
Acknowledgements
The author wishes to sincerely thank Dr. Henry Gardner for his extremely valuable assistance, insight and encouragement; Dr. Ben Swift also for his continuous encouragement and academic mentorship; Jim Cotter for igniting what was a smouldering interest in algorithmic composition and more recently providing participants for the listening experiment; and Mia for her unyielding, belligerent optimism.
Abstract
A system for the automated composition of music utilising the procedures of Joseph Schillinger has been constructed. Schillinger was a well-known music theorist and composition teacher in New York between the first and second World Wars who developed a formalism later published as The Schillinger System of Musical Composition [Schillinger 1978]. In the past the theories contained in these volumes have generally not been treated in a sufficiently rigorous fashion to enable the automatic generation of music, partly because they contain mathematical errors, notational inconsistencies and elements of ‘pseudo-science’ [Backus 1960]. This thesis presents ways of resolving these issues and a computer system which can generate compositions using Schillinger’s formalism. By means of the analysis of data gathered from a rigorous listening survey and the results from an automatic genre classifier, the output of the system has been validated as possessing intrinsic musical merit and containing a reasonable degree of stylistic diversity within the broad categories of Jazz and Western Classical music. These results are encouraging, and warrant further development of the software into a flexible tool for composers and content creators.
Contents
Acknowledgements
Abstract
1 Background
1.1 Introduction
1.2 Introduction to the Schillinger System
1.2.1 Schillinger in Computer-aided Composition Literature
1.2.2 Motivation
1.2.3 Criticism
1.3 Summary of this Thesis
2 Overview of Computer-aided Composition
2.1 Dominant Paradigms in Computer-aided Composition
2.1.1 Style Imitation versus Genuine Composition
2.1.2 Push-button versus Interactive
2.1.3 Data-driven versus Knowledge-engineered
2.1.4 Musical Domain Knowledge versus Emergent Behaviour
2.2 Formal Computational Approaches
2.2.1 Markov Models
2.2.2 Artificial Neural Networks
2.2.3 Generative Grammars and Finite State Automata
2.2.4 Case-based Reasoning and Fuzzy Logic
2.2.5 Evolutionary Algorithms
2.2.6 Chaos and Fractals
2.2.7 Cellular Automata
2.2.8 Swarm Algorithms
2.3 The Automated Schillinger System in Context
3 Implementation of the Schillinger System
3.1 Introduction
3.1.1 A Brief Refresher
3.1.2 The Impromptu Environment
3.2 Theory of Rhythm
3.2.1 Rhythms from Interference Patterns
3.2.2 Synchronisation of Multiple Patterns
3.2.3 Extending Rhythmic Material Using Permutations
3.2.4 Rhythms from Algebraic Expansion
3.3 Theory of Pitch Scales
3.3.1 Flat and Symmetric Scales
3.3.2 Tonal Expansions
3.3.3 Nearest-Tone voice-leading
3.3.4 Deriving Simple Harmonic Progressions From Symmetric Scales
3.4 Variations of Music by Means of Geometrical Progression
3.4.1 Geometric Inversion and Expansion
3.4.2 Splicing Harmonies Using Inversion
3.5 Theory of Melody
3.5.1 The Axes of Melody
3.5.2 Superimposition of Rhythm and Pitch on Axes
3.5.3 Types of Motion Around the Axes
3.5.4 Building Melodic Compositions
3.6 Structure of the Automated Schillinger System
3.6.1 Rhythm Generators
3.6.2 Harmonic and Melodic Modules
3.6.3 Parameter Settings
3.7 Parts of Schillinger’s Theories Not Utilised
3.8 Discussion
4 Results and Evaluation
4.1 Introduction
4.2 Common Methods of Evaluation
4.3 Automated Schillinger System Output
4.4 Assessing Stylistic Diversity
4.4.1 Overview of Automated Genre Classification
4.4.2 Choice of Software
4.4.3 Classification Experiment
4.4.4 Preparation of MIDI files
4.4.5 Classifier Configuration
4.4.6 Classification Results
4.5 Assessing Musical Merit
4.5.1 Listening Survey Design
4.5.2 Listening Experiment
4.5.3 Quantitative Analysis and Results
4.5.4 Qualitative Analysis
4.5.4.1 Methodology
4.5.4.2 Analysis and Results
4.5.4.3 Genre and Style
5 Conclusion
5.1 Summary of Contribution
5.2 Avenues for Future Work
A Samples of Output
A.1 Harmony #1
A.2 Harmony #2
A.3 Harmony #3
A.4 Melody #1
A.5 Melody #2
A.6 Melody #3
B Listening Survey
C Function List
C.1 Rhythmic Resultants - Book I: Ch. 2, 4, 5, 6, 12
C.2 Rhythmic Variations - Book I: Ch. 9, 10, 11
C.3 Rhythmic Grouping and Synchronisation - Book I: Ch. 3, 8
C.4 Rhythmic Generators
C.5 Scale Generation - Book II: Ch. 2, 5, 7, 8
C.6 Scale Conversions - Book II: Ch. 5, 9
C.7 Harmony from Pitch Scales - Book II: Ch. 5, 9
C.8 Geometric Variations - Book III: Ch. 1, 2
C.9 Melodic Functions - Book IV: Ch. 3, 4, 5, 6, 7
Chapter 1
Background
1.1 Introduction
Almost since the inception of the discipline of computing, people have been using computers to compose and generate music. This is perhaps unsurprising given the importance of algorithmic principles in much compositional thinking throughout musical history. The use of computers for music has mostly been driven by the desires of composers to generate interesting and unique new material.
Recognising the distinction between the composition of musical scores and other forms of music and sound generation, [Anders and Miranda 2011] have proposed the use of the term ‘computer-aided composition’ to refer to one area of what is more broadly known as ‘computer music’, a discipline which also encompasses the arts of sound synthesis and signal processing [Roads 1996]. This thesis is concerned with computer-aided composition: in particular, the computer-realisation of the musical formalism of Joseph Schillinger [Schillinger 1978]. Some authors prefer the term ‘algorithmic composition’ to refer to computer-aided composition [Nierhaus 2009]. In this thesis the two terms will be used interchangeably.
Joseph Schillinger was a Ukrainian-born composer, teacher and music theorist who was active in New York from the 1920s until his death in 1943. Schillinger’s lasting influence as a theorist and teacher exerted itself through famous students such as George Gershwin, Benny Goodman and Glenn Miller; and several distinguished television and radio composers [Quist 2002]. The distillation of his life’s work is contained in three large volumes. Two of these constitute The Schillinger System of Musical Composition [Schillinger 1978]. The third volume, The Mathematical Basis of the Arts [Schillinger 1976], was intended to be broader in scope and generalise much of his prior work in music to visual art and design. The Schillinger System attempted to differentiate itself from other accepted musical treatises by pursuing a more ‘scientific’ approach to composition. It consequently eschewed restrictive systems of rules created from the empirical analysis of Classical styles, as well as the notion of composition by ‘intuition’. Instead it promoted a range of quasi-mathematical methods for the construction of musical material. The system was intended to be of practical use by working composers; George Gershwin famously wrote the opera Porgy and Bess while studying under Schillinger [Duke 1947].
Schillinger’s work has frequently been mentioned in passing by researchers working in the field of computer-aided composition, but rarely addressed in any detail. There are several examples of similar individual algorithms that have been incorporated into computer-aided composition systems, but most of these systems focus on specific computational paradigms which are unrelated to the rest of Schillinger’s work. To the best of the author’s knowledge, only one other system dedicated specifically to the automation of Schillinger’s procedures exists in the form of publicly available software, and no such system has been referred to in the academic literature. This thesis will therefore provide the first formal presentation and evaluation of an ‘automated Schillinger System’. From here onwards, this term will be used to refer to the computer implementation being presented, while the term ‘Schillinger System’ will be used as a short form of The Schillinger System of Musical Composition.
1.2 Introduction to the Schillinger System
The two volumes of the Schillinger System [Schillinger 1978] consist of twelve books presented as individual ‘theories’. Each of these theories is an exposition of Schillinger’s musical philosophy combined with his technical discussions pertaining to general principles and explicit procedures. They include numerous examples of the procedures being carried out by hand, and lengthy annotations by the editors who published the work after Schillinger’s death.
The collection of theories is listed below. The work in its entirety is a formidable 1640 pages. Consequently, the scope of this thesis has only allowed for the first four theories to be considered in detail.
I Theory of Rhythm
II Theory of Pitch-scales
III Variations of Music by Means of Geometrical Projection
IV Theory of Melody
V Special Theory of Harmony
VI Correlation of Harmony and Melody
VII Theory of Counterpoint
VIII Instrumental Forms
IX General Theory of Harmony
X Evolution of Pitch Families
XI Theory of Composition
XII Theory of Orchestration
An existing software program known as StrataSynch by David McClanahan is the only other known automated system to make explicit use of Schillinger’s theories. It implements the generation of four-part diatonic harmony using books V and VIII, and a single chapter from book I. The system described in this thesis extends beyond the scope of that system to a more versatile form of harmony generation utilising books I, II and III; and to the generation of single-voice melodic compositions utilising books I–IV.
1.2.1 Schillinger in Computer-aided Composition Literature
In an extended commentary on computer music from 1956–1986, Ames acknowledged the algorithmic nature of Schillinger’s work without pointing the reader to any known computer implementation, and noted that it had become ‘all but forgotten’ [Ames 1987]. Schillinger’s work was discussed in greater detail by Degazio, who again pointed out how much of it was presumably amenable to computer implementation, and highlighted how particular properties of the Theory of Rhythm would enable self-similar musical structures to be generated, thus relating it to the exploration of fractals in computer music [Degazio 1988]. The ability of the system to generate fractal structures was also identified by Miranda [Miranda 2001]. Miranda further noted the interesting rhythmic possibilities of using algebraic expansions and symmetrical patterns of interference, both of which are also explored in the Theory of Rhythm. More recently Nierhaus gave a cursory mention of Schillinger in the epilogue of a survey of algorithmic composition, implicitly acknowledging that the system could be adapted but also failing to cite any example of an implementation [Nierhaus 2009].
Although the discussion of a specific implementation is lacking, algorithms similar to those in Schillinger’s Theory of Melody were used with apparent success in early work by Myhill (cited in [Ames 1987]) and later by Miranda as part of a musical data structure used by agents in a swarm algorithm [Miranda 2003]. Furthermore, there are numerous examples of algorithms which use permutation in a similar manner to Schillinger’s Theory of Rhythm, and plenty of examples of systems which use inversion and retrograde techniques in a manner similar to Schillinger’s ‘geometrical projections’. There is no suggestion being made that these particular techniques originate from Schillinger’s system alone; indeed their use can be found throughout the history of Western musical composition [Nierhaus 2009].
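The permutation, inversion and retrograde operations mentioned above are simple to state in code. The following Python sketch is illustrative only and is not taken from the automated Schillinger System or from any system cited here; the function names, the motif (given as MIDI note numbers) and the list of durations are assumptions introduced for the example.

```python
from itertools import permutations


def retrograde(seq):
    """Return the sequence reversed in time."""
    return list(reversed(seq))


def invert(pitches, axis=None):
    """Mirror each pitch around an axis pitch (defaults to the first note)."""
    if axis is None:
        axis = pitches[0]
    return [2 * axis - p for p in pitches]


def rotations(seq):
    """Return every circular permutation (rotation) of the sequence."""
    seq = list(seq)
    return [seq[i:] + seq[:i] for i in range(len(seq))]


motif = [60, 62, 64, 67]  # hypothetical motif as MIDI note numbers
durations = [2, 1, 1]     # hypothetical rhythm as durations in beats

print(retrograde(motif))      # [67, 64, 62, 60]
print(invert(motif))          # [60, 58, 56, 53]
print(rotations(durations))   # [[2, 1, 1], [1, 1, 2], [1, 2, 1]]

# General permutation; duplicates collapse because two durations are equal.
print(sorted(set(permutations(durations))))
```

Representing both pitch and rhythm as plain integer lists keeps the three operations independent of each other, so they can be composed freely (for example, the retrograde of an inversion).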
1.2.2 Motivation
If many of the procedures expounded by Schillinger are not unique (this is not to suggest that none of them are), then the value of his treatise is that it collates them together, with each one presented in the context of the others and potentially useful interrelationships drawn. One of the motivations for adapting the Schillinger System is therefore the fact that it incorporates many algorithmic techniques which are demonstrably useful in computer-aided composition on their own, but have not been extensively tested together in the absence of other prevailing computational paradigms. Another motivation is the fact that other oft-cited treatises on music theory in algorithmic composition contain rules which are derived from existing music, such as Piston’s Harmony [Piston 1987]. Conversely, Schillinger’s work purports to have taken a more universal approach that does not draw its rules from the analysis of any particular musical corpus. For this reason it is ostensibly likely to be able to produce compositions which do not fall into the category of ‘style imitation’, which Nierhaus identified as being overwhelmingly dominant in the field [Nierhaus 2009]. Instead, it should allow for a measure of stylistic diversity. As will be discussed in chapters 3 and 4 of this thesis, these notions are contentious and worthy of investigation.
1.2.3 Criticism
The very premise of Schillinger’s work is controversial by virtue of the fact that it effectively condemns previous theories and methodologies as inadequate [Backus 1960]. As a result it has attracted rigorous scrutiny by various authors. A 1946 review by Barbour [Barbour 1946] examined each of the ‘achievements’ of the Schillinger System listed in a preface by the editors, and concluded that none of them were substantiated. Barbour also listed a number of errors and inconsistencies which highlighted the work’s fundamental lack of a sound scientific or mathematical basis.
Schillinger’s work was derided extensively by Backus [Backus 1960]. Dubbing it both ‘pseudo-science’ and ‘pseudo-mathematics’, he surveyed the first four books in some detail, pointing out that many descriptions of procedures are unnecessarily verbose and laced with undefined jargon; that the musical significance of them is based on numerology rather than any appropriately cited research; that much of the symbolic notation serves to obfuscate rather than clarify the expression of sometimes trivial mathematical ideas; and finally that several mathematical definitions are simply incorrect. Backus thus raised many important issues concerning the formal interpretation of Schillinger’s techniques which are tackled in chapter 3 of this thesis.
Neither Backus nor Barbour commented on whether Schillinger’s procedures were of any use to contemporary composers for generating musical material. In light of their resounding criticism, it is significant that other authors have considered many of the theories to be demonstrably useful in practice, or cited testimony from successful composers suggesting as much [Degazio 1988]. The composer Jeremy Arden published a PhD thesis documenting the study and utilisation of the Schillinger System from a compositional perspective [Arden 1996], concluding that the Theory of Rhythm and Theory of Pitch Scales offered many useful techniques. Although he swiftly dismissed the Theory of Melody as ‘too cumbersome’ to be of practical use, similar principles to those contained in that theory have been found useful in other contexts as mentioned above in section 1.2.1. There is therefore no absolute consensus which would wholly discourage computer implementations of the Schillinger System.
1.3 Summary of this Thesis
In this thesis, the automated Schillinger System designed by the author will be presented and evaluated. To begin with, chapter 2 will survey both the dominant paradigms and the specific computational approaches in the field of computer-aided composition. This theoretical basis will serve to position the automated Schillinger System within the academic literature.
The details of the software implementation of the four initial books of the Schillinger System listed in section 1.2 will be presented in chapter 3. Alongside the requisite technical discussion, chapter 3 will provide a comprehensive outline of the bulk of the procedures contained in these books. Perhaps more importantly, it will also identify the inherent difficulties in translating a formalism designed for composers into a model able to be represented computationally, including the resolution of Schillinger’s notational and practical inconsistencies and the necessity for a raft of new procedures to sensibly link the theories together.
The evaluation of musical output is a perennial problem in this inter-disciplinary field, and few authors tend to venture beyond subjective conclusions drawing on their own musical backgrounds. However, one method of more rigorous evaluation consists of the enlisting of a ‘team of experts’ to supply qualitative data for analysis. Such an approach has been used to study the output of the system presented here. Additionally, the burgeoning field of automatic genre classification has been engaged as a means of quantitatively assessing the statistical characteristics of the output. Together these forms of analysis aim to establish both the intrinsic musical merit and stylistic diversity of the automated Schillinger System. These experiments and their results will be presented in chapter 4.
The recently released four-part harmony system by McClanahan and the active pursuit of new forms of representation for Schillinger’s ideas, embodied by the online Schillinger CHI Project, suggest a resurgence of interest in automating parts of the Schillinger System. The software presented in this thesis aims to contribute to this momentum, and is amenable to development beyond its current state as a ‘push-button’ music generator into a modular interface that could be used by composers and multimedia content creators. Many potential avenues for future research are explored in chapter 5.
Chapter 2
Overview of Computer-aided Composition
This chapter will give a broad overview of the field of computer-aided composition, in order to place the automated Schillinger System in context, and to position this thesis as an addition to the computer music literature.
As remarked upon by Supper [Supper 2001], the distinctions between compositional ideas, realisation in the musical score, and auditory perception are clearly bounded in a computing context. As this thesis is focusing on computer-aided composition rather than attempting to encompass the entire field of computer music, this overview does not include algorithms which take music generation beyond the level of symbolic representation into digital audio. Instead, it is presumed that the symbolic data generated by composition algorithms can be further mapped to musical notation, MIDI data¹ or audio data depending on the application.
[Supper 2001] made a further taxonomic observation which is relevant to this chapter. He distinguished between:
1. the modelling of musically-oriented algorithmic procedures to produce encodings of established music theories;
2. procedures individual to a ‘composer-programmer’ where the code produces a unique class of pieces based upon the composer’s individual expertise; and
3. experiments with algorithms from extra-musical fields such as dynamic systems or machine learning.
In fact, there are many instances where individual implementations bear relevance to two or three of Supper’s categories, and his is only one of a number of possible taxonomies for describing computer-aided composition; section 2.1 lists a variety of other significant distinctions within the algorithmic composition literature. However, it is safe to observe that much recent academic research in computer-aided composition is based primarily on the application of pre-existing extra-musical algorithms to music, thus falling into Supper’s third category. Section 2.2 describes this literature.
¹ MIDI stands for Musical Instrument Digital Interface. It is the dominant protocol for handling symbolic musical information in computer systems and hardware synthesizers.
Figure 2.1 provides a visualisation of the array of computational approaches used in the field, as discussed in section 2.2. These are connected by dashed lines which represent their algorithmic or mathematical similarity, and roughly partitioned in terms of their use within the various paradigms discussed in section 2.1.
[Figure 2.1 (diagram not reproduced). Node labels: Fuzzy Logic; Case-based Reasoning; Generative Grammars; FSA; ATNs; Markov Chains; L-systems; Genetic Algorithms; IGAs; Genetic Programming; Constraint Programming; Cellular Automata; Chaos; Fractals; Non-musical Data Streams; Artificial Neural Nets; Swarm Algorithms; Musical "Expert Systems"; Automated Schillinger System. Region labels: Musical domain knowledge; Data-driven; Sometimes data-driven; Not data-driven.]
As this chapter will be limited to the discussion of systems designed with the ultimate goal of composing music, other research areas such as computer auralisation, computational creativity and automated musicological analysis, despite being closely related to the success of particular algorithmic composition approaches, will not be explored per se. Discussions of computer style recognition, expressive musical performance and output evaluation are relevant to the experiments presented in chapter 4 and will be included there in the appropriate places.
2.1 Dominant Paradigms in Computer-aided Composition
Before commencing a description of the common algorithm families used in this field, it will be useful to outline several overarching (and often competing) paradigms. These are partly representative of differing philosophical approaches to automatic music generation, and partly to do with historical shifts in emphasis on computational approaches, which are in turn the result of past developments in artificial intelligence and the modelling of natural phenomena.
2.1.1 Style Imitation versus Genuine Composition
The reproduction of specific musical styles (‘style imitation’) constitutes the majority of algorithmic composition literature. Its dominance was testified to by Nierhaus in the epilogue of his comprehensive survey of algorithmic composition [Nierhaus 2009]. The styles in question are either those of particular individual composers, or those exemplified by the music of a particular culture or historical period. Style imitation is not limited to any particular group of computer algorithms, but is frequently the paradigm used by most of the approaches in figure 2.1 that encode musical domain knowledge.
The reason for the dominance of style imitation is somewhat evident when one considers the large quantity of work dedicated specifically to four-voice chorale harmonisation [Pachet and Roy 2001]. This form of composition is perhaps the most thoroughly studied in the musicological literature due to the enormous quantity of ‘exemplar’ works courtesy of European Baroque and Classical composers. Consequently, a well-established set of rules of varying levels of strictness has been empirically derived from this corpus over the course of several centuries, and this theoretical framework lends itself to being expressed as an optimisation problem in the context of ‘correct’ four-part harmony writing. Since optimisation problems sit comfortably within the realm of computer science, this style of composition is the most readily approachable by computer scientists. It has been pointed out by Allan that chorale harmonisation is “the closest thing we have to a precisely defined problem” [Allan 2002]. Any music generated within formal, recognisable stylistic boundaries is able to be evaluated either objectively or with a degree of authority by human listeners.
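To illustrate how a voice-leading rule can become a term in an optimisation problem, the following Python sketch counts parallel fifths and octaves between successive chords and treats the count as a cost to be minimised. It is a minimal sketch under stated assumptions, not the method of any system cited above: the function names, the use of a single rule, and the example voicings (MIDI note numbers ordered bass to soprano) are all hypothetical.

```python
from itertools import combinations


def parallel_perfects(chord_a, chord_b):
    """Count voice pairs that keep a perfect fifth or octave while both move.

    Simplification: intervals are compared modulo 12, so consecutive perfect
    intervals reached by contrary motion are penalised as well.
    """
    count = 0
    for i, j in combinations(range(len(chord_a)), 2):
        before = abs(chord_a[i] - chord_a[j]) % 12
        after = abs(chord_b[i] - chord_b[j]) % 12
        both_moved = chord_a[i] != chord_b[i] and chord_a[j] != chord_b[j]
        if both_moved and before == after and before in (0, 7):
            count += 1
    return count


def cost(progression):
    """Total rule violations over every pair of adjacent chords."""
    return sum(parallel_perfects(a, b)
               for a, b in zip(progression, progression[1:]))


start = [48, 55, 64, 72]          # C major: C3 G3 E4 C5
candidates = [
    [50, 57, 65, 74],             # D minor, every voice moving up in parallel
    [50, 53, 62, 69],             # D minor, upper voices moving against the bass
]

# Choose the candidate voicing of the next chord with the lowest cost.
best = min(candidates, key=lambda chord: cost([start, chord]))
print(best)
```

A realistic harmoniser would combine many such weighted penalties and search over far more candidates; the point here is only that each rule reduces to a function that scores a pair of chords.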
Conversely, the concept of ‘genuine composition’ [Nierhaus 2009] is problematic in computer music for the reason that genuinely new and different results are virtually impossible to validate using quantitative methods, and very much at the mercy of individual musical taste when it comes to human scrutiny. Nevertheless, while academic work in this area is traditionally less common it is still pursued in earnest, especially by researchers utilising chaos theory or algorithms with emergent behaviours.
2.1.2 Push-button versus Interactive
An algorithmic composition system which delivers a self-contained musical fragment, complete composition or an endless stream of musical material with real-time playback requiring no human intervention after the setting of initial parameters may be referred to as a ‘push-button’ or ‘black-box’ system. Examples of well-documented push-button systems range from Hiller and Isaacson’s early experiments forming the Illiac Suite [Hiller and Isaacson 1959] to Cope’s Experiments in Musical Intelligence [Cope 2005]. Most four-part harmonisation systems also fall into this category.
Systems which generate music using continual human feedback are perhaps more frequently cited as being successful. This paradigm has been referred to in terms of a human-computer ‘feedback loop’ [Harley 1995] and features in a variety of composition algorithms which are designed to either incorporate real-time human behaviour into their generative process or perform a gradual optimisation tailored to a user’s musical preference. Examples include interactive genetic algorithms using ‘human fitness functions’ [Biles and Eign 1995]; systems which allow a user to generate raw material and then modify a set of parameters to develop it further [Zicarelli 1987]; systems which allow the user to influence the generation of material from a more abstracted perspective [Beyls 1990]; systems which learn iteratively by ‘listening’ to a user’s live performance [Thom 2000]; and systems which map a user’s physical movement [Gartland-Jones 2002] or brain-wave activity [Miranda 2001] to a subset of the algorithm’s parameter space in real-time. Many authors have argued that these areas of research hold greater promise than push-button systems, based on the notion that the acts of composition (and improvisation) are fundamentally human activities dependent on human interaction.
There also exists a body of software which functions as a kind of ‘blank slate’ for composers. These programs are usually modular in the sense that individual pre-existing algorithms can be interfaced arbitrarily, and there is often the scope for ‘composer-programmers’ to extend their functionality. Examples range from the early MUSICOMP by Robert Baker [Hiller and Baker 1964] to the more advanced Max by David Zicarelli [Zicarelli 2002]. Such environments are interactive by their very definition; however, once the template for a composition is completed by the composer, in many cases they arguably function as push-button systems. More recently, the advent of ‘live coding’ has been made possible by environments like Impromptu [Sorensen and Gardner 2010]. These environments are specifically designed to facilitate the coding of musical procedures during performance or improvisation.
2.1.3 Data-driven versus Knowledge-engineered
In computer-aided composition a ‘data-driven’ solution relies on a database of existing musical works on which to perform pattern extraction, statistical machine learning or case-based reasoning to derive musical knowledge. By contrast, a ‘knowledge-engineered’ system requires the coding of musical knowledge in the form of procedures or the manual population of a knowledge base. In figure 2.1, these alternative paradigms have been used to categorise various computational approaches on the left of the diagram.
An expert system combines a knowledge base of facts or predicates, ‘if-then-else’ rules and heuristics, with some kind of inference engine to perform logical problem solving in a particular problem domain [Coats 1988; Connell and Powell 1990]. Such a system requires the acquisition of knowledge either automatically or through a human ‘domain expert’ [Mingers 1986]. The front end may be interactive (the user inputs queries or data) or non-interactive (fully automated). There is generally also the prerequisite that an expert system is capable of both objectively judging its output using the same knowledge base, and tracing the decision path that led to the output for the user to analyse [Coats 1988].
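The components just described (a set of facts, if-then rules, an inference engine and a traceable decision path) can be sketched in a few lines of Python. This is an assumed toy example of forward chaining, not a description of any expert system cited here; the rule names, the fact strings and the function forward_chain are hypothetical.

```python
def forward_chain(facts, rules):
    """Apply if-then rules until no new facts appear.

    Returns the final set of facts and a trace of (rule name, conclusion)
    pairs recording the order in which the rules fired.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in rules:
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                trace.append((name, conclusion))
                changed = True
    return known, trace


# Hypothetical rule base: (name, conditions, conclusion).
RULES = [
    ("R1", ("chord_is_V", "next_chord_is_I"), "cadence_is_perfect"),
    ("R2", ("cadence_is_perfect", "at_phrase_end"), "phrase_is_closed"),
]

facts = {"chord_is_V", "next_chord_is_I", "at_phrase_end"}
known, trace = forward_chain(facts, RULES)

# The trace is the decision path that the user can inspect.
for rule_name, conclusion in trace:
    print(rule_name, "=>", conclusion)
```

Keeping the trace alongside the derived facts is what allows the decision path to be presented to the user, which is the traceability requirement noted above.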
The inherent flaws of expert systems are well-known. One problem is that as a system’s parameter space becomes more ambitious, the knowledge base of rules tends to expand exponentially. In algorithmic composition this has led to optimisation problems in four-part harmonisation which become computationally intractable above a certain polyphonic density or beyond a certain length, as found by Ebcioğlu [Ebcioğlu 1988]. Beyls also cited the ‘complexity barrier’ inherent in musical expert systems, and further noted the lack of graceful degradation in situations with incomplete or absent knowledge [Beyls 1991]. Phon-Amnuaisuk mentioned the common problem of arbitrating between contradictory voice-leading rules [Phon-Amnuaisuk 2004]. One of Mingers’ main criticisms of expert systems in general was that a rule base must always be incomplete when built from only a sample of all possible data [Mingers 1986].
In knowledge-engineered musical expert systems, the most significant obstacle is the time-consuming encoding of a sufficient quantity of expert knowledge to allow the system to compose anything non-trivial. For style imitation, a further problem is that many rules inherent to a particular style may not be obvious even to experts, or may not be possible to adequately express in the required format. Sabater et al. articulated an underlying issue of rule-based style imitation: “the rules don’t make the music, it is the music which makes the rules” [Sabater et al. 1998]. For these reasons, the data-driven approach has become favoured by many researchers. Some of these authors have advocated for alternative ‘connectionist’ approaches to uncover the implicit knowledge of a musical corpus rather than attempt to find explicit rules; their solutions typically perform supervised learning of the corpus using artificial neural networks.
2.1.4 Musical Domain Knowledge versus Emergent Behaviour
In figure 2.1 the two paradigms of musical domain knowledge and emergent behaviour have been split vertically. The application of musical domain knowledge in computer-aided composition generally leads to a set of either implicit or explicit musical rules being enforced, something practically unavoidable except in cases where completely random behaviour is sought for aesthetic reasons. The approach is often, but not always, aligned with style imitation. Such examples found in the literature are usually broadly referred to as ‘musical expert systems’, but not all such approaches necessarily fall into this category if the accepted meaning of the term ‘expert system’ in computer science literature is enforced [Mingers 1986].
Miranda has suggested that rule-based composition systems lack ‘expression’ due to their inability to break rules, citing a famous quote by Frederico Richter: “In music, rules are made to be broken. Good composers are those who manage to break them well” [Miranda 2001]. This perceived fundamental flaw with the knowledge-based approach has provided inspiration for many researchers to look instead to paradigms which focus on dynamic or emergent behaviour, such as chaos, cellular automata and agent interaction in virtual swarms. Evolutionary algorithms have also been explored extensively, because although they are usually designed to operate in a musical knowledge domain, they do so in a fundamentally stochastic manner rather than by applying generative rules [Biles 2007].
The dichotomy between knowledge-based music and ‘emergent’ music was identified by Blackwell and Bentley, who separated the algorithmic composition field into ‘A-type’ and ‘I-type’ systems [Blackwell and Bentley 2002]. These labels respectively refer to systems that rely on encoded musical knowledge, and those that map the data streams from swarms, dynamic systems, chaotic attractors, natural phenomena or human activity to musical output. Beyls posited an equivalent delineation of ‘symbolic’ versus ‘sub-symbolic’ algorithms [Beyls 1991]. The emergent or sub-symbolic paradigm seeks to “interpret rather than generate” [Blackwell and Bentley 2002], and is therefore usually associated with Nierhaus’s notion of genuine composition [Nierhaus 2009]. However, a caveat which authors choosing this path have encountered was pointed out by Miranda: the biggest difficulty when using non-musical processes for algorithmic composition is deciding how to translate the data stream into a representation which is musically meaningful [Miranda 2001].
2.2 Formal Computational Approaches
This section will explain the specific algorithmic approaches that have been applied to computer-aided composition. It will be seen that many of these approaches have strong mathematical similarities (as shown in figure 2.1), and may produce statistically equivalent results depending on how they are implemented. As such, the organisation of this section does not strictly separate the algorithms based purely on their mathematical or purported musical properties. It does however indicate the range of distinct approaches to be found in the algorithmic composition literature.
The topics covered are grouped roughly into those that compose music using a statistical or probabilistic model of a style or corpus (Markov models and artificial neural networks); those which are most frequently associated with the ‘expert system’ paradigm in terms of being driven by systems of generative rules and constraints (formal grammars, finite state automata, case-based reasoning and fuzzy logic); and those which map the data from an extra-musical process onto a musical parameter space (chaos, fractals, cellular automata and swarm algorithms). For the most part the first two categories may be thought of as encoding ‘implicit’ and ‘explicit’ musical knowledge respectively. Evolutionary algorithms do not fall neatly into this particular taxonomy because although they encode musical knowledge, they navigate the space of musical possibilities stochastically.
2.2.1 Markov Models
Markov models were the earliest established extra-musical approach to computer-aided composition to be widely adopted. In a survey of the first three decades of algorithmic composition, Ames cited several examples of their use from the 1950s onwards by composers such as Lejaren Hiller and Iannis Xenakis [Ames 1987]. Cohen described a number of early applications of the probabilistic replication of musical styles, treating what are essentially Markov chains as a musical application of Information Theory. Cohen’s notion of composition being regarded as simply “selecting acceptable sequences from a random source” is a potential motivation for using the technique for style imitation, suggesting that “the degree of selectivity of the works of composers is . . . a parameter of their style” [Cohen 1962]. Their relative ease of implementation has perhaps also contributed to their popularity in computer music [Ames 1989].
A simple Markov model consists of a collection of states and a collection of transition probabilities for moving between states in discrete time steps [Ames 1989]. The probabilities of states leading to one another may be represented by a ‘transition matrix’. The state space is discrete, and in musical applications, finite. A Markov chain is obtained by selecting an initial state and then generating a sequence of states using the transition matrix.
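As an illustration of this definition, the following minimal Python sketch generates a chain from a hand-crafted transition matrix. The three pitch states, the probability values and the function name `generate_chain` are hypothetical and chosen only for demonstration; they are not taken from any of the cited systems.

```python
import random

# Hypothetical hand-crafted transition matrix over three pitch states.
# Each row holds the probabilities of moving from that state to each next state.
TRANSITIONS = {
    "C": {"C": 0.1, "E": 0.6, "G": 0.3},
    "E": {"C": 0.4, "E": 0.1, "G": 0.5},
    "G": {"C": 0.7, "E": 0.2, "G": 0.1},
}

def generate_chain(transitions, initial_state, length, rng):
    """Generate a Markov chain of `length` states starting from `initial_state`."""
    chain = [initial_state]
    while len(chain) < length:
        row = transitions[chain[-1]]          # probabilities for the current state
        states = list(row.keys())
        weights = list(row.values())
        chain.append(rng.choices(states, weights=weights, k=1)[0])
    return chain

# A seeded generator makes the random walk repeatable.
melody = generate_chain(TRANSITIONS, "C", 16, random.Random(0))
print(melody)
```

Each step consults only the row belonging to the current state, which is what makes this a first-order model.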
How this model is utilised in algorithmic composition differs between implementations. States can be used, for example, to represent individual pitches, chords or durations; or they may be used to represent individual Markov chains of length n, which is equivalent to enforcing a dependency on events n time steps into the past. A Markov model in which all transitions depend on the previous n transitions is an nth-order Markov model; these are commonly used to instil a measure of context-sensitivity and thus encode musical objects at the phrase or cadence level. States may also represent entire vectors of potentially interdependent musical parameters, something utilised by Xenakis in the form of ‘screens’ [Xenakis 1992].
The transition matrix may be either constructed by hand, or derived empirically by performing an automated analysis on a database of existing musical works. The latter amounts to encoding each work as a sequence of states, and determining the transition probabilities by the relative tallies of each transition (analogous to the experiments carried out by A. A. Markov himself using Russian texts [Ames 1989]). These options correspond with Cohen’s labels of ‘synthetic’ and ‘analytic-synthetic’ [Cohen 1962]. Both approaches are present in the literature, and the choice has depended principally on whether the user is attempting to generate a particular aesthetic for an individual composition [Ames 1989] or performing style imitation, where the purpose is for the randomly generated output to inherit the generalised musical rules implicit in the corpus [Cohen 1962].
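The empirical derivation described above can be sketched as follows. The two-melody corpus and the function name `estimate_transitions` are hypothetical. The `order` parameter is an assumption added here to show how the same tallying procedure extends to an nth-order model by using the previous n states as the context.

```python
from collections import Counter, defaultdict

def estimate_transitions(sequences, order=1):
    """Tally transitions from each context of `order` states to the next state,
    then normalise the tallies into relative frequencies."""
    tallies = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            context = tuple(seq[i:i + order])
            tallies[context][seq[i + order]] += 1
    return {
        context: {state: count / sum(counts.values()) for state, count in counts.items()}
        for context, counts in tallies.items()
    }

# Hypothetical corpus: each work is already encoded as a sequence of states.
corpus = [["C", "D", "E", "C"], ["C", "D", "C", "D", "E"]]
first_order = estimate_transitions(corpus, order=1)
second_order = estimate_transitions(corpus, order=2)
print(first_order[("D",)])        # distribution over the states that follow D
print(second_order[("C", "D")])   # distribution over the states that follow C, D
```

Raising the order multiplies the number of possible contexts, so a small corpus leaves most of them with few or no observations.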
Examples of the use of Markov chains for algorithmic composition are numerous. Ames documented his use of the technique to develop works for monophonic solo instruments [Ames 1989]. In his program, the transition matrix is hand-crafted, and the entries define the probabilities of melodic intervals, note durations, articulations and registers. Hiller and Isaacson’s Experiment 4 from the Illiac Suite operated in much the same manner [Hiller and Isaacson 1959]. Cambouropoulos applied Markov chains to the construction of 16th-century motet melodies in the style of the composer Palestrina [Cambouropoulos 1994]. His approach also used hand-crafted transition matrices for melodic intervals and note durations; these were developed through manual statistical analysis of Palestrina’s melodies. Other authors have used a data-driven approach: Biyikoglu ‘trained’ a Markov model using the statistical analysis of a corpus of Bach’s chorales to generate four-part harmonisations [Biyikoglu 2003], while Allan solved the same chorale harmonisation problem using Hidden Markov Models [Allan 2002]. Allan’s solution uses one Hidden Markov Model to generate chord ‘skeletons’ (the notes of the melody are treated as observations ‘emitted’ by hidden harmonic states), and two more to fill in the chords and provide ornamentation. It then uses constraint satisfaction procedures to prevent invalid chorales, and cross-entropy measured against unseen examples from the chorale set as a quantitative validation method.
The reported success of Markov models is varied. Allan concluded that coherent harmonisation can indeed be achieved via statistical examination of a corpus [Allan 2002], while in Ames’ assessment this often leads to “a garbled sense of the original style” [Ames 1989]. Biyikoglu suggested that Markov chains are not appropriate for modelling hierarchical relationships, but are capable of providing smooth harmonic changes [Biyikoglu 2003]. Cambouropoulos highlighted the potential for higher order chains to simulate a measure of musical context [Cambouropoulos 1994]; however, Baffioni et al. observed that chains of too high an order simply end up reproducing entire sections of the original corpus, and instead proposed a hierarchical organisation of separate Markov chains accounting for form, phrase and chord levels [Baffioni et al. 1981]. As Ames suggested, the fundamental problem with many of these models is that they provide an aural realisation of the probability distributions within a data set but cannot discern the methods behind its construction, and therefore serve as little more than “partial descriptions of non-random behaviour” [Ames 1989].
2.2.2 Artificial Neural Networks
Artificial neural networks (ANNs) are often used to investigate the notion of musical style, and have been successfully used to perform style and genre classification (see section 4.4.1). ANNs are well-suited to these tasks because they are particularly good at finding generalised statistical representations of their input data [Russell and Norvig 2003]. In algorithmic composition, they tend to be aimed squarely at style imitation for this reason. The original motivations for pursuing this ‘connectionist’ approach as an alternative to expert systems were summarised by Todd, who championed ANNs as a way to gracefully handle complex hidden associations within a data set, as well as numerous ‘exceptions’ to the established musical rules which would normally inflate the knowledge-base of an expert system [Todd 1989]. Hörnel and Menzel commented on neural networks’ abilities to circumvent the problem of rule explosion inherent in building sophisticated expert systems for style imitation [Hörnel and Menzel 1998].
ANNs are loosely modelled on the architecture of the brain [Russell and Norvig 2003]. Networks are built of simple computational units known as ‘perceptrons’, which are analogous to the function of individual biological neurons. A perceptron calculates a weighted aggregate of its inputs, subtracts a ‘threshold’ value and ‘fires’ by passing the result through a differentiable activation function such as a sigmoid or hyperbolic tangent. The most common practical implementation of a neural network is known as a ‘multi-layer perceptron’ (MLP). This normally consists of a layer of ‘hidden’ neurons connected to both a set of inputs representing the input dimensions of the training set, and a set of output neurons which represent the output dimensions. The basic function of a neural network is to learn associations between input vectors and target output vectors by adjusting randomly initialised weights along network connections. A popular method for doing this is ‘gradient descent back propagation’, in which the input vectors are fed forward through the network and the mean-squared error between the output and target vectors is gradually reduced (subject to a scalar ‘learning rate’) over some number of epochs using the derivative of the error function. In this way the weights come to form a statistical generalisation of the training set through repeated exposure to input vectors. In musical applications, the outputs are normally fed back into the inputs to form a ‘recurrent neural network’ (RNN), and a technique such as back propagation through time (BPTT) can then be used to model temporal relationships in the corpus [Mozer 1994]. Neurons which feed back into themselves may also be used to implement short term neural ‘memory’. To compose new music using an RNN, a trained network is simply seeded with a new input vector and the outputs are recorded for some number of iterations.
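The behaviour of a single unit and the gradient descent update can be sketched in a few lines of Python. This is a deliberately reduced illustration: it trains one sigmoid unit rather than a multi-layer network, and it uses the logical OR function as a stand-in training set. The zero-valued starting weights (normally random), the learning rate and the epoch count are all assumptions made for repeatability.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, threshold, inputs):
    """Weighted sum of the inputs, minus the threshold, passed through a sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(total - threshold)

def mean_squared_error(weights, threshold, samples):
    return sum((forward(weights, threshold, x) - t) ** 2 for x, t in samples) / len(samples)

def train(samples, learning_rate=0.5, epochs=2000):
    # Fixed starting point for repeatability (assumption; normally randomised).
    weights, threshold = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            out = forward(weights, threshold, inputs)
            # Derivative of the squared error with respect to the pre-activation value.
            delta = (out - target) * out * (1.0 - out)
            weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
            # The threshold is subtracted in forward(), so its gradient has the opposite sign.
            threshold += learning_rate * delta
    return weights, threshold

# Stand-in training set: the logical OR of two binary inputs.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
initial_error = mean_squared_error([0.0, 0.0], 0.0, OR_SAMPLES)
weights, threshold = train(OR_SAMPLES)
final_error = mean_squared_error(weights, threshold, OR_SAMPLES)
print(initial_error, final_error)
```

A multi-layer perceptron applies the same update to every layer by propagating `delta` backwards through the network using the chain rule.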
Todd’s original system restricted the domain to monophonic melodies represented using the dimensions of pitch and duration [Todd 1989]. He combined two different network types: a three-layer RNN with individual neural feedback loops to model temporal melodic behaviour at the note level, and a standard MLP which, when trained, acted as a static mapping function from fixed input sequences to output sequences [Todd 1989]. Mozer implemented an RNN that learned and composed single-voice melodies with accompaniment, called CONCERT [Mozer 1994]. It improved on Todd’s work in various ways, such as using a probabilistic interpretation of the network outputs, and more sophisticated data structures for musical representation. Mozer’s network inputs represented 49 pitches over four octaves. Hörnel and Menzel described a neural network system called HARMONET with the ability to harmonise chorale melodies, and a counterpart system MELONET for composing melodies [Hörnel and Menzel 1998]. Both of their approaches used a combination of ANNs for the ‘creative’ work and constraint-based evaluation for the ‘book-keeping’. ANNs have also been used as fitness evaluators in evolutionary algorithms as one way of alleviating both the inadequacy of objective musical fitness functions and the ‘fitness bottleneck’ caused by human intervention (see section 2.2.5). For instance, Spector and Alpern used a three-layer MLP trained on the repertoire of jazz saxophonist Charlie Parker which was used to classify members of a population as either ‘good’ or ‘bad’ [Spector and Alpern 1995].
The aesthetic products from ANNs are also reported as being mixed. Mozer’s results when attempting to compose in the style of Bach were reported to be ‘reasonable’, but his experiments on European folk-tunes were less successful [Mozer 1994]. Hörnel and Menzel’s compositions using HARMONET and MELONET, on the other hand, were evaluated as ‘very competent’, and showed that ANNs could be used to imitate characteristics strongly associated with a composer’s style [Hörnel and Menzel 1998]. Todd avoided a judgement of merit regarding his ANN-composed melodies, stating only that they were “more or less unpredictable and therefore musically interesting” [Todd 1989]. A common criticism of most ANN approaches is that they essentially learn the statistical equivalent of a set of complex Markov transition matrices, and are therefore only slightly more capable than Markov chains of modelling higher order musical structure [Mozer 1994]. Phon-Amnuaisuk points out that they learn only ‘unstructured knowledge’ [Phon-Amnuaisuk 2004]. Eck and Schmidhuber have offered a potential remedy to this problem by using ‘long short term memory’ (LSTM) to allow for some association of temporally distant events manifesting as medium-scale musical structure. Their method resulted in the ‘successful’ production of improvisations over fixed Bebop chord sequences [Eck and Schmidhuber 2002].
2.2.3 Generative Grammars and Finite State Automata
Algorithmic composition systems incorporating generative grammars are what are most commonly referred to as musical ‘expert systems’, because they presuppose an encoding of explicit domain-specific rules, irrespective of whether those rules are encoded by hand or extracted automatically from a corpus. The attraction of this method is that it is capable of encoding the established musical knowledge of musicological texts, and it also provides a way to generate coherent musical structure at multiple hierarchical levels, while at the same time allowing for a large space of complex sequences [Steedman 1984]. Many of the generative grammar systems are informed by the work of Chomsky regarding linguistic syntax [Chomsky 1957], and later work by Lerdahl and Jackendoff [Lerdahl and Jackendoff 1983] which builds upon the musicological analysis theories of Schenker [Schenker 1954]. The generative grammar approach bears strong similarities to the implementation of finite state automata (FSA), and both grammars and FSA have been shown to function identically to Markov chains in certain circumstances [Roads and Wieneke 1979; Pachet and Roy 2001]. Material obtained by applying the production rules of a generative grammar is most often filtered using a knowledge-base of constraints which define the legal musical properties of the system [Anders and Miranda 2011].
A generative grammar can be described as consisting of an alphabet of non-terminal tokens N, an alphabet of terminal tokens T, an initial root token Σ and a set of production or rewrite rules P of the form A → B, where A and B are token strings [Roads and Wieneke 1979]. A grammar G is represented formally by the tuple G = (N, T, Σ, P), and music is generated by establishing a set of musical tokens such as pitches, rhythms or chord types, and designing a set of production rules that implement legal musical progressions. Chomsky’s taxonomy of type 0, 1, 2 and 3 grammars (‘free’, ‘context-sensitive’, ‘context-free’ and ‘finite state’) [Chomsky 1957] is relevant to music production. For instance, Roads and Wieneke observed that grammar types 0 and 3 are inadequate for achieving structural coherence [Roads and Wieneke 1979].
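The tuple G = (N, T, Σ, P) maps directly onto a small data structure. The sketch below is a hypothetical context-free toy grammar for chord progressions, in which `ROOT` plays the role of Σ. The token names and production rules are invented for illustration and do not reproduce any cited system. Productions are chosen uniformly at random, which is an assumption; a stochastic grammar would attach a probability to each rule.

```python
import random

# Hypothetical toy grammar G = (N, T, ROOT, P) for a chord progression.
NON_TERMINALS = {"PIECE", "PHRASE", "CADENCE"}
TERMINALS = {"I", "IV", "V", "vi"}
ROOT = "PIECE"
PRODUCTIONS = {
    "PIECE": [["PHRASE", "PHRASE", "CADENCE"]],
    "PHRASE": [["I", "IV"], ["I", "vi", "IV"], ["vi", "IV"]],
    "CADENCE": [["V", "I"], ["IV", "I"]],
}

def derive(token, rng):
    """Recursively rewrite `token` until only terminal tokens remain."""
    if token in TERMINALS:
        return [token]
    expansion = rng.choice(PRODUCTIONS[token])   # pick one rewrite rule for this token
    result = []
    for child in expansion:
        result.extend(derive(child, rng))
    return result

progression = derive(ROOT, random.Random(1))
print(progression)
```

Because every left-hand side is a single non-terminal, this is a type 2 grammar. The hierarchy of piece, phrase and cadence is what supplies structure above the level of adjacent chords.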
Rader utilised stochastic grammars in an early implementation of a Classical style imitator [Rader 1974]. The system he devised was a ‘round’ generator, wherein each incarnation of the melody is constrained to consonantly harmonise with itself at regular temporal displacements. It used an extensive set of production rules with assigned probabilities, and a set of constraints. Domain knowledge was derived from traditional harmonic theory, in this case Walter Piston’s treatise Harmony [Piston 1987]. Holtzman described a system in which the production rules of multiple grammar types were implemented along with ‘meta-production’ rules [Holtzman 1981], thus constituting the knowledge and meta-knowledge of an expert system [Mingers 1986]. These were accompanied by common transformational operations such as inversion, retrograde and transposition, and used to reproduce a work by the composer Arnold Schoenberg [Holtzman 1981]. Steedman modelled jazz 12-bar blues chord sequences with context-free grammars [Steedman 1984], using an approach informed directly by the musicological work of Lerdahl and Jackendoff [Lerdahl and Jackendoff 1983]. Ebcioğlu produced what was, according to Pachet and Roy [Pachet and Roy 2001], the first real solution to the four-part chorale harmonisation problem [Ebcioğlu 1988]. His system implemented an exhaustive optimisation process using multiple automata and sets of constraints based on traditional harmonic rules for generating chord skeletons, pitches and rhythms from an initial melody. Storino et al. used a manually encoded generative grammar to compose pieces in the style of the Italian composer Legrenzi [Storino et al. 2007].
Both Zimmermann [Zimmermann 2001] and Hedelin [Hedelin 2008] have used grammars to generate large compositional structures which are then filled with chord skeletons using Riemann chord notation [Mickselsen 1977], before finally being fleshed out with note-level information, the aim being to bring form and construction closer to one another instead of relying on a single set of production rules to generate ‘incidental’ musical structure [Hedelin 2008].
an augmented transition network (ATN), which is combined with a ‘reflexive pattern matcher’ to form a data-driven expert system [Cope 1992]. The analysis of a manually encoded and annotated corpus of works is performed using a method purportedly informed by the work of Schenker [da Silva 2003]. This method is referred to by Cope as SPEAC, which is an acronym for the possible chord classifications ‘statement’, ‘preparation’, ‘extension’, ‘antecedent’ and ‘consequent’ depending on a chord’s makeup and context. A ‘signature dictionary’ of statistically significant recurring musical fragments of between 1 and 8 intervals is built using the pattern matcher [da Silva 2003]. To produce new works, the ATN implements a set of production rules designed to stochastically generate a new SPEAC sequence, and constraint systems are applied to determine the final pitch, duration and note velocity information. EMI has been used to compose thousands of works which closely mimic the styles of famous composers including Bach, Chopin, Beethoven, Bartok, and Cope himself. More recently, an ‘oeuvre’ of around one-thousand selected works in a wide range of styles produced by the system has been established as a style database itself, which Cope has used to interactively feed back into an updated system based on the same ‘recombination’ principles known as Emily Howell [Cope 2005]. Cope associates the notion of a prolonged style imitation feedback loop with his proposed definition of creativity, arguing that such a process is difficult to formally distinguish from the human creative process [Cope 2005].
In general, systems incorporating some form of generative grammar imbued with explicit musical knowledge have been found to give more convincing musical results for style imitation than the statistically oriented approaches of Markov chains and ANNs. Pachet and Roy concluded that the chorale harmonisation problem had essentially been ‘solved’ by expert systems [Pachet and Roy 2001]. The compositions produced by Cope’s programs have achieved notoriety for their quality [da Silva 2003]. Storino et al. found that grammar-based systems were frequently capable of successfully fooling audiences of musicians into believing that computer-composed works were in fact human-composed [Storino et al. 2007]. However, many of these approaches still suffer from problems common to expert systems generally, including the encoding of large enough knowledge bases [Coats 1988] and the potential for intractability due to combinatorial explosion [Pachet and Roy 2001]. Steedman noted that simple grammars will always produce correct musical syntax, but have a natural propensity to generate music with no semantics: the encoding of musical meaning is an extremely difficult problem [Steedman 1984]. Miranda has claimed that the biggest weakness of these systems, in the context of composing genuinely new music, is their innate inability to break rules [Miranda 2001].
2.2.4 Case-based Reasoning and Fuzzy Logic
Case-based reasoning (CBR) and fuzzy logic also fall within the expert system paradigm because they implement architectures that couple a knowledge-base with an inference engine to generate musical sequences [Sabater et al. 1998]. CBR systems rely on a database of previous valid musical ‘cases’ from which to infer new knowledge, and are therefore inherently data-driven, even though they may further incorporate a set of immutable knowledge-engineered rules or constraints [Pereira et al. 1997]. A CBR system uses past experience to solve new problems by storing previous observations in a ‘case base’ and adapting them for use in new solutions when similar or identical problems are presented [Ribeiro et al. 2001].
Sabater et al. used case-based reasoning, supported by a set of musical rules, to generate melody harmonisation [Sabater et al. 1998]. The rules represent ‘general’ knowledge derived from traditional harmonic theory, while the cases in the database represent the ‘concrete’ knowledge of a musical corpus. Their system consists of a CBR engine with a case base, and a rule module which only suggests a solution when the CBR fails to find an example of a past solution for a particular scenario using a ‘naïve’ search (in this case a note to be harmonised). Successful solutions to problems are added to the case base for future use. The system conforms to the traditional notion of an expert system which encodes domain knowledge, problem solving knowledge and meta-level knowledge [Connell and Powell 1990].
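The control flow of such a hybrid can be sketched as follows. This is a loose illustration of the retrieve-then-fall-back pattern only, not a reconstruction of the system by Sabater et al.: the exact-match lookup, the `triad_rule` function and the contents of the case base are all hypothetical.

```python
def harmonise_note(note, case_base, rule_module):
    """Return a chord for `note`, preferring a stored case over the general rules."""
    if note in case_base:              # naive search: exact match on the note
        return case_base[note]
    chord = rule_module(note)          # rules are consulted only when retrieval fails
    case_base[note] = chord            # the new solution is stored for future use
    return chord

def triad_rule(note):
    """Hypothetical general rule: harmonise with the major triad rooted on the note."""
    return note + " major"

# Hypothetical cases standing in for the 'concrete' knowledge of a corpus.
case_base = {"C": "C major", "A": "F major"}
melody = ["C", "A", "D", "D"]
chords = [harmonise_note(n, case_base, triad_rule) for n in melody]
print(chords)
```

The second D is answered from the case base rather than the rule module, because the first D's solution was stored when it was produced.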
Ribeiro et al. implemented an interactive program called MuzaCazUza which uses a CBR system to generate melodic compositions [Ribeiro et al. 2001]. The case base is populated with works by Bach. In this system, case retrieval is done by using a metric based on Schoenberg’s ‘chart of regions’ [Schoenberg 1969] and an indexing system to compare a present case with a stored case. The case with the closest match is considered. After each retrieval phase, a musical transformation such as repetition, inversion, retrograde, transposition, or random mutation is applied by the user, and an ‘adaptation’ phase simply drags non-diatonic notes into their closest diatonic positions. The authors suggest continually feeding the results of a CBR system back into the case base, thus creating a model not unlike the one proposed by Cope [Cope 2005]. Pereira et al. used a similar system to Ribeiro et al., this time with a case base consisting of the works of the composer Seixas [Pereira et al. 1997]. Their CBR engine is modelled on four cognitive aspects of creativity: ‘preparation’, that is, the loading of the problem and case base; ‘incubation’, which consists of CBR retrieval and ranking based on a similarity metric; ‘illumination’, which is the adaptation of the retrieved case to the current composition; and ‘verification’, which in this case is the analysis by human experts. During the incubation stage, the standard ‘musically meaningful’ transformations of inversion, retrograde and transposition are employed to expand the system’s ability to generate new music.
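The transformations named above have simple definitions when a melody is encoded as a list of MIDI note numbers, as the following sketch shows. The encoding, the choice of the first pitch as the axis of inversion, and the downward tie-break in `snap_to_scale` are assumptions made for illustration; the cited systems may define these operations differently.

```python
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}   # pitch classes of the C major scale

def transpose(pitches, interval):
    """Shift every pitch by `interval` semitones."""
    return [p + interval for p in pitches]

def retrograde(pitches):
    """Play the passage backwards."""
    return list(reversed(pitches))

def invert(pitches):
    """Mirror each pitch about the first pitch of the passage."""
    axis = pitches[0]
    return [2 * axis - p for p in pitches]

def snap_to_scale(pitches, scale=C_MAJOR):
    """Move each non-diatonic pitch to a neighbouring scale pitch (downwards on ties)."""
    def nearest(p):
        for offset in (0, -1, 1):
            if (p + offset) % 12 in scale:
                return p + offset
        return p
    return [nearest(p) for p in pitches]

motif = [60, 62, 64, 67]             # C4, D4, E4, G4
inverted = invert(motif)             # mirrored about C4, introducing non-diatonic notes
adapted = snap_to_scale(inverted)    # 'adaptation': pull them back into C major
print(transpose(motif, 5), retrograde(motif), inverted, adapted)
```

Inversion about a fixed axis generally leaves the key, which is why an adaptation step of this kind follows the transformation.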
According to Sabater et al. the combination of rule and case-based reasoning methods is especially useful in situations where it is both difficult to find a large enough corpus, and inappropriate to work only with general rules [Sabater et al. 1998]. Pereira et al. believe that CBR systems contain a lot more scope for producing music that is different from the originals than musical grammars inferred from a corpus [Pereira et al. 1997].
At least one musical expert system based on fuzzy logic has been described in the literature. The system by Elsea [Elsea 1995] was implemented in Zicarelli’s Max environment [Zicarelli 2002]. The term ‘fuzzy logic’ is a potential misnomer, as the word ‘fuzzy’ refers not to the logic itself, but to the nature of the knowledge being represented [Zadeh 1965]. The knowledge base in a fuzzy system distinguishes itself by being made up of ‘linguistic’ rules with meanings that cannot be expressed by ‘crisp’ boolean logic. For instance, the fuzzy rule “If there have been too many firsts in a row, then root or second” [Elsea 1995] is a linguistic expression guiding the inference system to avoid prolonged sequences of first inversion chords. Calculations based on this rule are made possible by assigning fractional ‘membership values’ to the quantities of successive first inversion chords that could to some degree be considered ‘too many’. The final decision of whether to transition to a root or second inversion chord is made using a translation from fuzzy membership values to corresponding fuzzy values in the decision space, which are then ‘defuzzified’ to a single value using an algorithm such as Mamdani or Sugeno [Hopgood 2011]. This process is deterministic and constitutes a precise mapping. Sophisticated fuzzy expert systems may suffer the same problems of knowledge-engineering, ‘rule explosion’ and computational complexity as crisp expert systems, but they are a lot more graceful when handling missing, inconsistent or incomplete knowledge [Zeng and Keane 2005] and are therefore potentially more effective at making musically meaningful inferences using small corpora.
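A minimal sketch of this mapping is given below. It is not Elsea’s implementation: the ramp-shaped membership function, its breakpoints, the complementary ‘stay’ rule and the use of a zero-order Sugeno-style weighted average for defuzzification are all assumptions chosen to keep the example short.

```python
def too_many_firsts(count):
    """Membership of `count` consecutive first inversion chords in the set 'too many'.
    Hypothetical ramp: 0.0 at one chord or fewer, rising to 1.0 at four or more."""
    return min(1.0, max(0.0, (count - 1) / 3.0))

def defuzzify(rules):
    """Weighted average of crisp rule outputs, weighted by rule strength."""
    total = sum(strength for strength, _ in rules)
    return sum(strength * output for strength, output in rules) / total

def leave_first_inversion(count):
    """Degree (0.0 to 1.0) to which the next chord should leave first inversion."""
    mu = too_many_firsts(count)
    rules = [
        (mu, 1.0),         # if too many firsts in a row, then root or second
        (1.0 - mu, 0.0),   # otherwise, first inversion remains acceptable (assumed rule)
    ]
    return defuzzify(rules)

for count in range(1, 6):
    print(count, leave_first_inversion(count))
```

The same input always yields the same output, so the fuzziness lies in the graded notion of ‘too many’ rather than in any randomness.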
2.2.5 Evolutionary Algorithms
The term ‘evolutionary algorithms’ refers to a collection of techniques inspired primarily by Darwinian natural selection [Husbands et al. 2007]. Two of these techniques which have been investigated in the field of algorithmic composition are genetic algorithms, and to a lesser extent genetic programming. These algorithms implement sophisticated heuristics for converging on locally optimal solutions in very large search spaces. The reason for their popularity in algorithmic composition is their ability to traverse diverse regions of a space of musical solutions stochastically. This is advantageous for musical optimisation problems like four-part harmonisation, because it renders them no longer computationally intractable compared to expert system solutions like Ebcioğlu’s [Ebcioğlu 1988]. Furthermore, with a stochastic approach comes the apparent implication that new music unhindered by generative rules is possible [Gartland-Jones and Copley 2003]. Thus, while in non-artistic fields genetic algorithms and genetic programming are usually used to solve optimisation problems, in music they are also commonly exploited for their ‘exploration’ abilities, and are sometimes claimed to be analogous to elements of the human composition process [Gartland-Jones 2002].
Genetic algorithms (GA) are a heuristic search technique in which candidate solutions are represented as a population of strings or ‘chromosomes’ [Burton and Vladimirova 1999]. Each ‘gene’ of the chromosome represents a dimension of the solution space. A stochastic search process is controlled by a selection procedure based on individual ‘fitness’ and ‘reproductive’ operators to obtain successive generations of a population, and ‘mutation’ operators to randomly introduce new genetic material into an existing population. The search runs for a fixed number of generations, or until the fittest individual is somehow deemed fit enough to be the final solution. Reproductive operators typically implement ‘genetic crossover’ to merge a number of parents into an offspring, and mutation operators are used to modify individual genes or small sections of an offspring’s chromosome. In the simplest ‘traditional’ GA, individuals are represented by binary strings and genetic operators operate at the binary level, with crossover occurring at arbitrary points along the string and mutation operators causing random ‘bit flips’ [Engelbrecht 2007]. However, for algorithmic composition most authors have found it necessary to instill the evolutionary process with a measure of musical domain knowledge to radically enhance the process. In particular, chromosomes are used to represent musical information at a higher level of abstraction, and ‘musically meaningful’ mutation operators are chosen, including the transformational procedures of inversion, reversal and transposition [Burton and Vladimirova 1999]. Fitness evaluation is usually cited as the most problematic aspect of GAs. Gartland-Jones and Copley classified genetic algorithms by their use of either ‘automatic’ (using an objective function or an ANN trained on a corpus) or ‘interactive’ (requiring human inspection/listening) fitness functions [Gartland-Jones and Copley 2003]. The latter are often referred to as interactive genetic algorithms (IGAs) [Biles 2001].
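The ‘traditional’ binary GA described above can be sketched compactly. The fitness function here simply counts 1-bits and stands in for a real musical evaluation; the truncation selection scheme, the elitist retention of the fittest individual and all parameter values are assumptions chosen for illustration.

```python
import random

def fitness(chromosome):
    """Toy objective (assumption): the number of 1-bits in the chromosome."""
    return sum(chromosome)

def crossover(a, b, rng):
    """Single-point crossover: the head of one parent joined to the tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def evolve(pop_size=20, length=16, generations=30, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(map(fitness, population))
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]      # truncation selection
        children = [population[0]]                 # elitism: keep the fittest unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = children
    return initial_best, max(population, key=fitness)

initial_best, best = evolve()
print(initial_best, fitness(best), best)
```

Replacing `fitness` with a call that waits for a human rating would turn this into an interactive GA, with the cost in evaluation time that the term ‘fitness bottleneck’ describes.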
Phon-Amnuaisuk et al. used a GA to create traditional four-part harmonies [Phon-Amnuaisuk et al. 1999]. They relied on an objective knowledge-based fitness function for the evaluation of chromosomes. The chromosomes encoded short thematic passages; the mutation operators included ‘perturbation’, which nudges a note in a single voice up or down a semitone; ‘swapping’, where chords are altered by swapping two random voices; ‘re-chord’, which randomly modifies the chord type; ‘phrase-start’, which mutates a phrase to begin on a root chord; and ‘phrase-end’, which mutates a phrase to end on a root chord. The main reproductive procedure involved splicing the chromosome strings at a random crossover point. The fitness function was a set of rules commonly listed in traditional voice-leading theories.
Biles presented a genetic algorithm called GenJam for generating monophonic jazz solos [Biles 1994]. GenJam initialises individuals within a population of melodic passages. It performs musically meaningful mutations such as inversion, reversal, rotation and transposition. The fitness of each individual in a generation is determined by a human operator, and the best individuals are used as the parents of the following generation. According to Biles, this feedback process converges on solos which match the taste of the human operator [Biles 1994]. The main disadvantage of this method is that the reliance on human feedback for evaluating fitness manifests as a bottleneck which makes the convergence process orders of magnitude slower than using objective fitness functions. Biles has addressed this problem by using entire audiences instead of individual users [Biles and Eign 1995], using ANNs for fitness functions [Biles et al. 1996], and removing fitness evaluation altogether by drawing the initial population from an established database of superior specimens [Biles 2001].
Genetic programming (GP) is an extension to the GA paradigm in which the individuals in the population are not vectors representing points in a solution space, but hierarchical expressions representing mathematical functions or the code for entire algorithms [Burton and Vladimirova 1999]. GP individuals are normally represented as expression tree structures; consequently the selection, reproduction and mutation mechanisms are designed specifically to operate on these structures [Engelbrecht 2007]. GP fitness functions are more commonly realised as error or ‘cost’ functions because they are very popular for solving symbolic regression problems, but aside from these differences GP and GA implementations are fundamentally the same. Laine and Kuuskankare [Laine and Kuuskankare 1994], for instance, generated an initial population of melodies using simple mathematical operators and trigonometric functions, then evolved the population by performing crossover and mutation on subtrees. Longer and more complex musical phrases result from the increasing complexity of the population generations. Puente et al. used a GP technique to evolve context-free grammars for producing melodies in the style of a corpus of works by several famous composers [Puente et al. 2002]. In this instance the fitness function was simply a statistical comparison between the population members and the melodies from the corpus.
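To make the idea of crossover on expression trees concrete, the following sketch represents an individual as a nested tuple and swaps in a randomly chosen subtree from a second parent. The tuple encoding, the operator set, the function names and the pitch mapping in `to_pitches` are all hypothetical choices for illustration; they are not taken from Laine and Kuuskankare or any other cited system.

```python
import math
import random

# Assumed representation: a tree is a leaf ("x" or a number) or a tuple
# (operator_name, child, ...).
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "sin": lambda a: math.sin(a),
}

def evaluate(tree, x):
    """Evaluate an expression tree at the point x."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, *args = tree
    return OPS[op](*(evaluate(arg, x) for arg in args))

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get_at(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], subtree),) + tree[i + 1:]

def subtree_crossover(a, b, rng):
    """Replace a random subtree of a with a random subtree of b."""
    target = rng.choice(list(paths(a)))
    donor = get_at(b, rng.choice(list(paths(b))))
    return replace_at(a, target, donor)

def to_pitches(tree, steps, low=60, span=12):
    """Sample the expression at integer steps and fold into a pitch range."""
    return [low + int(round(evaluate(tree, t))) % span for t in range(steps)]

parent_a = ("add", ("mul", 3, ("sin", "x")), "x")
parent_b = ("mul", ("sin", ("add", "x", 1)), 5)
child = subtree_crossover(parent_a, parent_b, random.Random(1))
melody = to_pitches(child, 8)
```

Because the donor subtree may be larger than the one it replaces, repeated crossover tends to grow the trees, which is one plausible reading of how later generations come to yield longer and more complex phrases.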
Burton and Vladimirova suggested that genetic techniques allow a greater scope of musical possibilities, and often greater subjective ‘realism’, than other approaches such as ANNs, which are restricted by training data; expert systems, which are often restricted by computational complexity and knowledge-engineering issues; and purely stochastic generators, which exhibit good unpredictability but ‘questionable musicality’ [Burton and Vladimirova 1999]. However, they and many other authors have acknowledged the perennial problem of designing effective fitness-evaluation methods that reduce the counter-productive dependence on human interaction, the ‘fitness bottleneck’ [Biles et al. 1996]. Additionally, the tuning of genetic algorithm parameters raises persistent questions, such as whether to implement ‘elitist’ selection policies that may converge too quickly to local optima, or policies that retain a high level of diversity and allow low-quality individuals to continue reproducing [Burton and Vladimirova 1999]. Phon-Amnuaisuk et al. found that despite the supposed advantages of using GAs for four-part harmonisation, a simple rule-based system was capable of achieving consistently better results as measured by the GA’s own fitness function [Phon-Amnuaisuk et al. 1999]. They attributed this to the GA’s lack of sufficient ‘meta-knowledge’, which comes naturally to an expert system because the structure of the search process can be easily encoded in the program. They also noted the GA’s inability to guarantee globally optimal solutions (a caveat of stochastic search), and declared the GA model ill-suited to musical optimisation problems. Despite all this, both interactive and non-interactive GAs continue to be used successfully for tasks like jazz improvisation [Biles 2007] and the composition of thematic bridging sections between user-supplied ‘source’ and ‘target’ passages [Gartland-Jones 2002].
2.2.6 Chaos and Fractals
Approaches to algorithmic composition in the tightly related fields of chaos and fractals have been popular as alternatives to the expert-system paradigm because of their tendency to exhibit recurrent patterns or multi-layered self-similarity, while at the same time being fundamentally unpredictable or complex [Harley 1995]. Both are linked to mathematical resultants of the behaviour of iterated function systems (IFS) and dynamical systems, and were introduced as alternative explanations for complex natural phenomena such as weather systems and the shape of coastlines [Mandelbrot 1983]. According to Harley [Harley 1995], their applicability to music has been influenced by the work of Lerdahl and Jackendoff, who provided convincing models for analysing musical self-similarity [Lerdahl and Jackendoff 1983]; and Voss and Clarke, who demonstrated that some music contains patterns which can be described using 1/f noise [Voss and Clarke 1978]. The non-musical, numerical data streams created by applying such algorithms are not usually termed ‘emergent behaviour’ because they are not generated by a virtual environment of simple interacting units. However, they share the property of being able to generate complexity at the ‘macroscopic’ level from simplicity at the ‘microscopic’ level [Beyls 1991]. Furthermore, their successful conversion into musical information is at the mercy of the mapping problem noted by Miranda [Miranda 2001], a problem also faced by systems of emergent behaviour such as cellular automata and swarms.
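As a rough illustration of both 1/f noise and the mapping problem, the sketch below uses a well-known approximation often attributed to Voss: several random sources are summed, with each source re-rolled half as often as the one before it. The mapping onto a major scale is one arbitrary choice among many, which is precisely the difficulty the mapping problem refers to. The function names, parameter defaults and scale are assumptions, and this is not claimed to be the procedure used in any of the works cited.

```python
import random

def voss_sequence(length, sources=4, sides=6, seed=0):
    """Approximate 1/f noise by summing sources updated at halving rates.

    Source k is re-rolled whenever bit k of the step counter changes, so
    source 0 changes every step, source 1 every second step, and so on.
    """
    rng = random.Random(seed)
    dice = [rng.randrange(sides) for _ in range(sources)]
    values = []
    for step in range(length):
        if step > 0:
            changed = step ^ (step - 1)  # bits that flipped since last step
            for k in range(sources):
                if changed & (1 << k):
                    dice[k] = rng.randrange(sides)
        values.append(sum(dice))
    return values

def map_to_scale(values, scale=(0, 2, 4, 5, 7, 9, 11), base=48):
    """Map non-negative integers to MIDI pitches by treating them as
    scale-degree indices that wrap into higher octaves."""
    pitches = []
    for v in values:
        octave, degree = divmod(v, len(scale))
        pitches.append(base + 12 * octave + scale[degree])
    return pitches

noise = voss_sequence(16)
pitches = map_to_scale(noise)
```

Swapping the scale, the base pitch or the parameter being driven (rhythm or amplitude instead of pitch) changes the musical result entirely while leaving the underlying data stream untouched.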
Chaotic systems were explored by Bidlack as a means of using simple algorithms for endowing computer generated music with ‘natural’ qualities, for instance those which can be found relating to either organic processes or divergent mathematical phenomena [Bidlack 1992]. Bidlack noted that the resultant complexity had more potential in computer synthesis, but suggested that the technique could be useful for perturbing musical structure at various levels of hierarchy, in order to instill a system with a measure of unpredictability. Dodge described a ‘musical fractal’ algorithm utilising 1/f noise, arguing along the lines of Voss and Clarke that 1/f noise represents a close fit to many dynamic phenomena found in nature [Dodge 1988]. He drew the analogy between his recursively ‘time-filling’ process and Mandelbrot’s recursively ‘space-filling’ curves. The time-filling fractal form is seeded by an initial pitch sequence, which is then filled in by 1/f noise and mapped to musical pitch, rhythm and amplitude. Harley produced an interactive algorithm that centres on a ‘generator’ which provides the output of a recursive logistic differential equation; a ‘mapping’ module which scales the output to a range specified by the user; a third module which provides statistical data on the generator’s output over specified time-frames to provide knowledge of high-level structures to the user; and a fourth module which the user controls to reorder the generator output in the process of translating it to musical parameters [Harley 1995]. These modules can be networked together in order to act as raw input or as input ‘biases’ for one another.
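The first three modules of a Harley-style pipeline might be sketched as follows. This is a loose illustration rather than Harley's algorithm: it assumes the generator is the discrete logistic map x(n+1) = r * x(n) * (1 - x(n)), and the function names, the parameter values r = 3.9 and x0 = 0.5, the pitch range and the window size are all hypothetical. The fourth (reordering) module is omitted.

```python
def logistic_generator(r=3.9, x0=0.5, count=32):
    """'Generator': iterate the logistic map; the r value is an assumed
    choice inside the chaotic regime, and outputs stay within [0, 1]."""
    values, x = [], x0
    for _ in range(count):
        x = r * x * (1.0 - x)
        values.append(x)
    return values

def map_to_range(values, low, high):
    """'Mapping': scale values in [0, 1] to integers in [low, high]."""
    return [low + int(round(v * (high - low))) for v in values]

def window_stats(values, size):
    """Statistics: (min, max, mean) for each consecutive window."""
    stats = []
    for start in range(0, len(values), size):
        window = values[start:start + size]
        stats.append((min(window), max(window), sum(window) / len(window)))
    return stats

raw = logistic_generator()
pitches = map_to_range(raw, 48, 72)   # two octaves of MIDI pitch (assumed)
summary = window_stats(raw, 8)        # four windows of eight values each
```

Networking the modules, as the text describes, would amount to feeding one module's output (for example the windowed mean) back in as a bias on another module's parameters, such as the mapping range.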
There are several examples in the algorithmic composition literature of the use of Lindenmayer Systems (L-Systems) for generating fractal-like structures. L-Systems were originally introduced to model the cellular growth of plants [Lindenmayer 1968], and first explored for musical applications by Prusinkiewicz [Prusinkiewicz 1986]. L-Systems are deterministic and expressed almost identically to Chomsky’s grammars, with the crucial difference being that instead of production rules applying sequentially, they are applied concurrently; this is what allows self-similar substructures to quickly propagate through what are exponentially expanding strings. The work by