Local Consonance Maximization in Realtime


A thesis submitted in partial satisfaction of the

requirements for the degree of

Master of Computer Science and Engineering

in the Graduate School of

the University of Aizu

Local Consonance Maximization in Realtime

by

Julián Villegas


The thesis titled

Local Consonance Maximization in Realtime

by

Julián Villegas

is reviewed and approved by:

Main referee

Professor Date

Michael Cohen

Associate Professor Date

Robert Fujii

Associate Professor Date

Satoshi Nishimura

The University of Aizu


List of Figures

2.1 Plomp and Levelt’s dissonance curve of a dyad
2.2 Trombone spectrogram for a 440 Hz tone
2.3 Dissonance curves of the alto-trombone partials
2.4 Intrinsic dissonance curve for an alto-trombone
3.1 Harmonic series for C2
3.2 A simple harmonic progression
4.1 An intrinsic dissonance curve for a non-harmonic tone
4.2 Springs model of deLaubenfels
4.3 ERB for 100 ≤ f ≤ 10000 Hz
4.4 Auto-tune 4
5.1 Benson’s vs. Sethares’ model
5.2 Comparison of two dissonance models
5.3 Illustration of dissonance calculation
5.4 Consonance model used
5.5 The solution space for a P5
5.6 The solution space for a brass duet
6.1 Implemented GUI
6.2 Implementation signal-flow diagram
6.3 SSB diagram
6.4 goldenearcaption
6.5 goldenear’s flowchart
7.1 Convergence test patch


List of Tables

2.1 Gradus suavitatis for some intervals
2.2 Spectrum values for an alto-trombone
3.1 Harmonics of a 110 Hz tone
3.2 Intervals in several tuning systems
4.1 Virtual (impossible?) Trombone
5.1 Harmonics and ERB
5.2 Spectrum values for Figure 2.4
7.1 Convergence results


Preface

Is it possible to adjust the fundamental frequencies of concurrent sounds in realtime so that their interaction becomes more consonant without losing its original character?

According to the most widely accepted current theories of consonance, this should be possible by extracting from every sound the information required to compute their dissonance, and then searching the vicinity of each tone for candidate frequencies that yield more consonant results. The whole process has to be fast enough to be considered realtime. This thesis proposes a new mechanism to solve this problem and presents a promising prototype able to accomplish this difficult task.

Previous solutions rely on the MIDI protocol for their implementation, inheriting its limitations, and timbres are known in advance by the software, which is unable to detect timbre changes during a performance. The proposed model mitigates these problems.

A general introduction to the thesis topic is presented in chapter 1. Chapter 2 gives a brief introduction to different interpretations of consonance and dissonance, highlighting the theories most relevant to this thesis. Chapter 3 describes the relationship between musical scales, musical tunings, harmony, and the psychoacoustical phenomenon of consonance. Chapter 4 summarizes the state of the art as well as historical answers to the research question. The proposed model and the original contributions to the research area are discussed in detail in chapter 5; implementation details and results are presented in chapters 6 and 7, respectively. Enhancements, refinements, and alternative implementations of the presented model are proposed as future work in chapter 8, and chapter 9 summarizes the main conclusions.

This thesis is mainly inspired by the work of Sethares; the points in common and the differences between both models are noted throughout the text, especially in chapter 4.

To follow this document easily, the reader should be familiar with musical terminology, especially musical scales, tunings, intervals, and harmony. A recapitulation of these topics is beyond the scope of the present document, but good and sufficient references can be found on the Internet. Besides those, I recommend as very elementary introductions “Lies My Music Teacher Told Me: Music Theory for Grownups” by Gerald Eskelin [1], and “Edly’s Music Theory for Practical People” by Ed Roseman [2]. Knowledge of basic DSP techniques is a plus.

There is always a compromise between consonance and mobility among different keys, and the solutions proposed since the problem was first addressed, several centuries ago, remain inadequate for many situations. The search for alternatives such as the one proposed in this thesis is therefore always important, and contributes to refining existing mechanisms or discovering new ones.


Acknowledgment

First of all, I’d like to thank Prof. Michael Cohen for his encouragement, friendship, abundant curricular and extracurricular advice, example, generosity, and support, among many other things, during all the stages of my master’s program. To him and his family, my eternal gratitude.

My academic experience in Japan, and my life here, wouldn’t have been as pleasant as it has been without the generous help of Juan Gonzalez (Izumi Sensei) and his family. His always open hand and his always opportune assistance (‘Diosidencias,’ as he calls them) are rare qualities to find simultaneously in one person, and I have always enjoyed them. Thank you from the bottom of my heart.

I want to thank Uresh Duminduwardena, Ashuboda Marasinghe, and Newton Fernando for their care and support, especially at the beginning of this journey. I’m grateful for having learned from them the importance of words and of dialog. To them and their families, my respect and my thankfulness. I hope we can meet again so that I can repay all the favors I received.

To Jerold de Hart, my gratitude for many engaging conversations over innumerable tasty cups of coffee. For always having his door open and a kind word, and for always being concerned about the welfare of international students like me, my sincere appreciation.

I was lucky to find a good reception to my questions from Prof. William Sethares, Prof. David Huron, Prof. Dave Benson, and the Pd mailing lists. For their stimulating, illustrative, and encouraging correspondence, my gratitude.

My family: Jorge, Norma, Leonardo, and Claudia, I’m what I am because of you; there’s no way to repay you for being there when I needed you the most.

Isadora Garcia, for sharing this dream with me, for understanding and letting me fly, for loving me, and for waiting for me, I cannot be more grateful, and I feel very lucky to have met you. Thank you for giving me the strength to keep going.

Margarita Yepes, Andrea Rosales, La Torcida, for tacitly accompanying me all the time. I especially want to thank the Japan Student Services Organization (JASSO) for their financial support during the last semester of the master’s program through the “Gakusyu-syoreihi” Scholarship.

I also want to thank AWIA, Saisua, rat-at-tat, and Vital Connections for their assistance, and all the members of our laboratory, especially the Japanese crew, for their patience and understanding in translating documents into their language.


Abstract

The problem of maximizing consonance in tonal music has been addressed before, every solution reflecting the technological advances of its epoch, and current theories explaining this psychoacoustical phenomenon are generally satisfactory. Nevertheless, vast aspects of this area remain unexplored, since even the most recent solutions lack adequate mechanisms to apply such techniques in realtime scenarios. In general, the most advanced achievements in this field are based on the MIDI protocol for controlling the pitch of simultaneous notes, inheriting the protocol's limitations in terms of dependency on the quality of the synthesizer for satisfactory results, scalability, accuracy, veracity, etc. Besides that, these techniques generally assume that timbres are known a priori, so their application to unknown timbres requires digitization and analysis of sound samples, making such techniques unsuitable for realtime situations.

This thesis summarizes the main theories about consonance and its relation to musical scales, reviews several previous solutions as well as the state of the art, proposes an alternative model to adaptively adjust consonance in a polyphonic scenario based on the tonotopic dissonance paradigm (presented by Plomp and Levelt and later developed by Sethares), and presents a prototype of this model that aims to surmount the difficulties of prior solutions by performing realtime analysis and pitch adjustment. The prototype is programmed in Pure-data (Pd), a data flow DSP environment for realtime audio applications. The results are analyzed to determine the efficacy and efficiency of the proposed solution.

Keywords: Adaptive microtuning, local consonance, realtime, SPSA, Pd, tonotopic dissonance.


Chapter 1 Introduction

There seems to be agreement that music is more interesting when passages of ‘tension’ and ‘relaxation’ alternate while it is performed. Interchangeable expressions for the same patterns include ‘pleasant’ and ‘unpleasant,’ ‘rest’ and ‘motion,’ ‘euphony’ and ‘cacophony,’ etc. Different explanations for the subjective perception of these patterns exist and depend on the context in which they are studied: physiology, music, psychology, genetics, etc. The psychoacoustical explanation is one of the most accepted. According to this theory, the separation in frequency of tones sounding concurrently directly determines the consonance; the more separated they are, the more ‘relaxed’ their interaction is perceived to be. In general, it’s desirable that, in the alternation of tension and relaxation sections, the sounds corresponding to the relaxation state have maximum consonance, so the problem of achieving this can equivalently be considered the problem of maximizing the consonance of the interaction of simultaneous tones at a given time. These issues have been addressed for about six centuries, since musicians started to formalize the coherent use of multiple sounds at the same time.

Many solutions have been proposed along with the development of western culture, reflecting the state of the art in technology and the theories used to explain the phenomena in each epoch. The ubiquity of fixed-tuning instruments (like keyboards, woodwinds, and brasses except the trombone) introduced an extra difficulty to the problem because of their inability to change pitch dynamically. Therefore, the first approaches tried to solve this problem by adjusting the number of steps in the musical scale such that combinations of different steps produced more consonant chords in a given context. Scales with fewer notes were explored, as in pentatonic scales and diatonic scales with seven tones, and scales with more notes, such as 36-note scales, were used in the Renaissance. Scales with fewer notes proved not to work very well with western preferences of harmony, and scales with more than twelve steps made the instruments harder to play, restricting them to a small group of musicians.

More recent developments use DSP techniques to compare an incoming audio signal against a scale template previously determined by the user, adjusting in realtime every detected tone to the nearest tone in the template. This technique is widely used in professional audio studios and its effect is well known, especially when adjustments in the incoming signal are chosen to be dramatically discrete, producing the so-called “Cher’s effect” [3]. Contemporary with these approaches is adaptive tuning, or local consonance, a less popular technique in which, with the help of electronic instruments and different mathematical models that explain consonance and dissonance phenomena, the tuning of concurrent notes is adjusted to achieve the maximum consonance possible at a given time. Potential advantages of adaptive tuning over other schemes include that it can also be applied to non-harmonic instruments and that there’s no need to preload a pattern (scale) against which to match the audio signal. Several researchers have proposed various paradigms to explore local consonance, but most of them converge on using MIDI to express it, imposing on their solutions the limitations of the protocol. Another common feature of most techniques is that timbre is assumed to be independent of the frequency, a simplification that, although justified by the realtime constraints, poorly reflects real instruments, for which the timbre can change depending on many factors, including the pitch, the intensity of the note being played, the performer, the intrinsic characteristics of a specific instrument, etc. These problems are addressed in this thesis and a new approach combining the mathematical models for consonance with


Chapter 2 Consonance

Consonance, understood as the absence of dissonance, is one of eight basic subjective phenomena associated with auditory perception. The other seven phenomena, according to Huron [4], are: loudness, pitch, timbre, toneness (whether a given sound is perceived as a noise or as a tone), apparent location, auditory streaming (how successive sounds can be identified as coming from the same source), and numerosity (the impression of the number of concurrent sound sources). His definition of consonance as “the subjective experience of pleasantness, euphoniousness, smoothness, fusion, or relaxedness evoked by sounds” is the one adopted in this thesis.

2.1 Interpretations of Dissonance

A successful single theory to explain consonance and dissonance remains elusive, and some theoreticians and researchers argue that no single theory can explain it, only a set of them. Huron, extending the work of Plomp and Levelt [5], classifies those theories according to the following main tendencies:

frequency ratio: the auditory ‘preference’ for small integer ratios, because of the resulting periodicity of the stimuli,

harmonic relationship: the expected dissonance when the harmonic relationships of a composition don’t follow the classical canons of western harmony,

temporal dissonance: related to the beating of a pair of sounds when the difference of their frequencies is small enough to partially cancel the effect of each other (amplitude modulation),

tonal fusion: the perceived euphony of simultaneous sounds that can be perceived as a single tone,

tonotopic dissonance: the perceived dissonance of a pair of sinusoidal waves when their frequency difference is less than one critical bandwidth,

virtual pitch: the component of dissonance that arises from competing (unclear)

expectation dissonance: alterations of learned harmonic patterns, as in ‘cadences’ where a leading tone resolves to a different note than the tonic or its equivalent,

interval category: the difficulty of classifying the interval formed by a pair of sounds into a learned category of intervals,

absolute pitch category: the perceived dissonance of a tone by a person with absolute pitch when it’s impossible to classify it into one of the learned pitches due to the ambiguity of its frequency,

stream incoherence theory: the component of dissonance that arises due to confusion regarding streaming, and

relative dissonance: the contextual relative consonance of a sonority when it is preceded by other sonorities of contrasting dissonance. This effect is related to the sensation of rest and peace experienced when the densest and darkest dissonant composition, listened to as loudly as can be tolerated, finishes.

An extensive description of these and some other classes can be found in [4].

2.2 Gradus suavitatis

Euler stated that the consonance, or gradus suavitatis (‘degree of sweetness’), of a chord depends upon the consonance of the rational ratio of its frequencies [6]. An integer $a$ can be written as the product of its prime factors:

$$ a = p_1^{k_1} p_2^{k_2} \cdots p_n^{k_n}. \tag{2.1} $$

Euler defines the gradus suavitatis function as:

$$ G(a) = 1 + k_1(p_1 - 1) + k_2(p_2 - 1) + \ldots + k_n(p_n - 1). \tag{2.2} $$

The gradus suavitatis for an interval described by a ratio $a/b$ with greatest common divisor equal to unity ($\gcd(a, b) = 1$) is equal to the gradus suavitatis of the product of its terms:

$$ G\left(\frac{a}{b}\right) = G(a \times b). $$

For example, the gradus suavitatis of a M2, described by a ratio of $9 : 8$, is:

$$ G(9 \times 8) = G(72) = 1 + 3 \times 1 + 2 \times 2 = 8 \qquad (72 = 2^3 \times 3^2). \tag{2.3} $$

Table 2.1 summarizes the gradus suavitatis for the main intervals of a Just-Tuned scale.
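Equations 2.1 to 2.3 can be checked numerically. The following is a minimal Python sketch, not part of the thesis implementation; the function names `prime_factors` and `gradus` are hypothetical, and the ratio is reduced to lowest terms before applying Equation 2.2, as the definition requires.

```python
from math import gcd


def prime_factors(n):
    """Return {prime: exponent} for a positive integer n (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def gradus(a, b=1):
    """Euler's gradus suavitatis of the ratio a/b (Equation 2.2 applied to a*b)."""
    g = gcd(a, b)
    a, b = a // g, b // g  # the definition assumes gcd(a, b) = 1
    return 1 + sum(k * (p - 1) for p, k in prime_factors(a * b).items())
```

For instance, `gradus(9, 8)` evaluates Equation 2.3, and the remaining entries of Table 2.1 can be obtained the same way from their ratios.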

Interval        Abbreviation   Ratio   Gradus suavitatis
minor second    m2             16/15   11
major second    M2             9/8     8
minor third     m3             6/5     8
major third     M3             5/4     7
fourth          P4             4/3     5
tritone         TT             45/32   14
fifth           P5             3/2     4
minor sixth     m6             8/5     8
major sixth     M6             5/3     7
minor seventh   m7             16/9    9
major seventh   M7             15/8    10
octave          8ve            2/1     2

Table 2.1: Gradus suavitatis for the main intervals of a Just-tuned scale

Hans Straub [7] claims that although Euler’s definition of dissonance agrees with other theories in which small values of dissonance are expected for intervals described by small integer ratios, it fails to explain the consonance of equal temperament scales, in which all interval ratios are irrational numbers except for the octave and its multiples. For those ratios, Euler’s definition of consonance cannot be applied. Approximating those irrational numbers by the closest rational numbers doesn’t help to confirm his hypothesis since, as Straub points out, the same mathematical property that implies that any irrational number can be approximated arbitrarily closely by a ‘simple’ interval ratio can be used to demonstrate that it can be approximated by another arbitrarily ‘complex’ one, the human hearing system being unable to distinguish them. This problem is shared by any theory based on interval ratios.

2.3 Tonotopic dissonance

The tonotopic, or sensory, dissonance theory currently has more acceptance than the others. This theory, first proposed by Greenwood [8] and independently advanced by Plomp and Levelt [5], has its foundation in the work of Helmholtz [9]. Helmholtz observed that consonance can be explained in terms of roughness and beats of the partials of simultaneous complex waves. However, he assumed that the frequency difference that produces a maximum in the roughness perception was frequency independent. By analysis of data from several experiments in which subjects were asked to rate the consonance of a sinewave dyad, Plomp and Levelt concluded that the transition range between consonant and dissonant intervals is related to the critical bandwidth, the tightest range of frequencies which activate the same part of the basilar membrane [10]. Specifically, the maximum unpleasantness arises between two pure tones when they are separated by 25% of a critical band [5].¹ Plomp and Levelt adopted the value of 25% of the critical band reported by Zwicker, Flottorp, and Stevens [11] as the point of the maximum dissonance perception. The unpleasantness grows very fast from zero at the unison and disappears as the frequency difference between the sinewaves exceeds one critical band. Figure 2.1 shows the averaged dissonance curve obtained from their collected subjective data.

¹ It’s interesting to note that in the evaluation of the consonance, the sounds were presented simultaneously in both ears; further studies have shown that, as can be expected, when the sounds are presented to different ears, the basilar membrane of each ear is stimulated by only one of these tones and therefore no tonotopic dissonance is perceived.

Figure 2.1: Averaged dissonance curve of a dyad found by Plomp and Levelt (dissonance vs. frequency separation in critical bands). The dissonance grows to a maximum at 25% of the critical bandwidth.

Sethares [12] claims that the dissonance function $d$ for two sinewaves with frequencies $f_1, f_2$ can be parametrized as:

$$ d(f_1, f_2) = e^{-c_1 |f_2 - f_1|} - e^{-c_2 |f_2 - f_1|}, \tag{2.4} $$

where $c_1, c_2$ are constants with values $c_1 = 3.5$, $c_2 = 5.75$, obtained by a least squares fit from the experimental data provided by Plomp and Levelt. In addition, Sethares proposed Equation 2.5 as a scaled version of Equation 2.4 that includes sinewave amplitudes in the computation and allows the application of the model to dyads with different base frequencies:

$$ d(f_1, f_2, a_1, a_2) = a_1 a_2 \left( e^{-c_1 s |f_2 - f_1|} - e^{-c_2 s |f_2 - f_1|} \right), \tag{2.5} $$

where

$$ s = \frac{d^*}{s_1 \min(f_1, f_2) + s_2}, \tag{2.6} $$

$a_1, a_2$ are the amplitudes of the sinewaves, and $d^*, s_1, s_2$ are constants with values $d^* = 0.24$, $s_1 = 0.021$, $s_2 = 19$, also obtained by a least squares fit from Plomp and Levelt’s data. $d^*$ is the point at which the function reaches its maximum, i.e., the interval where the maximum dissonance is perceived. $s_1, s_2$ allow this function to ‘slide’ over itself from the unison to a given frequency, so the ‘intrinsic dissonance’ for a given timbre can be obtained as explained later in this chapter.
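As an illustration of Equations 2.5 and 2.6, the sketch below evaluates the dissonance of a sinewave dyad in Python using the constants quoted in the text. It is only a sketch under those assumptions, not the Pd implementation described in this thesis, and the function name `dyad_dissonance` is hypothetical.

```python
import math

# Constants quoted in the text (least squares fit to Plomp and Levelt's data).
C1, C2 = 3.5, 5.75
D_STAR, S1, S2 = 0.24, 0.021, 19.0


def dyad_dissonance(f1, f2, a1=1.0, a2=1.0):
    """Dissonance of two sinewaves (Equation 2.5), frequencies in Hz."""
    # Equation 2.6: rescale the frequency difference according to the lower tone.
    s = D_STAR / (S1 * min(f1, f2) + S2)
    x = s * abs(f2 - f1)
    return a1 * a2 * (math.exp(-C1 * x) - math.exp(-C2 * x))
```

With these definitions the dissonance is zero at the unison, rises quickly for small frequency differences, and decays towards zero for widely separated tones, which is the qualitative shape described for Figure 2.1.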


2.3.1 Dissonance for complex tones

Plomp and Levelt supposed that for complex timbres, the overall dissonance for a given interval is the aggregate sum of the interactions between all of the constituent partials [5]; i.e., overtones from one timbre which interfere with overtones of another falling into one critical band contribute to the sum.

Supposing a complex tone whose spectrum is independent of the frequency,² it’s possible to calculate its ‘intrinsic dissonance’ by sliding an identical version of its spectrum from the unison to an octave. Equation 2.7 gives the mathematical expression proposed by Sethares [12] for calculating it, $D_F$ being the intrinsic dissonance of a tone with fundamental frequency $f$ and $n$ partials at frequencies $p_k f$ with amplitudes $a_k$. This expression is based on Equation 2.5.

$$ D_F = \frac{1}{2} \sum_{k=1}^{n} \sum_{l=1}^{n} d(p_k f, p_l f, a_k, a_l). \tag{2.7} $$

Sethares [13] also proposes an expression to calculate directly the dissonance of an interval $\alpha$ for a given (unchangeable) timbre:

$$ D_F(\alpha) = \frac{1}{2}\left(D_F + D_{\alpha F}\right) + \sum_{i=1}^{n} \sum_{j=1}^{n} d(p_i f, \alpha p_j f, a_i, a_j). \tag{2.8} $$

To illustrate the mechanism, consider the steady state spectrum of an A4 tone at 440 Hz played by an alto-trombone, extracted from the SHARC timbre database [14] and presented in Table 2.2 and Figure 2.2. The resulting dissonance curves for the first six frequency components are presented independently in Figure 2.3. By sliding the trombone spectrum over itself, starting at the unison and finishing at the upper octave, the intrinsic dissonance curve for this sound is obtained, shown in Figure 2.4.
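Equations 2.7 and 2.8 can be sketched in Python as follows. This is a minimal illustration that transcribes the two expressions as they are written above; the function names and the representation of a timbre as two parallel lists (partial multipliers and linear amplitudes) are assumptions made for the example, not the data structures of the prototype.

```python
import math

C1, C2 = 3.5, 5.75
D_STAR, S1, S2 = 0.24, 0.021, 19.0


def dyad_dissonance(f1, f2, a1, a2):
    """Equation 2.5 with the scaling of Equation 2.6."""
    s = D_STAR / (S1 * min(f1, f2) + S2)
    x = s * abs(f2 - f1)
    return a1 * a2 * (math.exp(-C1 * x) - math.exp(-C2 * x))


def intrinsic_dissonance(f, multipliers, amplitudes):
    """Equation 2.7: half the sum over all ordered pairs of partials."""
    partials = list(zip(multipliers, amplitudes))
    return 0.5 * sum(
        dyad_dissonance(p_k * f, p_l * f, a_k, a_l)
        for p_k, a_k in partials
        for p_l, a_l in partials
    )


def interval_dissonance(alpha, f, multipliers, amplitudes):
    """Equation 2.8: dissonance of the same timbre sounding at f and alpha * f."""
    partials = list(zip(multipliers, amplitudes))
    cross = sum(
        dyad_dissonance(p_i * f, alpha * p_j * f, a_i, a_j)
        for p_i, a_i in partials
        for p_j, a_j in partials
    )
    own = intrinsic_dissonance(f, multipliers, amplitudes)
    shifted = intrinsic_dissonance(alpha * f, multipliers, amplitudes)
    return 0.5 * (own + shifted) + cross
```

Evaluating `interval_dissonance` over a grid of `alpha` values between 1 and 2 produces a curve of the kind shown in Figure 2.4. For a hypothetical harmonic timbre with six equal-amplitude partials, the value at `alpha = 1.5` is lower than at nearby mistuned ratios, because the partials of the two tones that coincide at the fifth stop contributing to the sum.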


Figure 2.2: Spectrogram for an alto-trombone playing a 440 Hz tone. Amplitude in dB vs. number of harmonic.

² In general, this is only possible to achieve using synthesized sounds; real instruments rarely have

Overtone      Amplitude (dB)   Phase (radians)
fundamental   -4.46253         -1.65863
1             0.00000          -1.65068
2             -6.25037         -1.85469
3             -10.93560        -2.04795
4             -17.95490        -1.78295
5             -21.79480        -1.20478
6             -27.34640        -0.90173
7             -34.03120        -0.31566
8             -39.37550        0.48916
9             -49.92420        1.12302
10            -54.01170        -1.15258
11            -57.64190        -1.05098
12            -57.66760        -0.98125
13            -55.62120        -0.13202
14            -54.64860        0.94041

Table 2.2: Amplitudes (dB) and phases (radians) of the fundamental and overtones of an alto trombone playing a 440 Hz tone in its steady state.

Figure 2.3: Dissonance curves for each of the first six components of the alto-trombone spectrum presented in Table 2.2 (dissonance vs. frequency in Hz). Notice that the bandwidths are different for each frequency component.


2.4 Limitations

The psychoacoustical approach of Plomp and Levelt satisfactorily explains some properties of consonant intervals when using complex harmonic tones. Nevertheless, the model fails to explain why, in the case of simple sinewaves (i.e., tones with a single spectral component), an interval sometimes perceived as highly dissonant, like M7, has a smaller ‘dissonance value’ than a P5, normally perceived as very consonant (notice that both intervals are greater than one critical band). This problem suggests that their model is not complete and that more research is needed to find other factors to adjust or extend it.

Figure 2.4: Intrinsic dissonance curve for an alto-trombone playing a 440 Hz tone calculated as proposed by Sethares. 15 harmonics were considered, the vertical lines corresponding to ratios in the Just Tuning scale. Notice that for smaller ratios, the dissonance values are generally smaller.


Chapter 3 Scales and consonance

Scales appear when a larger interval (generally the octave) is divided into several parts. The progression from one of the constituent divisions to another by consecutive steps is an ascending or descending scale. Although this division can be done arbitrarily, there is a historical preference for seven- and twelve-note scales in western culture. There is no universal agreement about the reason for this particular number of divisions, but we accept it as a fact. Some researchers claim that it is because twelve is a highly divisible number; other (more esoteric) explanations argue that it is because twelve is the number of equivalent spheres that can touch another equivalent sphere without any intersections (the “Kissing Number” for three dimensions [15]). One of the most accepted hypotheses is that twelve is the number of fifths necessary to closely approximate the range of seven octaves, the number of octaves present in a modern piano and, for many practical purposes, the extent of human sensitivity:

$$ \left(\frac{3}{2}\right)^{12} = \frac{531441}{4096} = 129.746\ldots \approx 2^7 = 128. $$
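The arithmetic behind this approximation can be verified with exact fractions. The short Python sketch below is only an illustration; the variable names are hypothetical, and the conversion of the mismatch to cents (1200 times the base-2 logarithm of a ratio) is the standard definition rather than something taken from this chapter.

```python
from fractions import Fraction
from math import log2

twelve_fifths = Fraction(3, 2) ** 12   # twelve stacked pure fifths
seven_octaves = Fraction(2) ** 7       # seven stacked octaves

# Ratio by which twelve fifths overshoot seven octaves, and its size in cents.
mismatch = twelve_fifths / seven_octaves
mismatch_cents = 1200 * log2(float(mismatch))
```

The mismatch is a small fraction of a 100-cent semitone, which is why twelve fifths are said to "closely approximate" seven octaves rather than to equal them.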

Dividing an octave into an integer number of intervals presents several challenges, principally that it’s impossible to divide it using equal-size ratios of integers for all the steps in the scale.¹ This uneven division means that some intervals can only be represented using big integers, whereas small integer ratios between two notes are desirable for consonance purposes.

The set of divisions determines the ratios between the steps in a particular scale, and this is called tuning or temperament. Every octave division implies a tuning; Pythagorean, Just Tuned (JT), several Meantones, and Twelve Tones Equal Tempered (12-TET) are among the most popular tunings in western culture.

A scale by itself doesn’t present any problem when only one note is played at a time, but because music is a social experience, and most of the time we are more interested in relationships between events than in their independent occurrence, it’s more commonly the case that several notes are played simultaneously. Then, the temperament determines their consonance, and considerable effort is dedicated to choosing a tuning that maximizes such consonance. This is even more important when music is played with fixed-tuning instruments (woodwinds, keyboards, guitars, etc.), since these cannot adjust pitch dynamically, imposing restrictions on their tuning and on the rest of the instruments playing concurrently.

¹ For a demonstration, see [16, p. 52].

The acoustics of an instrument determines its spectrum, and the instruments used in the western tradition typically produce harmonic sounds. Such sounds have frequency components (overtones) that may change in intensity, but they generally remain integer multiples of a lower frequency called the fundamental. The spectrum is then composed of a fundamental and its harmonics; when the sound is non-harmonic, the overtones are called partials. Pythagoras, one of the first researchers known to have been concerned about consonance, studied mainly harmonic sounds and found that intervals having small integer ratios between their fundamental frequencies are perceived as harmonious: the larger the numbers necessary to describe an interval as a ratio, the more dissonant the interval is perceived to be. In order, the unison (1:1), octave (2:1), and fifth (3:2) are the most consonant intervals in harmonic instruments. The infinite series of sines and cosines that comprises a harmonic soundwave is known as a harmonic series, and its mathematical treatment was given mostly by Fourier. Figure 3.1 presents the approximate musical notes for the harmonic series of C2, and Table 3.1 lists the first terms of the harmonic series of an ideal string tuned at 110 Hz.

Overtone   Frequency (Hz)
0          110
1          220
2          330
3          440
4          550
5          660
6          770
7          880
8          990
9          1100

Table 3.1: The first 10 terms of the harmonic series of an ideal sound tuned at 110 Hz.
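The entries of Table 3.1, and the interval that each harmonic forms with the fundamental once it is reduced to the same octave, can be generated with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and harmonic number k in the code corresponds to overtone k − 1 in the table.

```python
from fractions import Fraction


def harmonic_series(fundamental_hz, count):
    """Frequencies of the first `count` harmonics (integer multiples of the fundamental)."""
    return [fundamental_hz * k for k in range(1, count + 1)]


def octave_reduce(ratio):
    """Halve a ratio (>= 1) until it falls within one octave, i.e. in [1, 2)."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    return ratio
```

For example, `octave_reduce(3)` gives the ratio of the pure fifth and `octave_reduce(5)` that of the pure major third, which are the small-integer ratios referred to above.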

Figure 3.1: Harmonic series for C2 and its roughly approximated correspondence with musical notes. The legends between the staves show pure intervals between the fundamental and the successive note when reduced to the same octave (in order: 8, P5, 8, M3, P5, m7, 8, M2, M3, +4, P5, M6, m7, M7, 8).

Based on his discovery, Pythagoras proposed to divide the octave using this principle: starting from the unison with the fundamental, jump successively by ascending fifths (or descending fourths), reducing the resulting frequencies to the octave of the fundamental. The scale constructed using this method has small integer ratios only for the octave, the fifth, and the fourth. The problem with this scale is that for thirds and sixths, which are extensively used in western music, the ratios can only be expressed using big integers and sound simply ‘out of tune.’
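The construction just described can be sketched in Python with exact fractions. The choice of six fifths below and five fifths above the starting note is an assumption made for this example (it yields twelve notes whose ratios match the Pythagorean ratios listed in Table 3.2); the function names are hypothetical.

```python
from fractions import Fraction
from math import log2


def octave_reduce(ratio):
    """Bring a ratio into the octave [1, 2) by repeated doubling or halving."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def pythagorean_scale(fifths_down=6, fifths_up=5):
    """Stack pure fifths (3:2) around the starting note and reduce to one octave."""
    fifth = Fraction(3, 2)
    return sorted(
        octave_reduce(fifth ** k) for k in range(-fifths_down, fifths_up + 1)
    )


def cents(ratio):
    """Size of an interval in cents (1200 per octave)."""
    return 1200 * log2(float(ratio))
```

In the resulting list the fourth and fifth keep their small ratios (4:3 and 3:2), whereas the major third comes out as 81:64 instead of the pure 5:4, which illustrates the ‘big integers’ problem mentioned above.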

There’s no perfect scale, and some compromise must be found in order to favor one set of intervals or another. JT tries to produce correct thirds (both major and minor), to the detriment of the mobility of music from one scale to another (modulation). Meantone tunings aim to enable such mobility by sacrificing the consonance of the thirds, and 12-TET simply divides the octave into twelve equal parts, allowing perfect mobility (enharmonicity) but consequently sacrificing consonance. Table 3.2 compares the main tuning systems and their deviation from the (pure) intervals found in the harmonic series.
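The size of this compromise can be quantified in cents. The sketch below compares a few just ratios with their 12-TET counterparts; it is only an illustration, and the dictionaries `JUST` and `STEPS` as well as the function names are hypothetical. Only ratios that also appear in the JT column of Table 3.2 are used.

```python
from math import log2


def cents(ratio):
    """Size of an interval in cents: 1200 * log2(ratio)."""
    return 1200 * log2(ratio)


# Just ratios and the number of 12-TET semitones of the corresponding interval.
JUST = {"m3": 6 / 5, "M3": 5 / 4, "P4": 4 / 3, "P5": 3 / 2, "M6": 5 / 3}
STEPS = {"m3": 3, "M3": 4, "P4": 5, "P5": 7, "M6": 9}


def deviation_from_12tet(name):
    """Cents by which the just interval differs from its 12-TET version."""
    return cents(JUST[name]) - 100 * STEPS[name]
```

Under this measure the fifth and the fourth differ from their equal-tempered versions by about two cents, while the thirds and sixths differ by roughly fourteen to sixteen cents, which is the loss of consonance attributed to 12-TET in the text.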

The 12-TET system, which had already been gaining popularity among manufacturers since the XIX century, was widely deployed with the introduction of electronic instruments, despite its poor consonance, because of the simplicity of its construction. Most of the music we hear now is performed in this tuning, and other tuning systems are not as well known as they should be.

A scale should be chosen depending on the context. This context depends mainly on the acoustics of the instruments employed, and on historical and cultural factors. Bach’s preludes and fugues in The Well-Tempered Clavier Book [20] were written to exploit the ‘character’ that every key has when using a meantone tuning. JT provides the best match with pure intervals, and for this reason it could be used to play harmonic music if its modulation restrictions can be overcome.

Pythagoras used to say that we have lost the capacity of hearing “the music of the spheres.” Even though it has always been there, our ability simply atrophied. Something similar can be said regarding tunings other than the pervasive equal temperament in western music: most of us, born during the past century, have listened to music performed using 12-TET tuning, and only with difficulty could we differentiate it from other tunings or perceive it as harmonically incorrect. The last one to notice the water is the fish.

3.1 The tuning and the context in non-equal tempered scales

“...It seems to indicate that the observation plays a decisive role in the event and that the reality varies, depending upon whether we observe it or not” – Werner Heisenberg [21]

Interval   Pure               Pythagorean            12-TET                            JT
Unison*    1:1 (0)            1:1 (0)                1:1 (0)                           1:1 (0)
m2         12:11 (111.73)     256:243 (117.13)       1059:1000 (2^(1/12) = 100)        16:15 (111.73)
M2*        9:8 (203.91)       9:8 (203.91)           561:500 (2^(2/12) = 200)          9:8 (203.91)
m3         6:5 (315.64)       32:27 (301.15)         1189:1000 (2^(3/12) = 300)        6:5 (315.64)
M3*        5:4 (386.31)       81:64 (407.82)         63:50 (2^(4/12) = 400)            5:4 (386.31)
P4*        4:3 (498.04)       4:3 (498.04)           267:200 (2^(5/12) = 500)          4:3 (498.04)
Tritone    7:5 (590.22)       1024:729 (611.73)      707:500 (2^(6/12) = 600)          45:32 (590.22)
P5*        3:2 (701.95)       3:2 (701.96)           749:500 (2^(7/12) = 700)          3:2 (701.96)
m6         8:5 (813.69)       128:81 (813.69)        1587:1000 (2^(8/12) = 800)        8:5 (813.69)
M6*        5:3 (884.36)       27:16 (905.87)         841:500 (2^(9/12) = 900)          5:3 (884.36)
m7         7:4 (996.09)       16:9 (998.25)          891:500 (2^(10/12) = 1000)        16:9 (996.09)
M7*        11:6 (1088.27)     243:128 (1109.78)      236:125 (2^(11/12) = 1100)        15:8 (1088.27)
Octave*    2:1 (1200)         2:1 (1200)             2:1 (2^(12/12) = 1200)            2:1 (1200)

Table 3.2: Harmonic intervals in different tuning systems measured in ratios and, in parentheses, cents (¢). The intervals marked with * form the diatonic major scale. [18],[19]

Assuming a twelve-note JT scale and considering thirteen possible intervals (including the octave), a given note in this system can have up to 78 possible tunings depending on the chosen reference note (global tuning) and the chosen interval. In general, for a set of $n$ intervals, the number of possible pitches can be determined by

$$ \sum_{i=1}^{n-1} i = \frac{(n - 1) + 1}{2}\,(n - 1) = \frac{n^2 - n}{2}. $$

Therefore, a context must be provided to select a specific pitch for a given note at a given time in tonal music. To illustrate the importance of the context, consider Figure 3.2 with no information about the signature or local modulation, and, to simplify, restricting pitch changes to only new notes:³

[Figure 3.2: staff notation of the notes C4, G4, D4, and C4, annotated with their possible deviations in cents: 0; 702; 206 or 204; 2, 0, or −2.]

Figure 3.2: A simple progression. The ellipses indicate that there is an unknown number of bars before and after.

For the tuning of the first note (C4) there are 0 cents of difference with itself. The following G4 is a P5 above C4 and therefore 702 cents up, but the next D4 could be a descending P4 from G4 or an ascending M2 from C4, which gives a difference of 206 and 204 cents respectively; finally, following the same principle, there are three possible difference values for the last C4: −2, 0, and 2.

This ambiguity somewhat resembles the uncertainty principle of quantum phenomena, in which the first observation plays a key role in determining the velocity or the position of a subatomic particle. The particle exists as a ‘potentia’ and its realization is given by the observation. In the same way, a note could potentially have several pitches at a time, but the musical context and the criteria of the interpreter reduce them to the most probable one. This is done by means of some rules: usually a signature is established at the beginning of a musical composition, and modulations and alterations of the signature are indicated before their occurrence.

Although signatures are needed to disambiguate pitches, they’re not sufficient to prevent global tuning drift, which is induced when music modulates in a performance without fixed-tuning instruments. There’s no agreement on whether this drift should be prevented: conductors and performers, among others, seem to prefer avoiding it because of the unpleasantness perceived when it occurs dramatically (i.e., in a short lapse); some researchers, though, have shown that it occurs naturally when a set of non-fixed tuning instruments are played simultaneously (specifically, in a chorus) [22]. Whether or not it is avoided, a general tendency is to minimize it and to distribute it across sufficiently long periods.

³ Realtime constraints enforce causality, which implies that no changes in the currently playing notes


Chapter 4 Previous solutions

Perfect consonance in tonal music can be easily achieved using non-fixed tuning instruments (strings, voice, trombone, and Theremin are among the most popular) and, as shown previously, it happens naturally. For fixed tuning instruments (most plucked instruments, keyboards, woodwinds, etc.), the price of changing tunings in the middle of a performance is simply unaffordable. The oboe (the instrument with the smallest tuning range in the symphonic orchestra) is used to determine the global tuning before a performance. In the same way, when fixed and non-fixed tuning instruments play together, there is no other possible temperament for the orchestra besides the one used for the fixed-tuning instruments, in order to achieve maximum consonance. In other words, the tuning of the orchestra reduces to the tuning of the least tuning-adaptable instrument present in it.

4.1 Dividing the octave in more than twelve tones

Attempts to achieve “perfect” consonance have been explored since the very early stages of western music. In general, these approaches were based on the idea of having a keyboard with numerous keys corresponding to different pitches, so a performer could choose at play time which tuning of the present note was better for the melody and harmony being played. One of the best documented and most impressive achievements of this kind is the work of Nicola Vicentino, who in 1555 published L’antica musica ridotta alla moderna prattica (ancient music adapted to modern practice) [23], in which his musical theories are explained and the archicembalo is introduced: a keyboard capable of playing up to 36 pitches per octave, allowing satisfactory reproduction of any interval in any scale. His innovation, like many others of its kind, never became popular due to the skills required to play it.

Two main streams can be identified in dividing the octave into more than twelve tones: using equal tempered scales, and using Just-Tuning or alternative tunings.

The ‘equal tempered’ current is based on the approximations resulting from the continued fraction expansion of the octave division. As explained earlier in Chapter 3, there is no way to divide an octave into equal size integer ratios, so rational approximations are obtained by truncating the continued fraction expansion of $\log_2(3/2)$ [24]:

\[ \log_2\frac{3}{2} = \cfrac{1}{1+\cfrac{1}{1+\cfrac{1}{2+\cfrac{1}{2+\cfrac{1}{3+\cfrac{1}{1+\cfrac{1}{5+\cfrac{1}{2+\cdots}}}}}}}} \]

or, more compactly:

\[ \frac{1}{1+}\,\frac{1}{1+}\,\frac{1}{2+}\,\frac{1}{2+}\,\frac{1}{3+}\,\frac{1}{1+}\,\frac{1}{5+}\,\frac{1}{2+}\cdots \]

which gives the convergent sequence:

\[ \log_2\frac{3}{2} \approx 1,\ \frac{1}{2},\ \frac{3}{5},\ \frac{7}{12},\ \frac{24}{41},\ \frac{31}{53},\ \frac{179}{306},\ \frac{389}{665},\ \cdots \]

whose denominators indicate the number of divisions per octave, and whose numerators indicate the number of divisions that best approximate a P5.¹ Equal tempered scales of 41, 53, and 306 tones give better resolution than 12-TET for different intervals. Other explored equal tempered scales, like those of 19, 24, and 31 tones, are based on octave ratios different from 2:1.
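The convergent sequence above can be reproduced numerically. The following is a minimal sketch, assuming double-precision arithmetic is accurate enough for the first few terms; the function name `convergents` is illustrative and not part of any cited work.

```python
from fractions import Fraction
from math import log2


def convergents(x, count):
    """Return the first `count` continued-fraction convergents of x.

    Uses the standard recurrence h_i = a_i*h_(i-1) + h_(i-2) (and the
    same for k_i), where a_i are the partial quotients of x.
    """
    a = int(x)
    frac = x - a
    h_prev, h = 1, a      # numerators h_(-1), h_0
    k_prev, k = 0, 1      # denominators k_(-1), k_0
    result = [Fraction(h, k)]
    while len(result) < count:
        x = 1.0 / frac    # floating point: only the first terms are reliable
        a = int(x)
        frac = x - a
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result


# Skip the leading 0/1 term so the list starts at 1, 1/2, 3/5, 7/12, ...
fifth_convergents = convergents(log2(3 / 2), 9)[1:]
for c in fifth_convergents:
    print(f"{c.denominator:4d} divisions per octave, P5 = {c.numerator} steps")
```

The denominator 12 appears as the fourth term, which is the 12-TET case discussed in the text.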

The other alternative, appearing in different and apparently unrelated cultures, includes scales constructed by applying the principles of Just-tuning and scales with some other construction pattern. Harry Partch’s 43-tone scales, Wendy Carlos’ α, β, γ, and harmonic scales, and several ethnic tunings (Arabic, Pelog, Slendro,² Indian, etc.) are part of this branch. They are explained in detail in [25].

4.2 The not-so-trivial solutions

Arguably, Schönberg’s introduction of dodecaphony in the last century can be considered a new approach to solving the consonance problem: by imbuing each step of 12-TET with equal importance, the predominance of one tone over the rest (tonality) is lost and therefore considerations about consonance are more flexible, making the 12-TET intervals more than suitable for such musical expression.

A less-known attempt to achieve perfect consonance in tonal music using 12-TET is the design of timbres based on the scale. As shown previously, timbre and scales are closely related, and traditionally scales follow timbre, but by this ‘reverse engineering’ process it is possible to synthesize sounds whose partials are at powers of $2^{1/12}$ (the twelfth root of 2) times a fundamental frequency. Sethares [16] presents an extensive explanation of this topic. Figure 4.1 shows the intrinsic dissonance curve for a virtual alto-trombone playing a 440 Hz tone with partial frequencies at $f_0 2^{n/12}$, where $n$ is an integer chosen such that octaves and P5s predominate in the spectrum. Table 4.1 summarizes the results. Notice how low values of dissonance are obtained for 12-TET intervals.

¹ Notice that a pentatonic equal-tempered scale appears earlier in the convergent series than 12-TET, in coincidence with the historical development of scales (although the historic one is not equal tempered). This fact resembles the biological myth coined by Haeckel: “Ontogeny recapitulates phylogeny.”


[Figure 4.1: plot of dissonance value (0.2 to 0.8) against frequency ratio (53/50 to 53/25).]

Figure 4.1: An intrinsic dissonance curve for a 440 Hz tone played by a synthetic instrument with the spectral envelope of a trombone but with partials at powers of $2^{1/12}$ times the fundamental frequency.

Overtone       Ratio         Amplitude (dB)
fundamental    1.0000         -4.46253
1              2^(12/12)       0.00000
2              2^(19/12)      -6.25037
3              2^(24/12)     -10.93560
4              2^(28/12)     -17.95490
5              2^(31/12)     -21.79480
6              2^(34/12)     -27.34640
7              2^(36/12)     -34.03120
8              2^(38/12)     -39.37550
9              2^(39/12)     -49.92420
10             2^(40/12)     -54.01170
11             2^(42/12)     -57.64190
12             2^(43/12)     -57.66760
13             2^(48/12)     -55.62120
14             2^(72/12)     -54.64860

Table 4.1: Virtual (impossible?) alto-trombone with overtones suitable for playing in 12-TET. The amplitudes correspond to the spectrum of a real alto trombone playing a 440 Hz tone (duplicating a column of Table 2.2), but the frequencies are chosen to maximize consonance in 12-TET scales.
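As an illustration of how the partials of such a timbre could be generated, the sketch below evaluates $f_0 2^{n/12}$ for the step counts read from the ratio column of Table 4.1. The names `tet_partials` and `STEPS` are assumptions made for this example only.

```python
def tet_partials(f0_hz, steps):
    """Return the frequencies f0 * 2**(n/12) for each 12-TET step count n."""
    return [f0_hz * 2 ** (n / 12) for n in steps]


# Step counts n as listed in Table 4.1 (the fundamental corresponds to n = 0).
STEPS = [0, 12, 19, 24, 28, 31, 34, 36, 38, 39, 40, 42, 43, 48, 72]

partials = tet_partials(440.0, STEPS)
for n, f in zip(STEPS, partials):
    print(f"n = {n:2d}: {f:10.2f} Hz")
```

For comparison, the third harmonic of a 440 Hz tone lies at 1320 Hz, whereas the partial with n = 19 lies slightly below it, at the 12-TET twelfth.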


4.3 MIDI-based solutions

MIDI is a hardware and software specification intended to enable communication among multiple musical instruments [26]. The conjunction of MIDI and electronic synthesizers offers a new possibility to override the physical constraints that limit the tuning adaptability of a broad set of instruments. This has been exploited by many researchers, the most illustrative being deLaubenfels [27] and Sethares [28], respectively considered in the following subsections.

4.3.1 Springs network

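[Figure 4.2: diagram of a spring network connecting the MIDI notes A, B, C, ... and a ‘Ground’ node through springs with constants k0, k1, and k2.]

Figure 4.2: The ellipses indicate the possible presence of more MIDI notes in the sequence. The ‘ground’ node is the original tuning of the sequence.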

John A. deLaubenfels [27] proposed a method for obtaining less dissonant MIDI sequences using Just-tuning intonation. He expresses the relationships between the notes in a ‘spring network’ model, the accumulated energy of interconnected virtual springs being the ‘pain’ or dissonance measurement of the sounding notes. Three kinds of springs are mainly used: vertical springs to represent the relationship between simultaneous notes, horizontal springs to represent the changes in pitch of a note while it’s sounding (or successively played), and grounding springs to measure the tuning drift of the whole piece with respect to its key or to the original frequencies at which the notes were played at the beginning. Figure 4.2 presents an example of a spring network as proposed by deLaubenfels. When a fixed-tuning mode is selected, only the vertical springs are considered.

In analogy with a physical system, elongation changes in any of the interconnected springs affect the net energy of the system, so by minimizing the overall energy of the system, the dissonance at any given moment in the sequence is minimized.

To obtain a dissonance-minimized MIDI sequence using deLaubenfels’ method, the sequence is loaded into a program that connects the necessary springs and specifies the elasticity of each spring depending on the loudness of the notes at a given time and general user preferences. The minimum energy state of the resulting spring network is then found using successive approximations in conjunction with Monte Carlo pseudo-random motion through the tuning space. Although deLaubenfels recognizes that there are deterministic ways to find it (e.g., by calculating the inverse of the connections matrix), the chosen mechanism was found more suitable for long sequences and models using non-linear springs. The algorithm re-tunes every note in the sequence until zero total spring force is achieved for every spring. Examples of his results, and extended explanations, can be obtained from [27].

The force $f_n$ and potential energy $E_n$ for an isolated spring $n$ in deLaubenfels’ system are presented in Equations 4.1 and 4.2, where $k_n$ is the elasticity constant of the spring and $x_n$ its elongation:

\[ f_n(x_n) = -k_n x_n, \tag{4.1} \]

\[ E_n = -\int f_n(x_n)\,dx_n. \tag{4.2} \]
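To make the energy-minimization idea concrete, here is a minimal sketch of a linear spring network for two simultaneous notes. It is not deLaubenfels’ program: it replaces his Monte Carlo search with plain gradient descent, and the function `relax`, the spring constants, and the example values are all assumptions chosen for illustration. Pitches are expressed in cents.

```python
def relax(pitches, vertical, ground_k, step=0.1, iterations=2000):
    """Move pitches (in cents) toward the minimum-energy state.

    vertical: list of (i, j, rest, k) springs pulling the interval
              pitches[j] - pitches[i] toward `rest` cents (Equation 4.1).
    ground_k: constant of the grounding springs that tie every note
              to its original pitch.
    """
    origin = list(pitches)
    p = list(pitches)
    for _ in range(iterations):
        # Grounding springs: force proportional to the drift of each note.
        force = [-ground_k * (p[i] - origin[i]) for i in range(len(p))]
        # Vertical springs: equal and opposite forces on both notes.
        for i, j, rest, k in vertical:
            stretch = (p[j] - p[i]) - rest
            force[i] += k * stretch
            force[j] -= k * stretch
        p = [p[i] + step * force[i] for i in range(len(p))]
    return p


# A 12-TET major third (400 cents) pulled toward the just 5:4 (386.31 cents).
tuned = relax([0.0, 400.0], vertical=[(0, 1, 386.31, 1.0)], ground_k=0.1)
print(tuned, tuned[1] - tuned[0])
```

With these assumed constants the interval settles between the 12-TET and just values, because the grounding springs resist the retuning.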

4.3.2 Psycho-acoustical curves

Sethares [28] uses the tonal consonance curves of Plomp and Levelt in ‘Adaptun,’ a program developed in Max to deploy more consonant MIDI sequences in real-time.

Sethares claims that sensory dissonance (the cost function) is a generalization of Equation 2.7. Concretely, the sensory dissonance $D$ of $m$ concurrent sounds with different timbres (considering the first $n$ partials only) is the sum of the dissonances of all the intervals at a given time, and can be calculated by means of Equation 4.3:

\[ D = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{n} \sum_{l=1}^{n} d(p_{jk} f_j,\, p_{il} f_i,\, a_{jk},\, a_{il}). \tag{4.3} \]

The algorithm proposed by Sethares in Adaptun calculates the sensory dissonance of $m$ concurrent sounds using Equation 4.3 and, by small adjustments of their fundamental frequencies $f_i$ at every iteration $k$, decreases the dissonance value until the differences between successive frequencies fall below a threshold $\delta$ for all the pitches:

    do
        for i = 1 to m
            $f_i(k+1) = f_i(k) - \mu \, \partial D / \partial f_i(k)$        (4.4)
        endfor
    until $|f_i(k+1) - f_i(k)| \le \delta \ \forall i$

where $\mu$ is an arbitrarily small constant. The minus sign in front of the gradient term makes the iteration descend toward a (local) minimum of $D$ rather than climb toward a maximum. Nevertheless, in Adaptun several simplifications are made in consideration of practical time constraints:

1. The instrument spectrum for each MIDI channel is known a priori by the software.

2. Equation 2.4 is implemented as a look-up table, and $a_1$ and $a_2$ are set to unity.

3. A nominal frequency of 500 Hz is used for all calculations between all partials.

4. The calculation of the gradient is avoided in the algorithm, and instead is approximated by

\[ f_i(k+1) = f_i(k) - \mu\, g(f_i(k)), \]

where

\[ g(f_i(k)) = \frac{D(f_i(k) + c\Delta(k)) - D(f_i(k) - c\Delta(k))}{2c\Delta(k)} \]

and $\Delta(k)$ is a randomly chosen ±1 vector with a Bernoulli distribution (with probability $p = 0.5$).

The substitution $g(f_i(k))$ is a variation of the Simultaneous Perturbation Stochastic Approximation (SPSA) method [29] [30], explained in detail later. $\mu$ and $c$ are selected so that the updates of $f_i$ average about 1 ¢. The difference from the original SPSA is that the coefficients ($\mu$ and $c$) don’t vanish, therefore allowing recognition of newly arriving notes. The user can override these default parameters through a graphical interface [28].
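The simultaneous-perturbation update with constant coefficients can be sketched as follows. This is not Adaptun’s code: the cost function is a stand-in quadratic bowl rather than a dissonance measure, a fixed iteration budget replaces the stopping rule, and the names `spsa_step` and `adapt` and the values of `mu` and `c` are assumptions for illustration.

```python
import random


def spsa_step(freqs, cost, mu, c, rng):
    """One update f_i <- f_i - mu * g_i with a random +/-1 perturbation."""
    delta = [rng.choice((-1.0, 1.0)) for _ in freqs]
    plus = [f + c * d for f, d in zip(freqs, delta)]
    minus = [f - c * d for f, d in zip(freqs, delta)]
    diff = cost(plus) - cost(minus)   # only two cost evaluations per step
    return [f - mu * diff / (2.0 * c * d) for f, d in zip(freqs, delta)]


def adapt(freqs, cost, mu=0.05, c=0.5, iterations=400, seed=0):
    """Run a fixed number of constant-gain SPSA steps (mu and c never decay)."""
    rng = random.Random(seed)
    for _ in range(iterations):
        freqs = spsa_step(freqs, cost, mu, c, rng)
    return freqs


# Stand-in cost: squared distance to hypothetical target frequencies.
targets = [440.0, 660.0]


def toy_cost(freqs):
    return sum((f - t) ** 2 for f, t in zip(freqs, targets))


result = adapt([442.0, 657.0], toy_cost)
print(result)
```

Each step needs only two evaluations of the cost regardless of the number of voices, which is the practical appeal of this approximation over a full numerical gradient.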

Finally, tonal drifting is prevented in Adaptun by means of context, persistence, and memory models. These mechanisms include absent partials in the dissonance calculation, whose presence is important because they are somehow remembered by the listener. Several musical examples processed with Adaptun can be downloaded from [31].

4.3.3 MIDI issues

Although the MIDI protocol [26] provides mechanisms for tuning individual notes in real-time, there’s no consensus among electronic instrument manufacturers about their implementation. In general, the applications mentioned earlier differ in their approach and methodology to calculate (or approximate) the best tuning for a note at a given time, but converge in the way they send each calculated note to a synthesizer: the calculated note is expressed as two MIDI messages, the first one adjusting the pitch bend of the channel to achieve the desired frequency from the closest 12-TET pitch of the scale, and the second firing a NOTE-ON message of that pitch, after which the synthesizer proceeds to produce the sound.

The PITCH BEND ADJUSTMENT message carries a 14-bit value (for a total range of [0, 16384)), no pitch bend change being indicated by a message with value 8192. Its final effect is determined in the receiver in consideration of the PITCH SENSITIVITY parameter. Having previously adjusted it to one semitone (via an RPN message), and considering that the maximum bend possible in each direction is 8192 steps, the maximum resolution possible is about 0.01 ¢ (a centicent!).³ This resolution is more than sufficient for any acoustical application, since one ¢ is less than half the just noticeable difference for human listeners [18]; however, the quality of this mechanism is uneven across different manufacturers.

³ 100 ¢ / 8192 ≈ 0.012 ¢.
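The two-message scheme described above can be sketched as follows, assuming a pitch bend sensitivity of ±1 semitone (100 ¢) and the usual convention that MIDI note 69 is A4 at 440 Hz. The helper names `note_and_bend` and `pitch_bend_bytes` are illustrative and do not belong to any particular library.

```python
from math import log2


def note_and_bend(freq_hz, bend_range_cents=100.0, a4_hz=440.0):
    """Return the closest 12-TET MIDI note and the 14-bit pitch bend value
    needed to reach freq_hz from that note."""
    semitones = 69 + 12 * log2(freq_hz / a4_hz)
    note = round(semitones)
    cents = (semitones - note) * 100.0
    bend = 8192 + round(cents / bend_range_cents * 8192)
    return note, max(0, min(16383, bend))   # clamp to the 14-bit range


def pitch_bend_bytes(channel, bend):
    """Encode a pitch bend message: status 0xE0 | channel, then LSB and MSB
    (7 bits each)."""
    return bytes([0xE0 | channel, bend & 0x7F, bend >> 7])


def note_on_bytes(channel, note, velocity=64):
    """Encode a NOTE-ON message for the given channel and note."""
    return bytes([0x90 | channel, note, velocity])


# A just major third (5:4) above A4: 550 Hz.
note, bend = note_and_bend(550.0)
messages = [pitch_bend_bytes(0, bend), note_on_bytes(0, note)]
print(note, bend, [m.hex() for m in messages])
```

The bend message is sent before the NOTE-ON so that the note starts at the adjusted pitch.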


The MIDI protocol specifies three mechanisms to alter the tuning of a single note using Universal System Exclusive Messages: Bulk Tuning Dump (non-realtime), Bulk Tuning Dump Request (non-realtime), and Single-note Tuning Change (realtime). Although the last mechanism is the most suitable for real-time implementations, it’s rare to find it in contemporary synthesizers. To overcome this limitation, a combination of pitch, pitch bend change, and pitch bend sensitivity messages is used, as mentioned before. The mechanism of combining different messages presents the following limitations: the pitch bend messages are specific to each MIDI channel, and all the notes played in a given channel are affected by its current value, making it impossible to play a chord correctly in a single channel. This problem can be avoided by playing each note of a chord on a different channel, consequently restricting the polyphony to 16 parts (in the absence of MIDI channel extension). Another known limitation is that it’s possible to hear the effect of the next pitch bend message over the end of the previous note if the release time is long enough (the release time follows a NOTE-OFF message).

4.4 Comparison of deLaubenfels’ and Sethares’ models

The models proposed by Sethares and deLaubenfels are both relevant to the present research. Here is a comparison of them:

• Both are well known among the micro-tuning community and several examples of their results can be obtained from the internet.

• Both lack mechanisms for detecting changes in timbre.

• Both include mechanisms to avoid (or minimize) tonal drift.

• Despite both models having a deterministic way to calculate adjusted pitches, they use heuristics instead due to the complexity of the evaluations. The results are approximations of the model that each author introduces.

• The mathematical treatment of deLaubenfels’ model is simpler than that of Sethares’ model.

• deLaubenfels’ model is intended only for harmonic spectra played in Just-Tuning. Sethares’ model includes them, but algorithmic extension to other tunings or non-harmonic timbres is straightforward.

• Both rely mainly on the MIDI protocol and synthesizer capabilities, inheriting their limitations.

• It’s not possible to use deLaubenfels’ model in realtime as it is, since it requires knowing the whole sequence to calculate the minimized dissonance version. On the other hand, in order to use Sethares’ model, several simplifications must be assumed, especially in the calculation of the gradient in Equation 4.4.


4.5 Observations regarding Sethares’ model

The present research is mainly based on Sethares’ work. However, they differ in many points, including the treatment of the critical band in the computation of the dissonance: Sethares calculates in advance the dissonance curves for each timbre using a nominal frequency of 500 Hz, saving these results in memory. These values are retrieved by his algorithm every time they are needed, avoiding in that way direct calculation of the exponential functions (expensive in CPU time) and therefore easing the real-time implementation. This approximation implies a critical bandwidth of uniform size for all considered frequencies. The critical bandwidth is not constant over the human range of hearing, and the induced error is greater, in particular, when the frequencies are very high or very low. However, this error seems to be a reasonable trade-off in order to implement adaptive tuning in real-time.

There are several models for the mathematical representation of the critical bands (Bark scale, critical bands, equivalent rectangular bandwidth, etc.). An expression to calculate an equivalent rectangular bandwidth (ERB) for frequencies 100 ≤ f ≤ 10,000 Hz at moderate levels is given [32] by:

\[ \mathrm{ERB}(f) = 0.108 f + 24.7, \tag{4.5} \]

where ERB is in Hz and $f$ is the center frequency of the ERB. This model currently has more acceptance than previous models. Figure 4.3 shows the equivalent critical band for the specified range, calculated using Equation 4.5.
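Equation 4.5 translates directly into code. The sketch below is only an illustration; the function name `erb` and the range check are assumptions, the latter reflecting the validity range stated in the text.

```python
def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Equation 4.5).

    The expression is stated for 100 <= f <= 10000 Hz at moderate levels,
    so values outside that range are rejected here.
    """
    if not 100.0 <= f_hz <= 10000.0:
        raise ValueError("Equation 4.5 is given for 100 Hz to 10 kHz")
    return 0.108 * f_hz + 24.7


for f in (100.0, 500.0, 1000.0, 10000.0):
    print(f"{f:8.0f} Hz -> ERB = {erb(f):7.1f} Hz")
```

The bandwidth grows almost proportionally with frequency, which is why a single nominal frequency of 500 Hz misrepresents both very low and very high partials.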

[Figure 4.3: log-log plot of bandwidth (100 to 1000) against frequency (100 to 10000).]

Figure 4.3: Equivalent Rectangular Bandwidth for frequencies 100 ≤ f ≤ 10000 Hz.

Plomp and Levelt emphasize that perceived consonance depends on the frequency. That’s the main difference between their work and Helmholtz’s observations; specifically, they emphasize the dependency of consonance on the critical bands. Although the compromise of using a nominal frequency in calculations to achieve better realtime performance gives a good approximation, a more precise model should address these issues to minimize computation errors.


4.6 Commercial applications

Several commercial applications currently available can achieve perfect consonance in real-time using approaches different from those presented in this thesis. Among the most visible in the market are Akai PitchRight and DeccaBuddy,⁴ Antares Auto-Tune [33], and RBC VoiceTweaker Pro.⁵ These applications, though, are focused on correcting pitch against a tuning template chosen from a manufacturer-supplied predefined set or from a user-customized set.

Figure 4.4: Auto-tune, Antares Audio Technologies.

Introduced by Antares Audio Technologies in 1997, Auto-Tune Pitch Correcting Plug-In (currently in version 4) is one of the most widely used plug-ins in professional and home audio studios. This application corrects intonation errors or allows one to modify the intonation of a performance using DSP algorithms that transpose a continuously detected pitch of a periodic input signal to a desired pitch defined by any of a number of user programmable scales (including minor, major, chromatic, and 26 historical and microtonal scales) or through the use of graphical editing tools. Auto-Tune allows the user to control the retune speed, add vibrato to voices, select the sampling rate up to 192 kHz (depending on the underlying hardware), select optimized templates for the most commonly pitch-corrected inputs (e.g., soprano, tenor, bass, etc.), and so on.

The difference between these applications and the purpose of this thesis is that in my research, no knowledge about a specific tuning is necessary to maximize consonance: the tuning can change dynamically as the incoming audio signals interact with each other. This can be achieved by some of the software examples presented in this section by a meticulous batch process. The application of such audio effects to non-harmonic instruments requires specification of a suitable scale to initialize the plug-in so that an incoming audio signal can be matched against it to determine the closest step in the designed tuning and in that way maximize the consonance. This process is time-consuming because it requires analysis of the spectrum of the instruments involved.

⁴ www.akaipro.com/productsVSTPlugins.html


Chapter 5 Proposed model

5.1 The proposed model

The choice of a function to express tonal dissonance seems to be somewhat arbitrary, since the data provided by Plomp and Levelt is not precise [24] and the main purpose of the chosen function is to mimic the behavior reported by them. Even the value of 25% of the bandwidth at which the perceived dissonance reaches a maximum is described by Plomp and Levelt as “a rule of thumb” [5]. Sethares obtained his model by a least-squares fit of the experimental data provided by them. Benson [24] claims that the expression proposed by Sethares (Equation 2.4) for calculating the dissonance curve can be replaced by Equation 5.1, where $x$ is the frequency difference in terms of critical bandwidth:

\[ d(x) = 4|x|\,e^{1-4|x|}. \tag{5.1} \]

This simpler function satisfies the behavior of the Plomp and Levelt curve: maximum dissonance at 25% of the critical band, zero dissonance at the unison, and a minimum value for frequency differences of one critical bandwidth. Parncutt [34] proposed to represent the dissonance as $d(x) = (4x\,e^{1-4x})^2$, but again, since the purpose of the mathematical model is to approximate the hearing model devised by Plomp and Levelt, the function proposed by Benson was selected since it is the simplest one. Sethares’ model is compared against Benson’s model in Figure 5.1.
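The properties claimed for Equation 5.1 are easy to check numerically. The following is a small sketch; the function name `benson` is an assumption used here only for illustration.

```python
from math import exp


def benson(x):
    """Dissonance of two pure tones separated by x critical bandwidths
    (Equation 5.1)."""
    return 4.0 * abs(x) * exp(1.0 - 4.0 * abs(x))


# Sample the curve: zero at the unison, peak of 1 at a quarter of the
# critical band, and a small residual value at one critical bandwidth.
for x in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(f"x = {x:4.2f}  d = {benson(x):.3f}")
```

Setting the derivative $4e^{1-4x}(1-4x)$ to zero confirms that the single maximum lies at $x = 1/4$, where $d = 1$.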

[Figure 5.1: plot of dissonance (0.2 to 1) against bandwidth, bw (0.5 to 2).]

Figure 5.1: The dashed line is the curve obtained by Benson’s model; the solid one by Sethares’ proposal.

The selected paradigm for the treatment of sensory consonance is not unique, as explained previously. Mashinter [35] discusses other models and compares their characteristics. The most accepted of these theories (and the one followed in this discussion) is that proposed by Plomp and Levelt, who based their work on the findings of Zwicker’s group. Nevertheless, several doubts have arisen regarding their results, since later studies found that the size of the critical band estimated by Zwicker is too large, especially in the low frequency region. Other researchers (notably Greenwood [36] [37], and Moore and Glasberg [32]) have proposed other values for critical bands and interpretations of the same phenomena (roughness and beating), introducing the term ERB (Equivalent Rectangular Bandwidth) to avoid confusion with the former ‘critical bands.’ The relationship between ‘critical bands’ and ERBs is such that 25% of a critical band (as it is understood in Plomp and Levelt’s work) is equivalent to about 40% of an ERB. The calculation of the ERB was introduced in Equation 4.5, and a revised expression for the dissonance considering the ERB correction is introduced in Equation 5.2:

\[ d(x) = 2.5|x|\,e^{1-2.5|x|}, \tag{5.2} \]

$x$ being the frequency difference in terms of ERB.

Equation 5.2 can be used to construct an expression for the dissonance of a dyad equivalent to Equation 2.4, extended to include the ERB in the calculation:

\[ d(a_1, a_2, f_1, f_2) = 2.5\, a_1 a_2\, \frac{\Delta f}{bw}\, e^{1-2.5\frac{\Delta f}{bw}}, \tag{5.3} \]

where $\Delta f = |f_2 - f_1|$ and $bw = \mathrm{ERB}(\max(f_2, f_1))$.
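A direct transcription of Equation 5.3 might look like the sketch below. The function names are assumptions, and `erb` repeats Equation 4.5 so that the block is self-contained.

```python
from math import exp


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Equation 4.5)."""
    return 0.108 * f_hz + 24.7


def dyad_dissonance(a1, a2, f1, f2):
    """Dissonance of two partials with amplitudes a1, a2 and frequencies
    f1, f2 in Hz (Equation 5.3), using the ERB of the higher frequency."""
    bw = erb(max(f1, f2))
    x = abs(f2 - f1) / bw
    return 2.5 * a1 * a2 * x * exp(1.0 - 2.5 * x)


# With unit amplitudes the peak value of 1 is reached when the two
# partials are 40% of an ERB apart.
f2 = 1000.0
f1 = f2 - 0.4 * erb(f2)
print(dyad_dissonance(1.0, 1.0, f1, f2))
```

Because the bandwidth depends on the higher frequency, the same interval in Hz is less dissonant in a high register than in a low one.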

In another interpretation of Plomp and Levelt’s work, one should calculate the mean frequency of the dyad in question, find its ERB, and use it in Equation 5.3 instead of the ERB of the maximum of the two frequencies. Similar results can be obtained with either approach, the last one requiring fewer steps to compute.

The intrinsic dissonance curves for the alto trombone playing a 440 Hz tone obtained by the expressions proposed by Sethares and by Equation 5.3 are compared in Figure 5.2. Notice that the dissonances of the m3 and M3 intervals have more similar values in the new model.

According to the findings of Plomp and Levelt, the dissonance reaches a minimum for frequency differences of about one critical band and beyond that it rapidly vanishes. An alternative representation of the dissonance that includes this criterion is presented in Equation 5.4. The inclusion of a conditional in the model complicates the direct calculation of the gradient but it can save some cycles when implemented by numerical approximations as it is in this case.


[Figure 5.2: plot of dissonance value (0.2 to 1) against frequency ratio (16/15 to 2).]

Figure 5.2: Intrinsic dissonance curve of an alto trombone playing a 440 Hz tone calculated by Sethares’ expression (solid line) and Equation 5.3 (dashed line).

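\[ d(a_1, a_2, f_1, f_2) = \begin{cases} 0 & \forall f_1, f_2 \mid |f_1 - f_2| \ge 1.21\,\mathrm{ERB}(\max(f_1, f_2)); \\ 2.5\, a_1 a_2\, \dfrac{\Delta f}{bw}\, e^{1-2.5\frac{\Delta f}{bw}} & \text{otherwise.} \end{cases} \tag{5.4} \]

The 1.21 coefficient in Equation 5.4 is an arbitrary constant reflecting the assumption of Plomp and Levelt that the dissonance is non-zero at a frequency difference of one ERB.

\[ T_{mn} = \begin{pmatrix} f_{11} & f_{12} & f_{13} & \cdots & f_{1n} \\ f_{21} & f_{22} & f_{23} & \cdots & f_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ f_{m1} & f_{m2} & f_{m3} & \cdots & f_{mn} \end{pmatrix} \tag{5.5} \]

\[ A_{mn} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} \tag{5.6} \]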

Complex tones sounding concurrently can be conveniently represented by the timbre matrix (5.5), each row representing the frequency components of one instrument, and the amplitude matrix (5.6), each row representing the normalized amplitude (in the range [0, 1]) of every frequency component. Both matrices have $m$ timbres and $n$ overtones.

Based on matrices 5.5 and 5.6, a unified expression for calculating the dissonance of complex tones (Equation 2.7), the dissonance of intervals (Equation 2.8), and the dissonance of several timbres playing different notes (Equation 4.3) is proposed in Equation 5.7:

\[ D = \sum_{h=1}^{m-1} \sum_{i=h+1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{n} d(a_{hj}, a_{ik}, f_{hj}, f_{ik}). \tag{5.7} \]

[Figure 5.3: diagram pairing the frequency components f11, f12, f13 of one timbre with the components f21, f22, f23 and f31, f32, f33 of the others.]

Figure 5.3: Illustration of dissonance calculation.

Equation 5.7 calculates the sum of the dissonances for each pair of frequency components belonging to different instruments, as illustrated in Figure 5.3. This means that two consecutive harmonics of a given timbre are assumed not to fall within the same ERB and therefore not to contribute to the dissonance sum. This is true in general for the first (and most audible) harmonics, and can be corroborated by calculating the ERB (Equation 4.5) for the higher frequency of a consecutive pair. The resulting ERB is centered on this frequency, and for frequencies below 10,000 Hz the nearest partial is found at a frequency separated by more than half the ERB, as can be seen in Table 5.1. For higher partials this condition is not true, but their contribution to the total dissonance is not considerable because, in general, their amplitudes are very small. Equation 5.7 is implemented in the present thesis.
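Equations 5.4 and 5.7 can be combined in a few lines. The sketch below is not the thesis implementation: it assumes ideal harmonic timbres with a geometric amplitude roll-off of 0.88 per partial (an arbitrary choice), and the names `total_dissonance` and `harmonic_rows` are illustrative.

```python
from math import exp


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Equation 4.5)."""
    return 0.108 * f_hz + 24.7


def dyad_dissonance(a1, a2, f1, f2, cutoff=1.21):
    """Equation 5.4: zero beyond `cutoff` ERBs, Equation 5.3 otherwise."""
    x = abs(f1 - f2) / erb(max(f1, f2))
    if x >= cutoff:
        return 0.0
    return 2.5 * a1 * a2 * x * exp(1.0 - 2.5 * x)


def total_dissonance(freqs, amps):
    """Equation 5.7: sum over every pair of partials that belong to
    different timbres. freqs and amps are the rows of matrices 5.5 and 5.6."""
    total = 0.0
    m = len(freqs)
    for h in range(m - 1):
        for i in range(h + 1, m):
            for f_h, a_h in zip(freqs[h], amps[h]):
                for f_i, a_i in zip(freqs[i], amps[i]):
                    total += dyad_dissonance(a_h, a_i, f_h, f_i)
    return total


def harmonic_rows(fundamentals, n, rolloff=0.88):
    """Build matrices 5.5 and 5.6 for ideal harmonic tones (an assumption)."""
    freqs = [[f0 * k for k in range(1, n + 1)] for f0 in fundamentals]
    amps = [[rolloff ** (k - 1) for k in range(1, n + 1)] for _ in fundamentals]
    return freqs, amps


d_unison = total_dissonance(*harmonic_rows([440.0, 440.0], 6))
d_fifth = total_dissonance(*harmonic_rows([440.0, 660.0], 6))
d_semitone = total_dissonance(*harmonic_rows([440.0, 440.0 * 2 ** (1 / 12)], 6))
print(d_unison, d_fifth, d_semitone)
```

Under these assumptions the unison scores zero, and the perfect fifth scores lower than the semitone, in line with the ordering the model is meant to reproduce.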

Finally, a comparison of the intrinsic dissonance curves of a trombone playing a 440 Hz tone obtained by Sethares’ model and Equation 5.7 is presented in Figure 5.4. The discontinuities of the curve are a consequence of the conditional in the equation.


Overtone    Frequency (Hz)    ERB lower cutoff frequency (Hz)
0            1000              933.65
1            2000             1879.65
2            3000             2825.65
3            4000             3771.65
4            5000             4717.65
5            6000             5663.65
6            7000             6609.65
7            8000             7555.65
8            9000             8501.65
9           10000             9447.65

Table 5.1: The first 10 terms of the harmonic series of an ideal sound tuned at 1000 Hz and the corresponding lower cutoff frequency of an ERB centered on each term.

[Figure 5.4: Consonance model used. Plot of dissonance value (0.2 to 0.8) against frequency ratio (16/15 to 2).]


5.2 Implementation considerations

Mainly to optimize the time performance and to avoid monotonicity in the output, the following constraints and computation strategies were applied to the algorithm:

5.2.1 Vicinity

According to the presented dissonance models, for harmonic sounds, the interval with the greatest consonance is the unison, and in theory a dissonance-minimizer algorithm will output this interval if no other restrictions are imposed. In such a hypothetical case, no matter which inputs were presented to the algorithm, the result would always be somewhat monotonic (i.e., always the same tone). Extrapolating this idea, a real algorithm should in general consider only the vicinity of the given inputs when calculating their dissonance-minimized versions, so that the estimated intervals retain the character of those presented at the input. In other words, if a duet is playing an m3, the output of the algorithm should be recognizable as an m3 and not as an M3, which in general has a lower dissonance value but a completely different character.

On average, semitones are separated by 100 ¢, as in 12-TET scales. The greatest discrepancy between 12-TET and JT intervals is about 16 ¢, occurring in the m3 and M6 intervals (see Table 3.2); the frequency JND (Just Noticeable Difference) is around one twelfth of a semitone, or approximately 8.3 ¢ [38].1 Based on these values, an initial default vicinity of ±8 ¢ is adopted for the application, but mechanisms are provided that allow the user to adjust it. This value is large enough to potentially ‘correct’ a 12-TET m3 without converting it into a different interval.
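The figures quoted above can be checked with a few lines of arithmetic. The sketch below converts frequency ratios to cents, compares the 12-TET and just minor thirds, and limits a retuning to a given vicinity. The helper names are hypothetical, and the clamping acts on a single voice. Whether the application applies the vicinity per voice or per interval is not stated here, so that choice is an assumption.

```python
import math


def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


def clamp_to_vicinity(input_hz: float, target_hz: float,
                      vicinity_cents: float = 8.0) -> float:
    """Move input_hz toward target_hz by at most vicinity_cents.

    Assumption: the vicinity is applied to one voice at a time.
    """
    offset = cents(target_hz / input_hz)
    offset = max(-vicinity_cents, min(vicinity_cents, offset))
    return input_hz * 2.0 ** (offset / 1200.0)


# A just minor third (6:5) against three 12-TET semitones (300 cents).
m3_discrepancy = cents(6.0 / 5.0) - 300.0   # about 15.6 cents
jnd_cents = 100.0 / 12.0                    # about 8.3 cents

if __name__ == "__main__":
    lower = 220.0
    upper_tet = lower * 2.0 ** (3.0 / 12.0)  # 12-TET m3 above 220 Hz
    upper_just = lower * 6.0 / 5.0           # just m3 above 220 Hz
    retuned = clamp_to_vicinity(upper_tet, upper_just)
    print(f"m3 discrepancy: {m3_discrepancy:.2f} cents")
    print(f"shift applied:  {cents(retuned / upper_tet):.2f} cents")
```

With the default of 8 ¢ the upper voice alone covers about half of the m3 discrepancy, which keeps the retuned interval well within a semitone of the input.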

5.2.2 Computation optimizations

The tonotopic theory for dissonance asserts that interacting frequencies whose difference is greater than one critical band don’t contribute to the overall dissonance of complex sounds. Based on this assumption, it’s possible to reduce the number of computations when calculating the dissonance for complex tones:

Consider two frequencies f1 and f2. If f1 < f2, then f1 < n·f2 for all n ≥ 1.

This consideration can be introduced in the evaluation of Equation 5.7: if the dissonance equals zero when comparing one partial of a timbre (call it f1) against one partial of another (f2), and f1 < f2, then the dissonance between the current partial of the first timbre and each of the remaining partials of the second timbre (in general n·f2 with n ≥ 1) will consequently be zero. Nevertheless, a dissonance computation that uses this criterion cannot be applied to harmonically compressed tones [39], or tones with octave ratios smaller than 2:1, which imply a more compact spectrum.
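The pruning criterion can be illustrated with a minimal sketch. It does not reproduce Equation 5.7: `pair_dissonance` is a placeholder curve that is zero at the unison, peaks at a quarter of the band and vanishes beyond one ERB, and the ERB formula 0.108 f + 24.7 is the same assumption that reproduces Table 5.1. Partials are given as (frequency, amplitude) pairs in ascending frequency, with positive amplitudes. All names are hypothetical.

```python
def erb_bandwidth(freq_hz):
    """Assumed linear ERB approximation in Hz."""
    return 0.108 * freq_hz + 24.7


def pair_dissonance(f1, a1, f2, a2):
    """Placeholder roughness of two partials (not Equation 5.7).

    Zero when the partials are more than one ERB apart.
    """
    band = erb_bandwidth((f1 + f2) / 2.0)
    y = abs(f2 - f1) / band
    if y >= 1.0:
        return 0.0
    # y * (1 - y)^3 peaks at y = 0.25; the divisor normalizes the peak to 1.
    return a1 * a2 * y * (1.0 - y) ** 3 / 0.10546875


def dissonance_full(timbre_a, timbre_b):
    """Evaluate every pair of partials."""
    return sum(pair_dissonance(f1, a1, f2, a2)
               for f1, a1 in timbre_a for f2, a2 in timbre_b)


def dissonance_pruned(timbre_a, timbre_b):
    """Same sum, skipping the higher partials once a zero is found."""
    total, evaluations = 0.0, 0
    for f1, a1 in timbre_a:
        for f2, a2 in timbre_b:  # ascending frequency
            d = pair_dissonance(f1, a1, f2, a2)
            evaluations += 1
            if d == 0.0 and f1 < f2:
                break  # remaining partials of timbre_b are even higher
            total += d
    return total, evaluations


def harmonic_timbre(fundamental_hz, partials=8):
    """Ideal harmonic spectrum with amplitudes 1/n (illustrative)."""
    return [(fundamental_hz * n, 1.0 / n) for n in range(1, partials + 1)]


if __name__ == "__main__":
    a, b = harmonic_timbre(220.0), harmonic_timbre(261.63)
    total, evaluations = dissonance_pruned(a, b)
    print(total, dissonance_full(a, b), evaluations, len(a) * len(b))
```

The early exit is valid here because the spacing between a fixed f1 and successive higher partials grows faster than the assumed bandwidth. For compressed spectra the spacing pattern differs, which is consistent with the restriction noted above.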

Other optimizations, such as precalculating known data and retrieving it from an array at run time, also improve the real-time performance, as will be shown later.

1 Caveat: This value is a very rough approximation since the problem of pitch discrimination is very
