Pd-based implementation
6.1 Pure-data (Pd)
Pure-data (Pd) [40] is a real-time graphical programming language for audio and graphical processing, developed by Miller Puckette and continuously extended and supported by an active community. It resembles the Max/MSP system but is intended to be much simpler and more portable. Pd is open-source and available from [41].
It runs on SGI, Microsoft Windows, Linux, and Mac OS X platforms. Because of its features (multirate, real-time, multichannel, availability, active community support, etc.), Pd is an adequate candidate to replace the MIDI protocol in dynamic tuning systems, allowing implementation of features excluded by simplifications made in prior approaches.
Pd can be considered an implementation environment (a visual dataflow language) rather than a specification language. Pd ‘patches’ (collections of interconnected objects) correspond roughly to abstract block diagrams. Pd allows one to build audio patches that analyze incoming sounds, process them to produce transformed audio outputs, synthesize musical sounds, and integrate audio processing with other media. Audio signals are stored internally as 32-bit floating-point numbers [42], but depending on the hardware, audio I/O is usually limited to 16 or 24 bits. Input values all lie within ±1, and output values are clipped to that range, too. The default sampling rate is 44.1 kHz, but it can easily be changed. Pd can read and write sample files in 16-bit or 24-bit fixed point or 32-bit floating point, in WAV, AIFF, or AU formats.
There are at least four kinds of objects in Pd: native objects, ‘one-off subpatches,’ abstractions, and externals. Native objects are general-purpose objects distributed with Pd; many patches can be constructed using them exclusively. The adc~ object in Figure 6.1, which provides access to the incoming streaming data from the sound device, is an example of this kind. (The ‘~’ at the end of an object name, as well as the bolder connecting lines, indicates that the inputs are audio signals.) Abstractions and ‘one-off subpatches’ are special Pd patches that can be loaded inside another patch, a mechanism that is useful for encapsulating a group of tasks with a common objective. The output~ object in Figure 6.1 is an example of such an abstraction. It provides access to the output channels of the audio device (via dac~ objects) and allows the user to control the loudness from the GUI. Finally, externals are a mechanism for extending Pd’s capabilities: they are programmed in C and deployed as dynamically linked libraries. Once externals are loaded in Pd, there is no distinction between native and external objects. The goldenear object, encapsulated in the minimization subpatch, is an external created to minimize the dissonance (this object is explained in detail later). Examples, suggestions, and deeper explanations of externals programming, as well as documentation of the most important functions provided by Pd’s API (m_pd.h), are available at [43].
Objects in Pd communicate with each other by means of messages. A message contains a selector and a certain number of arguments. Pd provides several selectors, for example floats, symbols, lists, and bangs. Messages travel from an object’s outlet to the inlets of one or several connected objects. When an outlet is connected to several inlets, special rules apply: a message is not considered done until the whole cascade of messages it triggers has been completed. This behavior is called ‘depth first message passing’ in Pd. There are further considerations regarding the number of inlets and outlets of an object and the order in which they are evaluated and fired, respectively.
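The depth-first rule can be illustrated with a small toy model. The sketch below is not Pd’s scheduler: the `Obj` class, its method names, and the choice to serve connections in the order they were made are all assumptions introduced only for illustration.

```python
# Toy model of depth-first message passing (an illustration, not Pd's code).
class Obj:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.targets = []  # objects connected to this object's single outlet

    def connect(self, other):
        self.targets.append(other)

    def receive(self, value):
        self.log.append(f"{self.name} got {value}")
        # Each downstream message is delivered and fully processed
        # before the next connection is served (depth first).
        for target in self.targets:
            target.receive(value)
        self.log.append(f"{self.name} done")


log = []
a, b, c, d = (Obj(name, log) for name in "abcd")
a.connect(b)  # a -> b -> d
a.connect(c)  # a -> c
b.connect(d)
a.receive(1)
print("\n".join(log))
```

In this model the message sent to b, including everything it triggers in d, finishes before c receives anything, and a itself is done only at the very end.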
A deeper and more extensive explanation of Pd’s features and its programming paradigm is given in [42].
6.2 Implementation
Figure 6.1: GUI implemented in Pd for achieving real-time dissonance minimization.
The patch (introduced in Figure 6.1) uses one adc~ object to get four audio signals from the sound card and extracts from them 10 frequency components by means of fiddle~ objects encapsulated in the inputs reader~ subpatch. Frequencies and amplitudes are separated and labeled in the repacking object and passed to the minimization object, which outputs a target frequency for each signal, calculated by a goldenear object. The pitch correction object transposes the original signals to the calculated target frequencies using single-sideband modulation (SSB). The processed versions are sent to the sound card for reproduction. A schematic diagram of the implementation is presented in Figure 6.2.
Figure 6.2: The gray boxes indicate the operations processed by the sound card. goldenear was newly created for the purposes of this research.
6.2.1 fiddle~ object
fiddle~ is a monophonic or polyphonic maximum-likelihood pitch detector [44], similar to that proposed by Rabiner [45]. This object has four outlets and, among other tasks, analyzes an incoming signal (by DFFT and a Hanning window) and extracts as many as four pitches and their respective amplitudes. The pitches are sent to the third outlet expressed as MIDI note numbers. For example, a 220 Hz tone is reported as #57. A raw list of the spectral components (up to 100) used in the pitch determination can be obtained from the fourth outlet; the list consists of frequency-amplitude pairs (phases are not considered). This is the information needed to calculate the dissonance in the patch.
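The mapping between frequency and MIDI note number used in the example above can be checked with a few lines of Python. This is a minimal sketch assuming the usual equal-tempered convention with A4 = 440 Hz mapped to note 69; the function names are hypothetical and are not part of fiddle~.

```python
import math


def freq_to_midi(freq_hz, a4_hz=440.0):
    """Fractional MIDI note number, assuming A4 = 440 Hz is note 69."""
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)


def midi_to_freq(midi_note, a4_hz=440.0):
    """Inverse mapping, from MIDI note number back to Hz."""
    return a4_hz * 2.0 ** ((midi_note - 69.0) / 12.0)


print(freq_to_midi(220.0))  # the 220 Hz tone of the example
```

A 220 Hz tone lies exactly one octave (12 semitones) below A4, which gives note 57.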
fiddle~ receives as parameters the size of the analysis window, the number of spectral peaks to consider in determining the pitches, the minimum amplitude a signal must have to trigger the analysis, and, to cope with vibrato, the maximum deviation in semitones from the pitch and the duration in ms within which subsequent detected frequencies are considered the same pitch. The analysis hop size is half the analysis window size, and the minimum detectable frequency is 2.5 cycles per analysis window. For example, in a patch using fiddle~ with a window of 1024 samples at a sampling frequency of 44.1 kHz, analyses are done every 512 samples (11.6 ms), and each analysis uses the most recent 1024 samples (23.2 ms). Other fiddle~ parameters and features less relevant to this research are not detailed here; they are explained in [44].
The creation arguments for fiddle~ in the implementation are: a window of 1024 samples (which allows the detection of frequencies greater than 117 Hz at a sampling rate of 48 kHz)¹, zero MIDI pitches to calculate (thereby saving computation time, since the MIDI values are not useful for this application), and 10 frequency components to be sent through the fourth outlet. Only monophonic signals are assumed as inputs.
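The timing figures quoted for fiddle~ follow directly from the window size and the sampling rate. The sketch below only restates the two rules given in the text (hop size of half a window, minimum frequency of 2.5 cycles per window); the function name and the dictionary keys are hypothetical.

```python
def fiddle_timing(window_size, sample_rate_hz):
    """Hop size, durations, and lowest detectable frequency for a given window."""
    hop = window_size // 2  # the hop size is half the analysis window
    return {
        "hop_samples": hop,
        "hop_ms": 1000.0 * hop / sample_rate_hz,
        "window_ms": 1000.0 * window_size / sample_rate_hz,
        # 2.5 cycles must fit into one analysis window
        "min_freq_hz": 2.5 * sample_rate_hz / window_size,
    }


print(fiddle_timing(1024, 44100))
print(fiddle_timing(1024, 48000))
```

With a 1024-sample window the lower frequency limit depends on the sampling rate, which is why the 117 Hz figure is tied to 48 kHz.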
The analysis window size and the vibrato parameters can be changed in real time through the GUI (inputs reader object). In the same way, the user can interrupt fiddle~’s analysis using the toggle button labeled ‘auto.’
¹ $2.5 \times \left(\frac{1024}{48000}\right)^{-1} \approx 117$ Hz
6.2.2 Single-sideband modulation
In general, changes in pitch (pitch correction, in the case of this implementation) are achieved with vocoders, which better preserve the formants of signals such as the human voice. These mechanisms, however, are costly in time and processing power, so single-sideband modulation, a simpler but effective alternative, is proposed instead.
A digital (real) signal x(n) can be converted into a complex function

$$X(n) = a(n) + \imath\, b(n)$$

whose real part is equivalent to the original signal, x(n) = ℜ(X(n)) = a(n), by filtering x(n) with two special all-pass filters having transfer functions H1 and H2 that observe the condition that their phase responses differ by approximately 90 degrees:

$$\angle H_1(e^{\imath\omega}) - \angle H_2(e^{\imath\omega}) \approx \begin{cases} \pi/2, & 0 < \omega < \pi \\ -\pi/2, & -\pi < \omega < 0 \end{cases} \tag{6.1}$$

a(n) is the output of the first filter and b(n) the output of the second [40]. The filters are a digital implementation of the Hilbert transform, in which the negative frequencies of the signal are advanced in phase by 90 degrees and the positive frequencies are delayed by 90 degrees. X(n) is analytic, and all its frequency components are positive and less than the sampling frequency [46]. The design of such filter pairs is explained in detail in [47].
X(n) can be used to calculate xm(n), an amplitude-modulated version of x(n), by multiplying it by a complex sinusoid Y(n). This procedure is known as single-sideband modulation (SSB). SSB is a refinement of AM in which one of the two sidebands is eliminated. The resulting signal can be expressed as

$$x_m(n) = \Re\big(X(n)\,Y(n)\big) = \Re\big(X(n)\, e^{\pm\imath 2\pi f_0 n / f_s}\big), \tag{6.2}$$

where f0 is the frequency of Y(n) (the carrier) and fs is the sampling frequency.
Pd provides the hilbert~ abstraction, which calculates X(n) in the way described above. The Hilbert transform cannot be computed exactly, but the approximation given by this abstraction is sufficiently good for practical purposes. The two outputs of hilbert~ are multiplied by a complex sinusoid Y(n), and the real part of the resulting signal has components at the frequencies of the components of x(n) plus the frequency of Y(n). Figure 6.3 illustrates the process of calculating the SSB.
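The frequency shift produced by SSB can be verified numerically. The sketch below does not reproduce hilbert~, which uses a pair of all-pass filters on a running stream; as a stated substitute it builds an idealised analytic signal for one finite block by zeroing the negative-frequency bins of a naive DFT. All function names are hypothetical, and the block length is assumed to be even.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for short blocks."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(x):
    """Idealised block-wise analytic signal (stand-in for the all-pass pair)."""
    n = len(x)  # assumed even
    spectrum = dft(x)
    for k in range(n):
        if 0 < k < n // 2:
            spectrum[k] *= 2.0  # keep and double the positive frequencies
        elif k > n // 2:
            spectrum[k] = 0.0   # discard the negative frequencies
    return dft(spectrum, inverse=True)


def ssb_shift(x, f0_hz, fs_hz):
    """Shift every component of x by f0_hz: real part of X(n) * exp(i 2 pi f0 n / fs)."""
    analytic = analytic_signal(x)
    return [
        (analytic[n] * cmath.exp(2j * math.pi * f0_hz * n / fs_hz)).real
        for n in range(len(x))
    ]


fs = 64.0
tone = [math.cos(2 * math.pi * 8 * n / fs) for n in range(64)]  # 8 Hz cosine
shifted = ssb_shift(tone, 3.0, fs)                                # moved to 11 Hz
```

A negative f0_hz shifts the components downwards, which corresponds to the other sign in the exponent of Equation 6.2.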
6.3 goldenear
This external object calculates the target frequencies in the vicinity of the input signals that minimize their dissonance. goldenear has three inlets: the leftmost one prints debugging information in Pd’s console when it receives a bang message. Frequencies and amplitudes are received in the second and third inlets, respectively, as shown in Figure 6.4.
[Figure 6.3 diagram: x(n) is multiplied by cos(2πf0n/fs), its Hilbert transform is multiplied by sin(2πf0n/fs), and the two products are combined to give xm(n).]
Figure 6.3: SSB diagram. Rigorously, there must be a delay block in the line between x(n) and the upper multiplication block (⊗), of the same duration as the time needed by the Hilbert transformation block to compute its output.
Figure 6.4: goldenear as connected in the implementation.
The user can specify, as the first and second creation arguments, the desired vicinity in cents (¢) and the maximum number of iterations for which the algorithm evaluates the dissonance function.
When a new tone is detected, its amplitudes and frequencies are sent, in that order, to goldenear, so that the minimization process can start as soon as the frequencies are written. A label preceding each list (spectral and amplitude components) indicates the row of the frequency and amplitude matrices being updated. The target frequencies are calculated by an SPSA procedure, as explained in the following subsection. The expression for calculating dissonance is the one presented in Equation 5.7.
The outputs are fired sequentially, starting from the leftmost (corresponding to the first timbre). Figure 6.5 shows a flowchart of the goldenear algorithm.
[Figure 6.5 flowchart blocks: variables initialisation; New tone detected? (Yes/No); Update Freq. & Amp. matrices; Minimize using SPSA; Output f0s; End.]
Figure 6.5: goldenear’s flowchart.
6.3.1 Simultaneous Perturbation Stochastic Approximation
goldenear uses SPSA to approximate the minimum dissonance value. This heuristic recursive procedure, proposed by Spall, is similar to simulated annealing in that the perturbation size is damped with every iteration (cooling), but the two differ in that SPSA looks for local minima rather than global minima as in simulated annealing [30]. This means that if sufficient resources (mainly CPU time and memory) are given to a simulated annealing algorithm, it will output the global minimum of the loss function. This is not guaranteed in SPSA. Since the problem of this thesis is to find local consonance for a given set of sound sources, this feature is not an impediment to using SPSA in this process.
Some of the main advantages of this algorithm over other heuristic approaches are that it uses only two evaluations of the loss function per iteration, it is tolerant of noisy signals, and no information about the gradient is needed.
The SPSA algorithm was introduced briefly when Sethares’ Adaptun was described. Here, the canonical form of such algorithms is presented:
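$$\vec{\theta}_{k+1} = \vec{\theta}_k - a_k\, g_k(\vec{\theta}_k), \tag{6.3}$$

where $g_k(\vec{\theta}_k)$ is

$$g_k(\vec{\theta}_k) = \frac{y(\vec{\theta}_k + c_k \vec{\Delta}_k) - y(\vec{\theta}_k - c_k \vec{\Delta}_k)}{2 c_k \vec{\Delta}_k}. \tag{6.4}$$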
$\vec{\theta}_k$ is a vector of the solution space, $y(\vec{\theta}_k \pm \text{perturbation})$ are measurements of the loss function (in this case the dissonance), and $k$ is the iteration number. The perturbation is given by a Bernoulli vector $\vec{\Delta}_k$ and the damping factor $c_k$; the division by $\vec{\Delta}_k$ in Equation 6.4 is taken component by component. The algorithm approximates the gradient by measuring the loss function plus or minus a perturbation that gets smaller with every iteration [29]. There are several ways to select the initial guess in SPSA; in goldenear, the given timbres, as they were received, are chosen as the initial guess for simplicity.
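A single update of Equations 6.3 and 6.4 can be written compactly. The following is a minimal sketch and not the goldenear source, which is a C external: the function names, the list representation of the solution vector, and the quadratic `toy_loss` standing in for the dissonance are assumptions made only for illustration.

```python
import random


def spsa_step(loss, theta, a_k, c_k, rng):
    """One SPSA update for a list-valued theta (Equations 6.3 and 6.4)."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]     # Bernoulli +/-1 vector
    theta_plus = [t + c_k * d for t, d in zip(theta, delta)]
    theta_minus = [t - c_k * d for t, d in zip(theta, delta)]
    diff = loss(theta_plus) - loss(theta_minus)          # two loss evaluations
    gradient = [diff / (2.0 * c_k * d) for d in delta]   # component-wise division
    return [t - a_k * g for t, g in zip(theta, gradient)]


def toy_loss(theta):
    """Hypothetical stand-in for the dissonance: a quadratic bowl at the origin."""
    return sum(t * t for t in theta)


rng = random.Random(0)
theta = [3.0]
for _ in range(10):
    theta = spsa_step(toy_loss, theta, a_k=0.25, c_k=0.1, rng=rng)
print(theta)
```

For this one-dimensional quadratic the two-sided difference recovers the exact gradient whatever the sign of the perturbation, so each step with a_k = 0.25 halves the estimate. With several dimensions the estimate is only correct on average, which is where the symmetry of the perturbation distribution matters.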
Spall proposes a MATLAB [48] implementation of the algorithm [29] in which the damping factors are calculated as

$$a_k = \frac{a}{(k + A)^{\alpha}}, \qquad c_k = \frac{c}{k^{\gamma}}, \tag{6.5}$$

where A, a, and c are constants that determine the size of the perturbation at initial conditions, and α = 0.602 and γ = 0.101 are suggested values for the damping constants. In the implementation, $k^{\gamma}$ was computed by means of a look-up table.
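The damping sequences of Equation 6.5 can be tabulated once, before the iterations start. The sketch below assumes that k runs from 1 to the maximum number of iterations and uses as defaults the constants reported for goldenear later in this section (A = 10.1, a = 0.5, c = 0.01); the function name and the table layout are hypothetical.

```python
ALPHA = 0.602  # suggested damping exponent for a_k
GAMMA = 0.101  # suggested damping exponent for c_k


def gain_tables(max_iterations, a=0.5, c=0.01, A=10.1):
    """Precompute a_k and c_k for k = 1..max_iterations (Equation 6.5)."""
    ks = range(1, max_iterations + 1)
    k_gamma = [k ** GAMMA for k in ks]        # look-up table for k^gamma
    a_k = [a / (k + A) ** ALPHA for k in ks]
    c_k = [c / kg for kg in k_gamma]
    return a_k, c_k


a_table, c_table = gain_tables(5)
print(a_table)
print(c_table)
```

Both sequences decrease monotonically, and c_k decays much more slowly than a_k because γ is much smaller than α.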
In the case of the dissonance problem, only fundamental frequencies are perturbed, since perturbing any other spectral component alters the timbre. In other words, the spectral components must keep their original ratios throughout the iterations to preserve the timbral characteristics of the evaluated sounds. In addition, since the frequency range in which the algorithm operates is considerably broad, the perturbations are also scaled. (Adding 20 Hz to a 200 Hz tone represents a change of 10%, which can be significant, but the same amount represents only 1% for a 2000 Hz tone.) This is done by converting the scalars ak, ck into the respective vectors:
$$\vec{a}_k = \frac{a\, f_{ki0}}{(k + A)^{\alpha}}, \qquad \vec{c}_k = \frac{c\, f_{ki0}}{k^{\gamma}}, \tag{6.6}$$

where $f_{ki0}$ corresponds to the fundamental frequencies in matrix 5.5 at the kth iteration. These expressions are included in Equation 6.4.
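The two ideas of this passage, gains proportional to each fundamental and partials that follow their fundamental, can be sketched as follows. The function names are hypothetical, and the code assumes that the first element of each list of partials is the fundamental.

```python
def scaled_gains(a_k, c_k, fundamentals_hz):
    """Per-timbre gain vectors proportional to each fundamental (Equation 6.6)."""
    return ([a_k * f for f in fundamentals_hz],
            [c_k * f for f in fundamentals_hz])


def retune_partials(partials_hz, new_fundamental_hz):
    """Move a timbre to a new fundamental while keeping the partial ratios.

    Assumes partials_hz[0] is the fundamental.
    """
    ratio = new_fundamental_hz / partials_hz[0]
    return [p * ratio for p in partials_hz]


a_vec, c_vec = scaled_gains(0.1, 0.01, [200.0, 2000.0])
print(c_vec)                                       # same relative perturbation
print(retune_partials([200.0, 400.0, 600.0], 210.0))
```

With this scaling the 200 Hz and the 2000 Hz tones are both perturbed by the same fraction of their frequency, rather than by the same number of Hz.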
According to Spall [49], the convergence of SPSA is guaranteed if these main conditions hold:
• the entries of the perturbation vector (∆ki) are independent and symmetrically distributed about 0, with finite inverse moments for all k, i,
• the damping rates of ak and ck are neither too fast nor too slow, and
• L(θ) (in this case the dissonance function) is sufficiently smooth (several times differentiable) near θ∗ (in this case the target frequencies).
The first condition is guaranteed in goldenear by using a vector with a Bernoulli distribution, referred to earlier as a ‘Bernoulli vector’ (when Sethares’ model was explained). A Bernoulli distribution is a discrete distribution with two possible outcomes, ‘success’ and ‘failure,’ which occur with probability p and q = 1 − p, respectively [50]. In the case of SPSA the outcome values are symmetric around 0 (±1):

$$f(n) = \begin{cases} p, & n = 1 \\ 1 - p, & n = -1 \end{cases}$$

with p = 0.5. The symmetry of this distribution statistically cancels the effect of the contributions induced by the perturbations in the measurements of the loss function.
Although practical and asymptotic expressions for calculating the damping rates are provided in [29], the dissonance problem does not fit the canonical form of SPSA, which makes tuning these constants difficult. The values of the constants (A = 10.1, a = 0.5, and c = 0.01) were chosen by trial and error to satisfy the second condition.
The last condition given by Spall for the convergence of the algorithm is not always met by the implemented dissonance function, as can be observed in Figure 5.4. The dissonance function has several discontinuities, and at some points where the dissonance is a minimum (such as near the P5) the gradient cannot be calculated even though the function is continuous. Nevertheless, as shown in the next chapter, SPSA delivers good approximations in general, and whenever the frequencies calculated by the algorithm have a dissonance value greater than that of the original tones, the original tones are passed to the next module (i.e., no frequency changes are reported).
The vicinity in cents (¢) specified by the user is used to calculate the limits in Hz within which the iterated answers must fall. goldenear rejects approximations falling beyond these limits and puts the precalculated limit in their place.
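The conversion from a vicinity in cents to limits in Hz can be sketched as below. It assumes the standard definition of the cent (1200 cents per octave) and a vicinity that is symmetric around the original frequency; the function names are hypothetical.

```python
def vicinity_limits(freq_hz, cents):
    """Lower and upper limits in Hz for a symmetric vicinity given in cents."""
    factor = 2.0 ** (cents / 1200.0)  # 1200 cents correspond to one octave
    return freq_hz / factor, freq_hz * factor


def clamp_to_vicinity(candidate_hz, original_hz, cents):
    """Replace a candidate outside the vicinity by the nearest limit."""
    low, high = vicinity_limits(original_hz, cents)
    return min(max(candidate_hz, low), high)


print(vicinity_limits(440.0, 100.0))           # one semitone either side of A4
print(clamp_to_vicinity(500.0, 440.0, 100.0))  # too high, so the upper limit is used
```

Because the limits depend only on the original frequency and the vicinity, they can be computed once per tone before the iterations begin.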
The dissonance is calculated three times in every iteration of the SPSA process: twice by the heuristics of the algorithm, and once more to monitor the dissonance of the partial solution at each iteration. If it falls below a threshold (d = 0.01), the current values of the fundamental frequencies are reported as answers and the rest of the iterations are skipped. This extra evaluation potentially saves some cycles in the iterative process, freeing goldenear earlier for new incoming notes.
The threshold was chosen by observing the reported dissonance for a complex-tone dyad and the respective fundamental frequency values in Hz (several observations were conducted to cover much of the audible frequency range).
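The control flow described above (two evaluations for the gradient estimate, a third for monitoring, early termination, and the fallback to the original tones) can be sketched as follows. This is an illustrative reading of the text and not the goldenear code: the function name, the `gains` argument (a list of (a_k, c_k) pairs), and the quadratic loss used in the demonstration are assumptions.

```python
import random


def minimize_with_monitoring(loss, theta0, gains, threshold=0.01, seed=0):
    """SPSA loop with a monitoring evaluation, early stop, and fallback."""
    rng = random.Random(seed)
    theta = list(theta0)
    used = 0
    for a_k, c_k in gains:
        used += 1
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        y_plus = loss([t + c_k * d for t, d in zip(theta, delta)])   # evaluation 1
        y_minus = loss([t - c_k * d for t, d in zip(theta, delta)])  # evaluation 2
        theta = [t - a_k * (y_plus - y_minus) / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
        if loss(theta) < threshold:                                   # evaluation 3
            break  # good enough: skip the remaining iterations
    if loss(theta) > loss(theta0):
        return list(theta0), used  # worse than the input: report no change
    return theta, used


def bowl(theta):
    """Hypothetical stand-in for the dissonance function."""
    return theta[0] ** 2


result, iterations = minimize_with_monitoring(bowl, [1.0], [(0.25, 0.1)] * 50)
print(result, iterations)
```

In this toy case the loss drops below the threshold after four iterations, so the remaining 46 are skipped. With gains that make the iteration diverge, the function returns the initial guess unchanged, mirroring the rule that no frequency change is reported when the result is more dissonant than the original tones.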