Chapter 2: Research Methodology
2.13 Data Recording, Analysis and the Validation Process
this prior must also be defined when the number of notes in the mixture is not known.
The second assumption is that the spectral shape in the vicinity of a peak is important to the estimation of partial frequencies, whereas only the frequencies and sometimes the amplitudes of the partials are required for transcription. The spectral shape sometimes allows us to distinguish between merged harmonics of two or more notes. There are various cases where simply picking peaks of the spectrum above an adaptive noise floor is inadequate, and these cases are often the cause of transcription errors. The notes of chords in music often have overlapping harmonics, which may not be manifested as separate peaks but are obvious to the observer because of differences in spectral shape. The spectral shape also helps distinguish between noise or artifacts in the signal and genuine partial frequencies, reducing spurious detections of partials, which can lead to over- or under-reporting of the number of notes playing. We will use an explicit signal model with a prior on the expected spectral shape of harmonic notes to accurately estimate partial frequencies.
We do not assume that the partial estimation procedure is perfect, however, and therefore need a transcription system capable of dealing with both missed and duplicated partial detections. The solution we present in this chapter is to use an iterative algorithm based on the signal model presented in the previous chapter to provide high-quality estimates of the partial frequencies, and to model the prior on the frequency estimates as a non-homogeneous Poisson process. Choosing a signal model rather than a heuristic scheme for the partial frequency estimation is advantageous, as present and future improvements to that model will also benefit the estimation procedure here. However, it is also permissible to use other methods to estimate the partial frequencies, as was done previously using periodogram peak picking [Peeling et al., 2007b] and subspace methods [Peeling et al., 2007a]. In these cases, the prior on the frequencies needs to reflect the estimation procedure, for example by including a uniform clutter process across the frequency axis if many spurious partials are detected.
The structure of this chapter is as follows. In Section 6.2 we introduce the properties of non-homogeneous Poisson processes and how to calculate the likelihood given a set of observed frequencies. In Section 6.3 priors for harmonic models are discussed, and suggestions for how these priors should be modified for different partial estimation methods are given. In Section 6.4 a general method for making partial estimates from a signal model is presented. Transcription results for polyphonic mixtures of notes are presented in Section 6.5 and are compared with the previous chapter and prior work. Conclusions and suggestions for future research are given in Section 6.6.
Figure 6.1: Probability mass function for the Poisson distribution (horizontal axis: k; vertical axis: P(k)).
estimates of multiple partials in a musical signal.
6.2.1 Frequency-Domain Process
We define the number of partial estimates as a non-homogeneous Poisson process on the frequency axis.
Let $N(f)$ be the number of partial estimates observed in the frequency range $(0, f]$. We assume that the number of partial estimates in a particular interval $(a, b]$ of the frequency axis has a Poisson distribution with parameter $\lambda_{a,b}$. The number of partials in this interval is given by $N(b) - N(a)$ and has probability distribution
$$P(N(b) - N(a) = k \mid \lambda_{a,b}) = \frac{\exp(-\lambda_{a,b})\,(\lambda_{a,b})^{k}}{k!} \tag{6.1}$$
We interpret $\lambda_{a,b}$ as the expected number of partials occurring in $(a, b]$. Figure 6.1 shows the probability mass function for the Poisson distribution in (6.1) with $\lambda_{a,b} = 1$, in which case we expect to observe one partial in the region $(a, b]$. This region could, for instance, be a DFT bin at a harmonic frequency of a musical note. The probabilities of observing zero partials and one partial in the region are equal when $\lambda_{a,b} = 1$.
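As a small numerical illustration of (6.1), the following Python sketch tabulates the Poisson probability mass function for an expected count of one. The function name `poisson_pmf` is introduced here purely for illustration and is not part of the original method.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(N(b) - N(a) = k | lam) for an expected count lam, as in (6.1)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Tabulate k = 0..10 for lam = 1, the setting plotted in Figure 6.1.
for k in range(11):
    print(k, round(poisson_pmf(k, 1.0), 4))
```

With an expected count of one, the values for k = 0 and k = 1 coincide, since both reduce to exp(-1).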
Under the assumptions of a Poisson process, we write $\lambda_{a,b}$ in terms of a continuous rate function $\lambda(f)$:
$$\lambda_{a,b} \equiv \int_{a}^{b} \lambda(f)\,df$$
The rate function $\lambda(f)$ of the Poisson process describes the expected concentration of partial frequencies along the frequency axis. For a harmonic musical note, we would expect the rate function to be large around the fundamental frequency and harmonics of the note, and small but non-zero elsewhere to allow for spurious partial detections and transient effects.
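To make the role of the rate function concrete, the sketch below builds a hypothetical rate function for a single note and integrates it numerically over an interval to obtain the expected number of partials. The Gaussian bumps at the harmonics, the constant clutter floor, and all parameter values are assumptions chosen for illustration only; they are not the priors developed in Section 6.3.

```python
import math


def harmonic_rate(f, f0, n_harmonics=5, peak=0.05, width=5.0, clutter=1e-4):
    """Hypothetical rate (expected partials per Hz) for a note with fundamental f0.

    Gaussian bumps at the harmonic frequencies plus a small constant floor
    that allows for spurious detections. The shape is an assumption.
    """
    rate = clutter
    for h in range(1, n_harmonics + 1):
        rate += peak * math.exp(-0.5 * ((f - h * f0) / width) ** 2)
    return rate


def expected_count(rate_fn, a, b, steps=2000):
    """Trapezoidal approximation of the integral of rate_fn over (a, b]."""
    df = (b - a) / steps
    total = 0.5 * (rate_fn(a) + rate_fn(b))
    for i in range(1, steps):
        total += rate_fn(a + i * df)
    return total * df


def note(f):
    """Rate function of a hypothetical 440 Hz note."""
    return harmonic_rate(f, 440.0)


print(expected_count(note, 390.0, 490.0))  # interval around the fundamental
print(expected_count(note, 600.0, 700.0))  # interval between two harmonics
```

Under these assumed parameters the expected count is much larger around the fundamental than between harmonics, where only the clutter floor contributes.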
For (6.1) to be valid for any values of $a$ and $b$, there are two requirements. First, no two estimates may have exactly the same frequency. The signal model or partial estimation scheme used to observe partial positions must not provide estimates with exactly the same frequency; there must be a non-zero interval between successive frequencies. It is a property of the signal models we use in Section 6.4 that two partials will never be estimated with exactly the same frequency, as this would lead to the basis functions being linearly dependent. Two basis functions with the same frequency may always be combined into a single basis function.
The second requirement is that the process is memoryless: the probability of a number of partials occurring in any region of the frequency axis must be independent of the occurrence of partials in any other region disjoint from it. This requires that $\lambda(f)$ contains all of the prior information about the occurrence of partials. Modelling the occurrence of partials as a Poisson process makes the model robust to missing or duplicate partial detections. By contrast, harmonic models such as those described in Section 5.4.2 require the existence of a single partial frequency in every harmonic position modelled, and therefore an entire note may go undetected because of a single missing partial frequency.
6.2.2 Superposition
One of the key attractions of using a Poisson process to model partial estimates is that the observation of multiple Poisson processes superimposed on the same axis is also a Poisson process. Moreover, the rate function of the combined process is the sum of the individual rate functions. Formally, we have $M$ independent Poisson processes $N_1(f), \ldots, N_M(f)$ with rate functions $\lambda_1(f), \ldots, \lambda_M(f)$, and we observe $N_{1:M}(f) = \sum_{m=1}^{M} N_m(f)$. Then
$$P\left(N_{1:M}(b) - N_{1:M}(a) = k \mid \lambda^{(1:M)}_{a,b}\right) = \frac{\exp\left(-\lambda^{(1:M)}_{a,b}\right) \left(\lambda^{(1:M)}_{a,b}\right)^{k}}{k!}$$
$$\lambda^{(1:M)}_{a,b} = \int_{a}^{b} \sum_{m=1}^{M} \lambda_m(f)\,df \tag{6.2}$$
Note that in observing $N_{1:M}(f)$ we lose labeling information, i.e., which Poisson process $m$ generated each partial. This makes the likelihood (6.2) easy to compute. Inferring the actual labels of the partials, for example in a source separation setting, cannot be carried out using the superimposed process alone; however, the labels may be inferred in a probabilistic manner using a likelihood function based on the individual rate functions for each note.
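The superposition property in (6.2) can be sketched in Python as follows. The per-note rate functions are hypothetical Gaussian bumps at the harmonics (an assumption for illustration, not the priors of Section 6.3), and the helper names `note_rate`, `superpose` and `integrate` are introduced here only for this example.

```python
import math


def note_rate(f0, n_harmonics=4, peak=0.05, width=5.0):
    """Return a hypothetical rate function for a note with fundamental f0."""
    def rate(f):
        return sum(peak * math.exp(-0.5 * ((f - h * f0) / width) ** 2)
                   for h in range(1, n_harmonics + 1))
    return rate


def superpose(rate_fns):
    """Rate function of the superimposed process: the pointwise sum, as in (6.2)."""
    return lambda f: sum(r(f) for r in rate_fns)


def integrate(rate_fn, a, b, steps=2000):
    """Trapezoidal approximation of the integral of rate_fn over (a, b]."""
    df = (b - a) / steps
    total = 0.5 * (rate_fn(a) + rate_fn(b))
    for i in range(1, steps):
        total += rate_fn(a + i * df)
    return total * df


# Notes at 220 Hz and 330 Hz share a harmonic at 660 Hz (3 x 220 = 2 x 330).
low, high = note_rate(220.0), note_rate(330.0)
mixture = superpose([low, high])
print(integrate(low, 610.0, 710.0), integrate(high, 610.0, 710.0))
print(integrate(mixture, 610.0, 710.0))
```

Around the shared harmonic, the expected count of the mixture is the sum of the two per-note expected counts, which is how the combined process allows for more than one partial near overlapping harmonics.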
6.2.3 Evaluation of Likelihood
In this section we consider how to evaluate the likelihood of the occurrence of the entire set of observed partial positions. Although we would naturally try to calculate the likelihood exactly, the method we choose depends on how we observe the Poisson process. Three methods are given for evaluating the likelihood. The exact method in Section 6.2.3.1 should be applied when a signal model is used to estimate the partial frequencies. The binning method in Section 6.2.3.2 is suitable when a periodogram peak picking method is employed. If the peak picking method by design detects only zero or one peak in each frequency bin, the calculation should be modified to allow for the possibility that more than one partial frequency was present in the bin. In this case, the method in Section 6.2.3.3 is appropriate.
6.2.3.1 Exact Calculation
When the partial estimates are known with sufficient accuracy, and their frequencies are distinct, the likelihood of the occurrence of frequencies $f_1, f_2, \ldots, f_N$ under a non-homogeneous Poisson process with rate function $\lambda(f)$ on the frequency axis between 0 and $f_s/2$, where $f_s$ is the sampling frequency, is given by Crowder et al. [1991] and Meeker and Escobar [1998] as
$$p(f_1, f_2, \ldots, f_N, N \mid \lambda(f)) = \exp\left(-\int_{0}^{f_s/2} \lambda(f)\,df\right) \prod_{n=1}^{N} \lambda(f_n) \tag{6.3}$$
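A minimal Python sketch of (6.3), evaluated in the log domain, is given here for illustration. Working with logarithms is a design choice of this example (it avoids numerical underflow when many partials are observed); the trapezoidal integration, the function name `exact_log_likelihood` and the constant rate used in the usage lines are assumptions, not part of the original method.

```python
import math


def exact_log_likelihood(freqs, rate_fn, fs, steps=20000):
    """Logarithm of (6.3): -integral of rate_fn over [0, fs/2] + sum of log rate_fn(f_n).

    rate_fn must be strictly positive at every observed frequency.
    The integral is approximated with the trapezoidal rule.
    """
    nyquist = fs / 2.0
    df = nyquist / steps
    integral = 0.5 * (rate_fn(0.0) + rate_fn(nyquist))
    integral += sum(rate_fn(i * df) for i in range(1, steps))
    integral *= df
    return -integral + sum(math.log(rate_fn(f)) for f in freqs)


def clutter(f):
    """Constant rate: a uniform clutter process with 0.001 partials per Hz."""
    return 1e-3


# Sanity check against the closed form -c * fs / 2 + N * log(c).
observed = [440.0, 880.0, 1320.0]
print(exact_log_likelihood(observed, clutter, fs=8000.0))
print(-1e-3 * 4000.0 + 3 * math.log(1e-3))
```

For a constant rate the integral is available in closed form, so the two printed values should agree up to rounding error.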
The derivation of the above likelihood is obtained informally, firstly by noting that in the interval between observed frequencies $f_n$ and $f_{n+1}$ there are no observations. Hence, using (6.2) and substituting $k = 0$, each such interval has probability $\exp\left(-\lambda_{f_n,f_{n+1}}\right) = \exp\left(-\int_{f_n}^{f_{n+1}} \lambda(f)\,df\right)$. At each observed frequency $f_n$, observing $k = 1$ partial in an infinitesimal interval around $f_n$ contributes, using (6.2), a factor $\lambda(f_n)$ to the likelihood. We also take into account that no frequencies were observed in the intervals $[0, f_1)$ and $(f_N, f_s/2]$. As a Poisson process requires that the observations in disjoint intervals of the frequency axis be independent, we simply combine the probabilities of these observations by multiplying them, thus:
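$$\begin{aligned}
p(f_1, f_2, \ldots, f_N, N \mid \lambda(f)) &= \exp\left(-\int_{0}^{f_1} \lambda(f)\,df\right) \exp\left(-\int_{f_N}^{f_s/2} \lambda(f)\,df\right) \\
&\quad \times \prod_{n=1}^{N-1} \exp\left(-\int_{f_n}^{f_{n+1}} \lambda(f)\,df\right) \times \prod_{n=1}^{N} \lambda(f_n) \\
&= \exp\left(-\int_{0}^{f_s/2} \lambda(f)\,df\right) \prod_{n=1}^{N} \lambda(f_n)
\end{aligned}$$
6.2.3.2 Binning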
The likelihood when observations are grouped into non-overlapping regions (bins) of the frequency axis may be calculated as follows. Assume we have $F$ such bins, spanning frequency intervals $A_1, \ldots, A_F$, and denote the number of observations in bin $f$ by $N_f$. We then have, by the independence of disjoint intervals in a Poisson process,
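$$P(N_1, \ldots, N_F \mid \lambda_1, \ldots, \lambda_F) = \prod_{f=1}^{F} P(N_f \mid \lambda_f) = \prod_{f=1}^{F} \frac{\exp(-\lambda_f)\,(\lambda_f)^{N_f}}{N_f!} \tag{6.4}$$
$$\lambda_f = \int_{A_f} \lambda(f')\,df'$$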
The advantage of this method over the exact calculation is that the integrated rate $\lambda_f$ may be computed in advance for each bin $f$, before the partial frequencies are estimated, which reduces the computation required when evaluating the likelihood for multiple frames of music. Often the bins will coincide with the frequency bins of the DFT used to estimate the partial frequencies.
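The binned likelihood (6.4) can be sketched in Python as follows, again in the log domain. The function name `binned_log_likelihood` and the example counts and rates are hypothetical; the per-bin integrated rates are assumed to have been precomputed and to be strictly positive.

```python
import math


def binned_log_likelihood(counts, bin_rates):
    """Logarithm of (6.4): sum over bins of -lam_f + N_f * log(lam_f) - log(N_f!).

    counts[f] is the number of partials observed in bin f and bin_rates[f]
    is the precomputed integral of the rate function over that bin (> 0).
    """
    if len(counts) != len(bin_rates):
        raise ValueError("counts and bin_rates must have the same length")
    return sum(-lam + n * math.log(lam) - math.lgamma(n + 1)
               for n, lam in zip(counts, bin_rates))


# The same precomputed rates can be reused for every frame of music.
bin_rates = [1.0, 1.0, 2.0]
frames = [[0, 1, 2], [1, 0, 0]]
for counts in frames:
    print(counts, binned_log_likelihood(counts, bin_rates))
```

Because `bin_rates` does not depend on the observations, only the sum over bins has to be recomputed for each frame.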
6.2.3.3 Censored Frequencies
The partial estimation method used may only indicate whether or not there is a partial in a frequency bin. An example is a single-step peak picking scheme, which selects all the spectrum bins with amplitudes larger than those of neighbouring bins and above a noise threshold. It is possible that multiple frequencies are present within the region of the frequency axis covered by a single observation bin, for example in the case of overlapping harmonics. Although we have only 'observed' at most one frequency per observation bin, we wish to allow for the possibility that more than one frequency could be present in each bin. This is useful in practice when the rate function of the Poisson process is a superposition of the rate functions of harmonically related notes.
Wherever harmonics of different notes overlap within the region of a single bin, we would expect two or more partial frequencies to occur within that bin. Thus we are asserting that an observed peak in the spectrum implies the existence of one or more partial frequencies in that bin, and that no observed peak implies that no partial frequencies were present in the bin.
For the observations to be valid as a Poisson process, when a peak is detected in a bin we calculate the probability that one or more frequencies were present in that bin, i.e., $p(N_f \geq 1) = 1 - p(N_f = 0) = 1 - \exp(-\lambda_f)$. When a peak is not detected in a bin, the probability is given by $p(N_f = 0) = \exp(-\lambda_f)$.
The likelihood over all the frequency bins is thus given by
$$\prod_{f=1}^{F} \begin{cases} 1 - \exp(-\lambda_f) & \text{peak observed in bin } f \\ \exp(-\lambda_f) & \text{no peak observed in bin } f \end{cases} \tag{6.5}$$
The likelihood calculation in this case is the same as for a set of independent Bernoulli trials, one per bin, with success probability $1 - \exp(-\lambda_f)$ in bin $f$.
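A Python sketch of the censored likelihood (6.5) in the log domain follows. The function name `censored_log_likelihood` and the example inputs are hypothetical. The use of `math.expm1` is a choice made for this example: it evaluates $1 - \exp(-\lambda_f)$ accurately when $\lambda_f$ is small.

```python
import math


def censored_log_likelihood(peak_observed, bin_rates):
    """Logarithm of (6.5) for per-bin peak indicators.

    peak_observed[f] is True if a peak was detected in bin f, and
    bin_rates[f] is the integrated rate lam_f for that bin (> 0).
    """
    if len(peak_observed) != len(bin_rates):
        raise ValueError("peak_observed and bin_rates must have the same length")
    total = 0.0
    for peak, lam in zip(peak_observed, bin_rates):
        if peak:
            # log(1 - exp(-lam)); -expm1(-lam) is accurate for small lam.
            total += math.log(-math.expm1(-lam))
        else:
            # log(exp(-lam))
            total += -lam
    return total


# One bin with a peak and one without, with illustrative rates.
print(censored_log_likelihood([True, False], [math.log(2.0), 1.0]))
```

Each bin contributes one Bernoulli term, so the cost of evaluating the likelihood is linear in the number of bins.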