
3.2 Retrieval of melodic patterns in automatic transcriptions


In this example, the alignment path $p$ results in the following alignment, with score $d_{\mathrm{align}} = 2.0$:

56  57  58  59  57
–   57  58  –   57

Fig. 3.3 Alignment of two sequences with the NW algorithm. Red cells form the alignment path. Arrows indicate transitions from successors when parsing the grid.

3.2.2 Proposed alignment algorithm

The problem addressed in this chapter cannot be treated as a global alignment task, because our goal is to detect occurrences of a pattern in a significantly longer stream of notes. We therefore propose a modification of the NW algorithm that preserves its fundamental characteristics and adds the capability to retrieve a ranked list of subsequences from an automatic transcription, each of which aligns, in some optimal sense, with the given prototype pattern. The novelty of our approach lies in the fact that it introduces a systematic way to:

(a) iteratively extract occurrences of the reference pattern, ranked with respect to their similarity score

(c) ensure invariance to key changes because the alignment takes place on the sequences of intervals derived from the pitch sequences that are being matched

(d) formulate transition costs between nodes of the similarity grid as a function of intervallic differences

In a first stage, the proposed method operates on pitch sequences only, ignoring note durations. In a second stage, the results are refined by removing alignments that correspond to excessive local time-stretching. In the remainder of this chapter, we use the abbreviation mNW for the proposed method.

In order to describe mNW, let $A := \{a_1, a_2, \dots, a_{M_A}\}$ and $Q := \{q_1, q_2, \dots, q_{M_Q}\}$ be the pitch sequences of the automatic transcription and the search pattern, respectively, where elements $a_i$ and $q_j$ are pitch values in some symbolic (MIDI-like) format. At this stage, note durations are ignored. Sequence $Q$ is manually defined and reflects our musicological knowledge of the pattern to be detected. For example, pattern “A” of our experimental setup (Section 3.3) is represented by the following sequence of MIDI values:

{64, 67, 65, 64, 67, 65, 65, 64, 62, 60, 58, 57}

We now define

$$\delta_Q(j_2, j_1) = q_{j_2} - q_{j_1},$$

subject to $1 \le j_1 < j_2 \le M_Q$, to be the musical interval formed between the $j_1$-th and $j_2$-th pitch values of the prototype pattern, which are not necessarily adjacent. Similarly,

$$\delta_A(i_2, i_1) = a_{i_2} - a_{i_1},$$

subject to $1 \le i_1 < i_2 \le M_A$, is the musical interval formed between the $i_1$-th and $i_2$-th pitch values of the automatically generated transcription. The proposed mNW algorithm seeks a subsequence of $A$ with increasing, but not necessarily adjacent, indices such that the resulting sequence of intervals matches, in some optimal scoring sense, a sequence of intervals formed by a subsequence of $Q$, also with increasing but not necessarily adjacent indices.
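As a minimal illustration of why working on intervals yields key invariance, the following Python sketch computes $\delta$ for pattern “A” and for a hypothetical transcription that is simply pattern “A” transposed down by two semitones. The helper `interval` is an illustrative name of our own, not part of the original method; it uses the 1-based indices of the text.

```python
def interval(seq, i2, i1):
    """delta(i2, i1) = seq_{i2} - seq_{i1}, using the 1-based indices of the text."""
    if not 1 <= i1 < i2 <= len(seq):
        raise ValueError("indices must satisfy 1 <= i1 < i2 <= len(seq)")
    return seq[i2 - 1] - seq[i1 - 1]


# Pattern "A" as given in the text (MIDI values).
Q = [64, 67, 65, 64, 67, 65, 65, 64, 62, 60, 58, 57]
# Hypothetical transcription: pattern "A" transposed down by two semitones.
A = [p - 2 for p in Q]

# Intervals between adjacent notes are unaffected by the transposition ...
adjacent_Q = [interval(Q, j + 1, j) for j in range(1, len(Q))]
adjacent_A = [interval(A, i + 1, i) for i in range(1, len(A))]
print(adjacent_Q == adjacent_A)                         # True

# ... and so are intervals between non-adjacent notes (first to last note).
print(interval(Q, len(Q), 1), interval(A, len(A), 1))   # -7 -7
```

Because both $\delta_Q$ and $\delta_A$ are differences of pitch values, any constant offset added to a whole sequence cancels out, for adjacent and non-adjacent index pairs alike.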

To solve this problem from a dynamic programming perspective, we place $A$ on the vertical axis and $Q$ on the horizontal axis and form an $(M_A+1)$-by-$(M_Q+1)$ scoring grid $D$, whose last row and last column are initialised to zero.

As in the NW algorithm described above, we then proceed row-wise, decreasing the row index and examining the nodes of each row at decreasing column index, i.e., a standard zig-zag scanning procedure. The accumulated score $D(i, j)$ at node $(i, j)$, where $i < M_A$ and $j < M_Q$, is computed as follows:

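$$h = \max_{j+1 \le k \le j+G_h} \left\{ D(i+1, k) + \gamma\big(\delta_A(i+1, i), \delta_Q(k, j)\big) \right\} \tag{3.3}$$

$$v = \max_{i+1 \le m \le i+G_v} \left\{ D(m, j+1) + \gamma\big(\delta_A(m, i), \delta_Q(j+1, j)\big) \right\} \tag{3.4}$$

$$D(i, j) = \max\{h, v\} \tag{3.5}$$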

where parameters $G_h$ and $G_v$ are positive integers that define the search radius for successors on the horizontal and vertical axis, respectively, and function $\gamma(\cdot)$ is defined as:

γ(x, y) =          1, if x = y, −1, if | x − y |= 1, −∞, if | x − y |> 1, (3.6)

The first two equations impose that the best successor of node $(i, j)$ resides either on the next row (the $(i+1)$-th row) or on the next column (the $(j+1)$-th column). Parameters $G_h$ and $G_v$ control the horizontal and vertical gap length, respectively; in other words, they control how many pitch values can be skipped horizontally or vertically when searching for the best successor of the node.
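The backward scan and the recurrence for $h$, $v$ and $D(i, j)$ can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: indices are 0-based, node (i, j) pairs transcription note `A[i]` with query note `Q[j]`, and the nodes of the last row and last column, which have no successors, keep their initial score of zero (our reading of the zero-initialised border). The names `gamma` and `fill_grid` are hypothetical, and ties between equally scoring successors are resolved in favour of the first candidate examined, which is also our own choice.

```python
import math

NEG_INF = -math.inf


def gamma(x, y):
    """Eq. (3.6): +1 for equal intervals, -1 for a one-semitone difference, -inf otherwise."""
    d = abs(x - y)
    if d == 0:
        return 1.0
    if d == 1:
        return -1.0
    return NEG_INF


def fill_grid(A, Q, Gh=3, Gv=3):
    """Fill the scoring grid D and the successor matrix Psi by a backward scan.

    Assumption: 0-based indices, node (i, j) pairs A[i] with Q[j], and the nodes
    of the last row and last column keep their initial score of zero.
    """
    MA, MQ = len(A), len(Q)
    D = [[0.0] * MQ for _ in range(MA)]
    Psi = [[None] * MQ for _ in range(MA)]
    for i in range(MA - 2, -1, -1):          # rows at decreasing index
        for j in range(MQ - 2, -1, -1):      # columns at decreasing index
            best, succ = NEG_INF, None
            # h: successors on the next row, at most Gh columns ahead
            for k in range(j + 1, min(j + Gh, MQ - 1) + 1):
                s = D[i + 1][k] + gamma(A[i + 1] - A[i], Q[k] - Q[j])
                if s > best:
                    best, succ = s, (i + 1, k)
            # v: successors on the next column, at most Gv rows ahead (Eq. 3.4)
            for m in range(i + 1, min(i + Gv, MA - 1) + 1):
                s = D[m][j + 1] + gamma(A[m] - A[i], Q[j + 1] - Q[j])
                if s > best:
                    best, succ = s, (m, j + 1)
            D[i][j] = best                   # Eq. (3.5); stays -inf if no valid successor
            Psi[i][j] = succ                 # None when no valid successor exists
    return D, Psi


D, Psi = fill_grid([50, 55, 57, 59], [60, 62, 64])
print(D[1][0], Psi[1][0])   # 2.0 (2, 1)
```

In this hypothetical example the transcription contains the query shape transposed to {55, 57, 59} after an unrelated note 50, so node (1, 0) accumulates a score of 2 from two exactly matched intervals, whereas node (0, 1) has no admissible successor and keeps $-\infty$.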

Function $\gamma$ rewards equal intervals with a score of $+1$, penalizes with $-1$ any pair of intervals that differ by one semitone, and forbids intervallic differences larger than one semitone, hence the $-\infty$ penalty. An example is shown in Figure 3.4. The transition from $D(i+2, j+1)$ to $D(i, j)$ shown in (a) yields a score of $\gamma = 1$, because the musical interval between 60 and 63 in the transcription is equal to the interval between 64 and 67 in the query melody. The transition from $D(i+1, j+1)$ to $D(i, j)$ shown in (b), however, yields $\gamma = -\infty$, since the interval in the automatic transcription (from MIDI note 60 to MIDI note 59) deviates by more than one semitone from the corresponding interval in the query pattern (from MIDI note 64 to 67). In this way, the algorithm not only compensates for key transposition but also handles the case in which an interval is broken into several subintervals. For example, the note succession {51, 54} in a query pattern might correspond to {52, 53, 55} in a performance transcription, where the middle note has been inserted as a transitional grace note. Nevertheless, the interval from the first to the last note (three semitones) is identical in both sequences.
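The transitions discussed above can be checked directly against Eq. (3.6). The sketch below is a hypothetical Python helper `gamma`; the name and the use of `-math.inf` for the forbidden case are our own choices.

```python
import math


def gamma(x, y):
    """Eq. (3.6): +1 for equal intervals, -1 for a one-semitone difference, -inf otherwise."""
    d = abs(x - y)
    return 1 if d == 0 else (-1 if d == 1 else -math.inf)


# Fig. 3.4 (a): transcription 60 -> 63 vs. query 64 -> 67 (both +3 semitones).
print(gamma(63 - 60, 67 - 64))   # 1
# Fig. 3.4 (b): transcription 60 -> 59 (-1) vs. query 64 -> 67 (+3).
print(gamma(59 - 60, 67 - 64))   # -inf
# Grace note: query 51 -> 54 vs. transcription 52 -> 55, skipping the inserted 53.
print(gamma(55 - 52, 54 - 51))   # 1
# A one-semitone deviation is tolerated but penalised.
print(gamma(2, 3))               # -1
```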

After a node has been processed, the coordinates of its best successor are again stored in a separate matrix, $\Psi$. Once the whole grid has been scanned, the highest accumulated score in the first $E_1$ columns is selected, and forward tracking on matrix $\Psi$ reveals the best alignment path. However, this path is rejected if it does not end in one of the last $E_2$ columns of the grid. Parameters $E_1$ and $E_2$ thus act as endpoint constraints of the alignment procedure, i.e., at most $E_1 - 1$ and $E_2 - 1$ notes may be omitted from the left and right endpoints of the prototype pattern, respectively. If a path is rejected, the procedure is repeated with the next highest score until a valid path is detected or until all nodes of the first $E_1$ columns have been processed as candidate starting points of the best path. For the sake of completeness, pseudo-code for the proposed method is provided in Algorithm 3.2.

Fig. 3.4 Example of transitions to $D(i, j)$. Orange cells denote the search range ($G_v = G_h = 3$).

If the algorithm is to return two pattern occurrences, the procedure is repeated until a second path is revealed; this extends readily to any desired number of occurrences.
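The path-extraction steps described above (ranking the candidate starting nodes of the first $E_1$ columns, forward tracking on $\Psi$, rejecting paths that do not end in one of the last $E_2$ columns, and repeating for further occurrences) can be put together in the following self-contained Python sketch. It is not Algorithm 3.2 itself but an illustration under stated assumptions: indices are 0-based, nodes without a valid successor are not used as starting points, and, when more than one occurrence is requested, paths that reuse transcription notes of an already returned occurrence are skipped. That last rule is a hypothetical addition of ours so that repetition does not return the same occurrence twice. All function names and parameter defaults are illustrative.

```python
import math

NEG_INF = -math.inf


def gamma(x, y):
    """Eq. (3.6): +1 for equal intervals, -1 for a one-semitone difference, -inf otherwise."""
    d = abs(x - y)
    return 1.0 if d == 0 else (-1.0 if d == 1 else NEG_INF)


def fill_grid(A, Q, Gh, Gv):
    """Backward scan computing D and the successor matrix Psi (0-based indices).

    Assumption: node (i, j) pairs A[i] with Q[j]; the last row and column keep D = 0.
    """
    MA, MQ = len(A), len(Q)
    D = [[0.0] * MQ for _ in range(MA)]
    Psi = [[None] * MQ for _ in range(MA)]
    for i in range(MA - 2, -1, -1):
        for j in range(MQ - 2, -1, -1):
            # h candidates: next row, up to Gh columns ahead
            cands = [((i + 1, k), gamma(A[i + 1] - A[i], Q[k] - Q[j]))
                     for k in range(j + 1, min(j + Gh, MQ - 1) + 1)]
            # v candidates: next column, up to Gv rows ahead
            cands += [((m, j + 1), gamma(A[m] - A[i], Q[j + 1] - Q[j]))
                      for m in range(i + 1, min(i + Gv, MA - 1) + 1)]
            # max() keeps the first candidate among ties
            succ, score = max(((s, D[s[0]][s[1]] + g) for s, g in cands),
                              key=lambda t: t[1])
            D[i][j] = score
            Psi[i][j] = succ if score > NEG_INF else None
    return D, Psi


def find_occurrences(A, Q, n=1, Gh=3, Gv=3, E1=2, E2=2):
    """Return up to n occurrences of Q in A as (score, path) pairs, best first."""
    D, Psi = fill_grid(A, Q, Gh, Gv)
    MQ = len(Q)
    # Candidate starting nodes: first E1 columns, ranked by accumulated score.
    starts = sorted(((D[i][j], (i, j)) for i in range(len(A))
                     for j in range(min(E1, MQ)) if Psi[i][j] is not None),
                    reverse=True)
    results, used_rows = [], set()
    for score, node in starts:
        path = [node]
        while Psi[path[-1][0]][path[-1][1]] is not None:  # forward tracking on Psi
            path.append(Psi[path[-1][0]][path[-1][1]])
        if path[-1][1] < MQ - E2:  # endpoint constraint: end in the last E2 columns
            continue
        rows = {i for i, _ in path}
        if rows & used_rows:  # hypothetical rule: do not reuse transcription notes
            continue
        results.append((score, path))
        used_rows |= rows
        if len(results) == n:
            break
    return results


# Hypothetical transcription: a noise note, the query transposed by -5 semitones
# with an inserted grace note (56), and another noise note.
print(find_occurrences([50, 55, 56, 57, 59, 40], [60, 62, 64]))
# [(2.0, [(1, 0), (3, 1), (4, 2)])]
```

In this example the returned path visits transcription rows 1, 3 and 4 (notes 55, 57 and 59), so the grace note 56 in row 2 is skipped through a vertical gap, in line with the subinterval case discussed for $\gamma$.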

An example of the alignment process is shown in Figures 3.5 and 3.6 and Table 3.1. Figure 3.5 shows the detection of a query pattern in an automatic performance transcription. The corresponding similarity grid, together with the extracted alignment path, is depicted in Figure 3.6, and Table 3.1 details the alignment result. It can be seen that the transcription is significantly longer than the query pattern and that the matching subsequence is located at the end of the stream. The resulting alignment shows that three notes are skipped in the transcription and one note is skipped in the manually defined pattern.

In document: Flamenco music information retrieval (pages 64-67).