1
Robust Temporal and Spectral Modeling for Query By Melody
Shai Shalev, Hebrew University
Yoram Singer, Hebrew University
Nir Friedman, Hebrew University
Shlomo Dubnov, Ben-Gurion University
2
Prelude
3
Problem Setting
Database of real recordings
Query: a melody
Find: performances of the queried melody
4
Challenge
• Find performances of the queried melody independent of:
– Tempo
– Performing instrument
– Dynamics
– Expression
– Accompaniment
5
Related Work
• A. Ghias, et al. “Query by humming”
• A. S. Durey and M. A. Clements. “Melody spotting using hidden markov models”
• C. Raphael. “Automatic segmentation of acoustic musical signals using HMMs”
• B. Doval and X. Rodet. “Fundamental frequency estimation using a new harmonic matching method”
6
Overview of Solution
• Employ a statistical framework
• Align a melody to a performance using explicit tempo modeling
• Employ a maximum likelihood model for the spectrum of a note given the note’s pitch value
• Find the best alignment of a melody to a performance using dynamic programming
7
Statistical Framework
Query Engine
A database of real recordings: $S_1, \ldots, S_L$
A melody query: $M = (d_1, p_1), \ldots, (d_k, p_k)$
For each recording find: $P(S_i \mid M)$
Ranked list of $S_1, \ldots, S_L$ according to $P(S_i \mid M)$
8
Melody Modeling
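$P(S \mid M) = \sum_{T} P(T)\, P(S \mid T, M) = \sum_{T} P(T)\, P(S \mid A(T, M))$
$P(S \mid M) \approx \max_{T} P(T)\, P(S \mid A(T, M))$
Legend: Hidden Variable, Observed Variable
Melody: $(d_1, p_1), \ldots, (d_k, p_k)$
Tempo: $(t_1, \ldots, t_k)$
Aligned Melody: $A(T, M) = (t_1 d_1, p_1), \ldots, (t_k d_k, p_k)$
Sound: $s_1, \ldots, s_n$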
9
Tempo Modeling
• Sequence of scaling factors (one per note)
• Model tempo as a first order Markov model
$P(T_1, \ldots, T_k) = P(T_1) \prod_{i=2}^{k} P(T_i \mid T_{i-1})$
• Use log-normal distribution to model conditional probability of tempo
$\log(T_i) \mid T_{i-1} \sim \mathcal{N}(\log(T_{i-1}), \rho)$
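The first-order Markov tempo model with log-normal transitions can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, ρ is treated as the variance of the Gaussian in the log domain, the prior on the first scaling factor is taken to be uniform, and the density is evaluated on log(T_i) rather than on T_i itself.

```python
import math

def log_transition(t_prev, t_cur, rho):
    """Log density of log(t_cur) under N(log(t_prev), rho); rho is assumed to be a variance."""
    diff = math.log(t_cur) - math.log(t_prev)
    return -0.5 * math.log(2 * math.pi * rho) - diff * diff / (2 * rho)

def tempo_log_prob(tempos, rho, log_prior=lambda t: 0.0):
    """log P(T_1..T_k) = log P(T_1) + sum over i >= 2 of log P(T_i | T_{i-1}).

    log_prior is a hypothetical hook for P(T_1); the default is an
    unnormalized uniform prior (an assumption, not taken from the slides).
    """
    total = log_prior(tempos[0])
    for t_prev, t_cur in zip(tempos, tempos[1:]):
        total += log_transition(t_prev, t_cur, rho)
    return total

# A steady tempo should score higher than one that jumps between notes.
steady = tempo_log_prob([1.0, 1.0, 1.0], rho=0.1)
jumpy = tempo_log_prob([1.0, 1.5, 0.8], rho=0.1)
print(steady, jumpy)
```

A smaller ρ penalizes tempo changes between consecutive notes more strongly, which pulls the model toward a fixed tempo.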
10
Spectral Modeling
[Figure: spectrum of a note with its 1st, 2nd, 3rd and 4th harmonics marked]
$S(\omega) = \sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0)$
11
Spectral Modeling
$F(\omega) = S_{\omega_0}(\omega) + N_{\omega_0}(\omega)$
[Figure: spectrum $F(\omega)$ against frequency (Hz), decomposed into Signal and Noise]
12
Spectral Modeling (cont.)
[Figure: spectrum $F(\omega)$ against frequency (Hz), decomposed into Signal and Noise, with the pitch $\omega_0$ marked]
$F(\omega) \approx \sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0)$
13
Spectral Modeling (cont.)
• Estimate the amplitude at each harmonic and the global variance of the noise using the maximum likelihood principle
• Resulting signal-to-noise likelihood function:
$\log P(F \mid \omega_0) \propto \log \frac{\|S_{\omega_0}(\omega)\|^2}{\|N_{\omega_0}(\omega)\|^2}$
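A signal-to-noise score of this kind can be illustrated with a small NumPy sketch. It is not the authors' estimator: it assumes the harmonic part is read off the observed spectrum at the bin nearest each multiple of the candidate pitch, treats everything else as noise, and returns the log ratio of the two energies. The function name `harmonic_score` and the synthetic spectrum are hypothetical.

```python
import numpy as np

def harmonic_score(freqs, spectrum, f0, num_harmonics):
    """Log ratio of harmonic energy to residual energy for a candidate pitch f0.

    Assumption: the amplitude of harmonic h is the observed spectrum value
    at the frequency bin nearest h * f0.
    """
    signal = np.zeros_like(spectrum)
    for h in range(1, num_harmonics + 1):
        target = h * f0
        if target > freqs[-1]:
            break  # harmonic lies above the analysed frequency range
        idx = int(np.argmin(np.abs(freqs - target)))
        signal[idx] = spectrum[idx]
    noise = spectrum - signal
    eps = 1e-12  # guards against division by zero for a noise-free spectrum
    return float(np.log((np.sum(signal ** 2) + eps) / (np.sum(noise ** 2) + eps)))

# Synthetic spectrum: flat noise floor plus four harmonics of 220 Hz.
freqs = np.arange(0.0, 4501.0, 10.0)
spectrum = np.full(freqs.shape, 0.01)
for h in range(1, 5):
    spectrum[int(h * 220 / 10)] = 1.0

score_true = harmonic_score(freqs, spectrum, f0=220.0, num_harmonics=4)
score_wrong = harmonic_score(freqs, spectrum, f0=300.0, num_harmonics=4)
print(score_true, score_wrong)
```

The correct pitch captures the spectral peaks as signal and leaves only the noise floor as residual, so its score is much higher than that of a mismatched pitch.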
14
Finding the best melody-performance alignment
• Dynamic programming procedure
• Recurse over tempo and end-time of the previous note
• Complexity: linear in the number of notes ($k$) and in the length of the signal, quadratic in the number of possible tempo values
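The recursion over tempo and end-time can be sketched as follows. This is a simplified illustration, not the procedure from the talk: note durations are given in frames, the tempo takes values from a small discrete set, the per-frame scores are supplied as a hypothetical dictionary `frame_score` (pitch to list of per-frame log-scores), the first note may start at any frame, and the tempo transition is the log-normal term with ρ assumed to be a variance. The sketch returns only the best score and runs in time proportional to (number of notes) × (number of frames) × (number of tempo values)².

```python
import math

def best_alignment_score(notes, tempos, frame_score, rho=0.1):
    """Best log-score of aligning `notes` (list of (duration, pitch)) to the frames."""
    n = len(next(iter(frame_score.values())))
    # Prefix sums make the score of any frame segment an O(1) lookup.
    prefix = {}
    for pitch, scores in frame_score.items():
        acc = [0.0]
        for s in scores:
            acc.append(acc[-1] + s)
        prefix[pitch] = acc

    def log_transition(t_prev, t_cur):
        diff = math.log(t_cur) - math.log(t_prev)
        return -0.5 * math.log(2 * math.pi * rho) - diff * diff / (2 * rho)

    neg_inf = float("-inf")
    prev = None  # prev[e][a]: best score with the previous note ending at frame e, tempo index a
    for i, (duration, pitch) in enumerate(notes):
        cur = [[neg_inf] * len(tempos) for _ in range(n + 1)]
        for a, t_cur in enumerate(tempos):
            length = max(1, round(t_cur * duration))  # scaled duration in frames
            for end in range(length, n + 1):
                start = end - length
                segment = prefix[pitch][end] - prefix[pitch][start]
                if i == 0:
                    cur[end][a] = segment  # free start, uniform prior on the first tempo
                else:
                    best_prev = max(prev[start][b] + log_transition(t_prev, t_cur)
                                    for b, t_prev in enumerate(tempos))
                    cur[end][a] = best_prev + segment
        prev = cur
    return max(max(row) for row in prev)

# Toy recording: pitch "A" fits frames 0-3, pitch "B" fits frames 4-7.
frame_score = {"A": [1.0] * 4 + [-1.0] * 4, "B": [-1.0] * 4 + [1.0] * 4}
tempos = [0.5, 1.0, 2.0]
matched = best_alignment_score([(2, "A"), (2, "B")], tempos, frame_score)
swapped = best_alignment_score([(2, "B"), (2, "A")], tempos, frame_score)
print(matched, swapped)
```

Keeping back-pointers alongside the scores would recover the segmentation itself, that is, the start and end frame of every note.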
15
Experimental Results
• Queries: 50 melodies from opera arias (from MIDI files)
• Database: over 800 performances of opera arias performed by over 50 tenors with full orchestral accompaniment
• Compared our variable-tempo (VT) model against fixed-tempo (FT) and locally-fixed-tempo (LFT) models
• Compared our Harmonic with Scaled Noise (HSN) spectral model against the Harmonic with Independent Noise (HIN) model
16
Evaluation Measures
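Example: five performances ranked by likelihood value (index of performance in the ranked list: 1 2 3 4 5), with the relevant performances (+) at ranks 1 and 3 and the others (-) irrelevant.
Oerr = 0
Cov = 3 - 2 = 1
AvgP = $\frac{1}{2}\left(\frac{1}{1} + \frac{2}{3}\right)$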
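The three measures can be computed from a ranked list of relevance labels. The sketch below assumes the definitions implied by the worked example on this slide (a ranked list of five performances with the relevant ones at ranks 1 and 3): one-error is 0 when the top-ranked performance is relevant, coverage is the rank of the last relevant performance minus the number of relevant performances, and average precision is the mean of the precision values at each relevant rank. The function names are hypothetical.

```python
def one_error(labels):
    """0 if the top-ranked performance is relevant, 1 otherwise."""
    return 0 if labels[0] else 1

def coverage(labels):
    """Rank of the last relevant performance minus the number of relevant ones."""
    last_rank = max(rank for rank, rel in enumerate(labels, start=1) if rel)
    return last_rank - sum(1 for rel in labels if rel)

def average_precision(labels):
    """Mean of precision-at-rank taken over the ranks of relevant performances."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits

# Ranked list from the slide example: relevant at ranks 1 and 3.
labels = [True, False, True, False, False]
oerr = one_error(labels)
cov = coverage(labels)
avgp = average_precision(labels)
print(oerr, cov, round(avgp, 3))
```

A perfect ranking, with all relevant performances at the top, gives one-error 0, coverage 0 and average precision 1.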
17
Summary of Results
• One Error of VT+HSN: 8%
• Average Precision of VT+HSN: 95%
• Coverage of VT+HSN: 0.21
18
Results
Spectral Distribution Model
                        HSN                      HIN
Query    Tempo    AvgP   Cov     Oerr      AvgP   Cov     Oerr
5 Sec.   FT       0.38   22.96   0.69      0.35   21.67   0.75
5 Sec.   LFT      0.43   17.33   0.69      0.37   17.94   0.75
5 Sec.   VT       0.51   10.67   0.65      0.46   11.83   0.69
15 Sec.  FT       0.38   19.83   0.71      0.36   19.08   0.73
15 Sec.  LFT      0.66    8.10   0.44      0.66    8.15   0.42
15 Sec.  VT       0.86    1.75   0.19      0.83    3.02   0.19
25 Sec.  FT       0.34   20.69   0.77      0.33   22.46   0.79
25 Sec.  LFT      0.66    5.90   0.46      0.63    5.98   0.48
25 Sec.  VT       0.95    0.21   0.08      0.92    0.40   0.10
19
Precision-Recall
[Figure: precision-recall curves for FT/25, LFT/25 and VT/25]
20
Illustration of Segmentation
21