Query By Melody

Robust Temporal and Spectral Modeling for Query By Melody

Shai Shalev, Hebrew University

Yoram Singer, Hebrew University

Nir Friedman, Hebrew University

Shlomo Dubnov, Ben-Gurion University

Prelude

Problem Setting

• Database: real recordings

• Query: a melody

• Find: performances of the queried melody

Challenge

• Find performances of the queried melody independent of:

– Tempo

– Performing instrument

– Dynamics

– Expression

– Accompaniment

Related Work

• A. Ghias, et al. “Query by humming”

• A. S. Durey and M. A. Clements. “Melody spotting using hidden Markov models”

• C. Raphael. “Automatic segmentation of acoustic musical signals using HMMs”

• B. Doval and X. Rodet. “Fundamental frequency estimation using a new harmonic matching method”

Overview of Solution

• Employ a statistical framework

• Align a melody to a performance using explicit tempo modeling

• Employ a maximum likelihood model for the spectrum of a note given the note’s pitch value

• Find the best alignment of a melody to a performance using dynamic programming

Statistical Framework

• Input: a database of real recordings $S_1, \ldots, S_L$

• Input: a melody query $M = (d_1, p_1), \ldots, (d_k, p_k)$

• Query engine: for each recording $S_i$, find $P(S_i \mid M)$

• Output: a ranked list of $S_1, \ldots, S_L$ according to $P(S_i \mid M)$
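The ranking step of the query engine can be sketched in a few lines. This is only an illustration: `rank_recordings` and the `log_likelihood` callback are hypothetical names, and the callback stands in for whatever procedure computes log P(S_i | M).

```python
def rank_recordings(recordings, melody, log_likelihood):
    """Return the recordings sorted from most to least likely given the melody.

    log_likelihood(recording, melody) is a caller-supplied scoring function
    (a stand-in for log P(S_i | M)); it is an assumption of this sketch.
    """
    scored = [(log_likelihood(s, melody), idx) for idx, s in enumerate(recordings)]
    # Higher log-likelihood means a better match, so sort in descending order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [recordings[idx] for _, idx in scored]


# Toy usage: strings play the role of recordings, and the score is made up.
toy_score = lambda s, m: -abs(len(s) - len(m))
ranked = rank_recordings(["aaaa", "ab", "a"], "xy", toy_score)
```

Working in the log domain keeps the scores numerically stable without changing the order of the ranked list.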

Melody Modeling

• Melody (observed variable): $M = (d_1, p_1), \ldots, (d_k, p_k)$

• Tempo (hidden variable): $T = (t_1, \ldots, t_k)$

• Aligned melody: $A(T, M) = (t_1 d_1, p_1), \ldots, (t_k d_k, p_k)$

• Sound (observed variable): $S = s_1, \ldots, s_n$

$P(S \mid M) = \sum_T P(S, T \mid M) = \sum_T P(T)\, P(S \mid A(T, M))$

$P(S \mid M) \approx \max_T P(T)\, P(S \mid A(T, M))$
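As a small illustration of the aligned melody, the sketch below scales each note duration by its tempo factor and leaves the pitch unchanged. The function name `align_melody` and the use of MIDI-style pitch numbers are assumptions made for the example.

```python
def align_melody(melody, tempo):
    """Scale each note duration d_i by its tempo factor t_i.

    melody: list of (duration, pitch) pairs
    tempo:  list of scaling factors, one per note
    Returns the aligned melody as a list of (t_i * d_i, p_i) pairs.
    """
    if len(melody) != len(tempo):
        raise ValueError("need exactly one tempo factor per note")
    return [(t * d, p) for (d, p), t in zip(melody, tempo)]


# Two notes; the first is played at half speed (its duration is doubled).
aligned = align_melody([(1.0, 60), (0.5, 62)], [2.0, 1.0])
```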

Tempo Modeling

• Sequence of scaling factors (one per note)

• Model tempo as a first-order Markov model

$P(T) = P(t_1, \ldots, t_k) = P(t_1) \prod_{i=2}^{k} P(t_i \mid t_{i-1})$

• Use a log-normal distribution to model the conditional probability of the tempo

$\log(t_i) \mid t_{i-1} \sim \mathcal{N}(\log(t_{i-1}), \rho)$
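The tempo prior can be evaluated in the log domain as sketched below. Several choices here are assumptions rather than details from the slides: ρ is treated as a variance, the Gaussian density is taken over log(t_i), and the log-probability of the first tempo value is supplied by the caller (default 0.0).

```python
import math


def tempo_log_prior(tempo, rho, log_p_first=0.0):
    """Log-probability of a tempo sequence under a first-order Markov model.

    Each transition is Gaussian in the log domain:
        log(t_i) | t_{i-1} ~ N(log(t_{i-1}), rho)
    rho is treated as a variance (an assumption of this sketch).
    """
    total = log_p_first
    for prev, cur in zip(tempo, tempo[1:]):
        diff = math.log(cur) - math.log(prev)
        total += -0.5 * math.log(2 * math.pi * rho) - diff * diff / (2 * rho)
    return total


steady = tempo_log_prior([1.0, 1.0, 1.0], rho=0.1)
jumpy = tempo_log_prior([1.0, 2.0, 0.5], rho=0.1)
```

A steady tempo scores higher than one that jumps between notes, which is the behavior the Markov prior is meant to encourage.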

Spectral Modeling

[Figure: harmonic spectrum of a note with the 1st, 2nd, 3rd, and 4th harmonics marked]

$S_{\omega_0}(\omega) = \sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0)$

Spectral Modeling

$F(\omega) = S_{\omega_0}(\omega) + N_{\omega_0}(\omega)$

[Figure: observed spectrum $F(\omega)$ versus frequency $\omega$ (Hz), 0 to 4500 Hz, normalized to the range 0 to 1, with the signal and noise components labeled]

Spectral Modeling (cont.)

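[Figure: observed spectrum $F(\omega)$ versus frequency $\omega$ (Hz), 0 to 4500 Hz, normalized to the range 0 to 1, with the signal and noise components labeled and the fundamental frequency $\omega_0$ marked]

$F(\omega) = \sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0) + N_{\omega_0}(\omega)$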

Spectral Modeling (cont.)

• Estimate the amplitude of each harmonic and the global variance of the noise using the maximum likelihood principle

• Resulting signal-to-noise likelihood function:

$\log P(F \mid \omega_0) \propto \log \frac{\|S_{\omega_0}\|^2}{\|N_{\omega_0}\|^2}$
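The signal-to-noise score can be illustrated with a deliberately simplified sketch. It assumes a magnitude spectrum on an ascending frequency grid, takes the signal to be the energy in the bins nearest to each harmonic of the candidate pitch, and treats everything else as noise. The function name, the nearest-bin rule, and the `eps` guard are assumptions of the example and not the estimator from the slides.

```python
import math


def log_snr_score(spectrum, freqs, f0, n_harmonics, eps=1e-12):
    """Log ratio of harmonic energy to residual energy for a candidate pitch f0.

    spectrum: magnitude of each frequency bin
    freqs:    frequency of each bin (Hz), in ascending order
    """
    harmonic_bins = set()
    for h in range(1, n_harmonics + 1):
        target = h * f0
        if target > freqs[-1]:
            break  # harmonic lies above the analysed band
        harmonic_bins.add(min(range(len(freqs)), key=lambda i: abs(freqs[i] - target)))
    signal = sum(spectrum[i] ** 2 for i in harmonic_bins)
    noise = sum(v ** 2 for i, v in enumerate(spectrum) if i not in harmonic_bins)
    # eps avoids log(0) and division by zero for degenerate spectra.
    return math.log((signal + eps) / (noise + eps))


grid = [100.0 * i for i in range(10)]
observed = [0.1] * 10
for peak in (2, 4, 6):                       # peaks at 200, 400, 600 Hz
    observed[peak] = 1.0
score_right = log_snr_score(observed, grid, 200.0, 3)
score_wrong = log_snr_score(observed, grid, 300.0, 3)
```

The candidate pitch whose harmonics line up with the spectral peaks receives the higher score, so the score can be compared across candidate pitches.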

Finding the best melody-performance alignment

• Recurse over tempo and end-time of the previous note

⇒ Dynamic programming procedure

• Complexity: $O(k\,T\,M^2)$, where $k$ is the number of notes, $T$ is the length of the signal, and $M$ is the number of possible tempo values
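The recursion over tempo and end-time can be sketched as a small dynamic program. This is a simplified illustration under stated assumptions: time is measured in frames, the melody starts at frame 0, tempo values come from a small discrete set, transitions are scored by an unnormalized log-normal term with variance `rho`, and `note_loglik(i, start, end)` is a hypothetical caller-supplied function that scores frames `start..end-1` against note `i`.

```python
import math


def align(durations, tempos, n_frames, note_loglik, rho=0.1):
    """Return (best log score, tempo value chosen for each note)."""
    k, m = len(durations), len(tempos)
    neg = float("-inf")
    # Unnormalized log-normal transition score between tempo values.
    trans = [[-(math.log(b) - math.log(a)) ** 2 / (2 * rho) for b in tempos]
             for a in tempos]
    # score[i][t][j]: best score with note i ending at frame t at tempo j.
    score = [[[neg] * m for _ in range(n_frames + 1)] for _ in range(k)]
    back = [[[None] * m for _ in range(n_frames + 1)] for _ in range(k)]
    length = [[max(1, round(tempos[j] * durations[i])) for j in range(m)]
              for i in range(k)]
    for i in range(k):
        for t in range(1, n_frames + 1):
            for j in range(m):
                start = t - length[i][j]
                if start < 0:
                    continue
                if i == 0:
                    if start != 0:
                        continue  # assumption: the melody starts at frame 0
                    prev = 0.0
                else:
                    # Maximize over the tempo of the previous note.
                    prev, arg = max((score[i - 1][start][jp] + trans[jp][j], jp)
                                    for jp in range(m))
                    if prev == neg:
                        continue
                    back[i][t][j] = arg
                score[i][t][j] = prev + note_loglik(i, start, t)
    best, t, j = max((score[k - 1][t][j], t, j)
                     for t in range(n_frames + 1) for j in range(m))
    if best == neg:
        return neg, []
    path = [j]
    for i in range(k - 1, 0, -1):  # trace the chosen tempo values backwards
        prev_j = back[i][t][j]
        t -= length[i][j]
        j = prev_j
        path.append(j)
    path.reverse()
    return best, [tempos[j] for j in path]


# Toy signal: one pitch label per frame; the melody is played at half speed.
frames = [60, 60, 60, 60, 62, 62]
pitches = [60, 62]
toy_loglik = lambda i, s, e: sum(1 if frames[f] == pitches[i] else -1
                                 for f in range(s, e))
best, tempo_path = align([2, 1], [1.0, 2.0], len(frames), toy_loglik)
```

With k notes, n frames, and m candidate tempo values, the three nested loops plus the inner maximization over the previous tempo cost on the order of k·n·m² operations.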

Experimental Results

• Queries: 50 melodies from opera arias (from MIDI files)

• Database: over 800 performances of opera arias performed by over 50 tenors with full orchestral accompaniment

• Compared our variable-tempo (VT) model vs. fixed-tempo (FT) and locally-fixed-tempo (LFT) models

• Compared our Harmonic with Scaled Noise (HSN) spectral model vs. Harmonic with Independent Noise (HIN) model

Evaluation Measures

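[Figure: example ranked list of five performances ordered by likelihood value; horizontal axis: index of performance in the ranked list (1 to 5); correct performances (+) at ranks 1 and 3, incorrect performances (−) at the other ranks]

$\mathrm{Oerr} = 0$

$\mathrm{Cov} = 3 - 2 = 1$

$\mathrm{AvgP} = \frac{1}{2}\left(\frac{1}{1} + \frac{2}{3}\right) = \frac{5}{6}$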
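The three measures can be computed from a ranked list of correct/incorrect labels, as sketched below. The definitions are inferred from the worked example on this slide (one error: whether the top-ranked item is wrong; coverage: rank of the last correct item minus the number of correct items; average precision: mean precision at each correct item), so treat them as assumptions.

```python
def ranking_measures(relevant):
    """Compute (one error, coverage, average precision) for one query.

    relevant: list of booleans, best-ranked performance first;
              True marks a correct performance of the queried melody.
    """
    ranks = [r for r, is_rel in enumerate(relevant, start=1) if is_rel]
    if not ranks:
        raise ValueError("the ranked list contains no correct performance")
    one_error = 0 if relevant[0] else 1
    coverage = ranks[-1] - len(ranks)
    # Precision at the i-th correct item is i / (its rank in the list).
    avg_precision = sum(i / r for i, r in enumerate(ranks, start=1)) / len(ranks)
    return one_error, coverage, avg_precision


# The example from the slide: correct performances at ranks 1 and 3.
oerr, cov, avgp = ranking_measures([True, False, True, False, False])
```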

Summary of Results

• One Error of VT+HSN: 8%

• Average Precision of VT+HSN: 95%

• Coverage of VT+HSN: 0.21

Results

Spectral Distribution Model: HSN (left three columns), HIN (right three columns)

Sec.   Tempo   HSN AvgP   HSN Cov   HSN Oerr   HIN AvgP   HIN Cov   HIN Oerr
5      FT      0.38       22.96     0.69       0.35       21.67     0.75
5      LFT     0.43       17.33     0.69       0.37       17.94     0.75
5      VT      0.51       10.67     0.65       0.46       11.83     0.69
15     FT      0.38       19.83     0.71       0.36       19.08     0.73
15     LFT     0.66       8.10      0.44       0.66       8.15      0.42
15     VT      0.86       1.75      0.19       0.83       3.02      0.19
25     FT      0.34       20.69     0.77       0.33       22.46     0.79
25     LFT     0.66       5.90      0.46       0.63       5.98      0.48
25     VT      0.95       0.21      0.08       0.92       0.40      0.10

Precision-Recall

[Figure: precision-recall curves for FT/25, LFT/25, and VT/25; both axes range from 0 to 1; vertical axis: Precision]

Illustration of Segmentation

Future Work

• More data

• Other genres of music

• Alternative spectral distribution models using supervised learning methods

• Use alignment results for separating a soloist from the accompaniment
