
Robust Temporal and Spectral Modeling for Query By Melody

Shai Shalev, Hebrew University
Yoram Singer, Hebrew University
Nir Friedman, Hebrew University
Shlomo Dubnov, Ben-Gurion University


Prelude


Problem Setting

• Database of real recordings
• Query: a melody
• Find: performances of the queried melody


Challenge

• Find performances of the queried melody independent of:
  - Tempo
  - Performing instrument
  - Dynamics
  - Expression
  - Accompaniment


Related Work

• A. Ghias, et al. “Query by humming”

• A. S. Durey and M. A. Clements. “Melody spotting using hidden Markov models”

• C. Raphael. “Automatic segmentation of acoustic musical signals using HMMs”

• B. Doval and X. Rodet. “Fundamental frequency estimation using a new harmonic matching method”


Overview of Solution

• Employ a statistical framework

• Align a melody to a performance using explicit tempo modeling

• Employ a maximum likelihood model for the spectrum of a note given the note’s pitch value

• Find the best alignment of a melody to a performance using dynamic programming


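Statistical Framework

Query engine:
• Input: a database of real recordings $S_1, \ldots, S_L$ and a melody query $M = \langle (d_1, p_1), \ldots, (d_k, p_k) \rangle$, where $d_j$ and $p_j$ are the duration and pitch of the $j$-th note
• For each recording, find $P(S_i \mid M)$
• Output: a ranked list of $S_1, \ldots, S_L$ according to $P(S_i \mid M)$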
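Since the engine only needs an ordering of the recordings, it can work with log-likelihoods, which avoids numerical underflow on long recordings. A minimal Python sketch, assuming a hypothetical callback `log_likelihood(recording, melody)` that returns $\log P(S_i \mid M)$ (for example, from the alignment procedure); the recording identifiers are also illustrative:

```python
from typing import Callable, Sequence

# A melody is a list of (duration, pitch) pairs, as in M = <(d_1, p_1), ..., (d_k, p_k)>.
Note = tuple[float, int]


def rank_recordings(
    recordings: Sequence[str],
    melody: list[Note],
    log_likelihood: Callable[[str, list[Note]], float],
) -> list[tuple[str, float]]:
    """Score each recording S_i by log P(S_i | M) and sort best-first.

    `log_likelihood` is a hypothetical scoring callback supplied by the caller.
    """
    scored = [(rec, log_likelihood(rec, melody)) for rec in recordings]
    # A higher log-likelihood means the recording explains the query better.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Sorting by $\log P(S_i \mid M)$ gives the same order as sorting by $P(S_i \mid M)$, because the logarithm is monotonic.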


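Melody Modeling

• Melody (observed variable): $M = \langle (d_1, p_1), \ldots, (d_k, p_k) \rangle$
• Tempo (hidden variable): $T = (t_1, \ldots, t_k)$
• Aligned melody: $A(T, M) = (t_1 d_1, p_1), \ldots, (t_k d_k, p_k)$
• Sound (observed variable): $S = s_1, \ldots, s_n$

$P(S \mid M) = \sum_T P(T)\, P(S \mid T, M) = \sum_T P(T)\, P(S \mid A(T, M)) \approx \max_T P(T)\, P(S \mid A(T, M))$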
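The aligned melody and the max approximation can be made concrete in a few lines of Python. This is an illustrative sketch, not the authors' code: `aligned_melody` and `log_sum_exp` are hypothetical helpers, and the joint scores are made-up numbers standing in for $\log P(T) + \log P(S \mid A(T, M))$. In log space, replacing the sum over tempo sequences by the max changes the score by at most the log of the number of candidate sequences.

```python
import math


def aligned_melody(tempo, melody):
    """A(T, M): scale the duration d_i of note i by its tempo factor t_i."""
    if len(tempo) != len(melody):
        raise ValueError("need exactly one tempo factor per note")
    return [(t * d, p) for t, (d, p) in zip(tempo, melody)]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))): the exact log-marginal over T."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


# Hypothetical joint scores log P(T) + log P(S | A(T, M)) for three tempo sequences.
joint = [-3.0, -4.0, -10.0]
exact = log_sum_exp(joint)  # log of the sum over T
approx = max(joint)         # the max_T approximation
print(f"exact={exact:.4f} approx={approx:.4f}")
```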


Tempo Modeling

• Sequence of scaling factors (one per note)
• Model tempo as a first-order Markov model:
  $P(T_1, \ldots, T_k) = P(T_1) \prod_{i=2}^{k} P(T_i \mid T_{i-1})$
• Use a log-normal distribution to model the conditional probability of tempo:
  $\log(T_i) \mid T_{i-1} \sim \mathcal{N}(\log(T_{i-1}), \rho)$
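The tempo prior can be evaluated directly from the two formulas on this slide. A minimal Python sketch under stated assumptions: $\rho$ is treated as a variance, $P(T_1)$ is a hypothetical flat prior (the `log_prior_first` parameter), and the function returns the log-density of the log-tempo sequence, so the $1/T_i$ Jacobian of the log-normal density is omitted.

```python
import math


def tempo_log_prob(tempo, rho, log_prior_first=lambda t: 0.0):
    """log P(T_1, ..., T_k) for a first-order Markov chain of tempo factors.

    Each transition is log(T_i) | T_{i-1} ~ N(log(T_{i-1}), rho), with rho
    treated as a variance (an assumption). `log_prior_first` is a hypothetical
    prior on the first tempo; the default is flat (log-probability 0).
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    total = log_prior_first(tempo[0])
    for prev, cur in zip(tempo, tempo[1:]):
        diff = math.log(cur) - math.log(prev)  # change in log-tempo
        total += -0.5 * math.log(2 * math.pi * rho) - diff * diff / (2 * rho)
    return total
```

Because only ratios of consecutive tempi enter the transitions, a steady tempo is scored the same at any absolute speed, while abrupt tempo changes are penalized quadratically in log scale.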


Spectral Modeling

[Figure: spectrum of a note with the 1st, 2nd, 3rd and 4th harmonics marked]

$S(\omega) = \sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0)$


Spectral Modeling

$F(\omega) = S(\omega) + N(\omega)$

[Figure: observed spectrum $F(\omega)$ vs. frequency (Hz, 0 to 4500), with signal and noise components labeled]


Spectral Modeling (cont.)

[Figure: observed spectrum $F(\omega)$ vs. frequency (Hz, 0 to 4500), with signal and noise components labeled and the fundamental frequency $\omega_0$ marked]

$F(\omega) = \sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0) + N(\omega)$


Spectral Modeling (cont.)

• Estimate the amplitude at each harmonic and the global variance of the noise using the maximum likelihood principle
• Resulting signal-to-noise likelihood function:
  $\log P(F \mid \omega_0) \propto \log\left(\frac{\|S(\omega_0)\|^2}{\|N(\omega_0)\|^2}\right)$
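The signal-to-noise score can be used to pick the most likely fundamental frequency. The following Python sketch relies on a deliberately simplified, hypothetical model and does not reproduce the HSN or HIN models. It assumes each harmonic is a single-bin spike, a fixed number of harmonics `n_harmonics`, and i.i.d. Gaussian noise. Under these assumptions, the least-squares (maximum likelihood) amplitude of each harmonic equals the observed value in its bin, and all remaining energy is attributed to noise.

```python
import math


def harmonic_bins(omega0, n_harmonics, n_bins, bin_width):
    """Grid bins nearest to h * omega0 for h = 1..n_harmonics (inside the grid)."""
    bins = {round(h * omega0 / bin_width) for h in range(1, n_harmonics + 1)}
    return {k for k in bins if 0 <= k < n_bins}


def snr_log_likelihood(spectrum, omega0, n_harmonics, bin_width):
    """Score a candidate omega0 by log(||S(omega0)||^2 / ||N(omega0)||^2).

    With spike-shaped harmonics, the ML amplitude A_h is the observed value in
    bin h, so the signal energy is the energy in those bins and the noise
    energy is everything else (a simplifying assumption).
    """
    bins = harmonic_bins(omega0, n_harmonics, len(spectrum), bin_width)
    signal = sum(spectrum[k] ** 2 for k in bins)
    noise = sum(v * v for k, v in enumerate(spectrum) if k not in bins)
    eps = 1e-12  # guards against log(0) for silent or perfectly clean spectra
    return math.log(signal + eps) - math.log(noise + eps)


def estimate_pitch(spectrum, candidates, n_harmonics, bin_width):
    """Return the candidate fundamental with the highest signal-to-noise score."""
    return max(candidates,
               key=lambda w: snr_log_likelihood(spectrum, w, n_harmonics, bin_width))
```

Holding the number of harmonics fixed keeps low candidate fundamentals from winning simply because they absorb more bins into the "signal".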

Finding the best melody-performance alignment

• Dynamic programming procedure
• Recurse over the tempo and end time of the previous note
• Complexity: $O(k\, T\, M^2)$, where $k$ = number of notes, $T$ = length of the signal, $M$ = number of possible tempo values
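The recursion can be sketched as a Viterbi-style dynamic program in Python. This is an illustrative reconstruction under stated assumptions, not the authors' implementation. Per-frame pitch log-likelihoods `frame_loglik` are a hypothetical input, tempo values come from a fixed grid, the first tempo has a flat prior, $\rho$ is the variance of the log-normal tempo transition, and the melody is assumed to cover the whole signal. A note's length is fixed by its end time and tempo, which determines the previous note's end time. Each state therefore maximizes only over the previous tempo, and the loops cost $O(k\, n\, m^2)$ for $k$ notes, $n$ frames and $m$ tempo values.

```python
import math


def align(frame_loglik, melody, tempos, rho):
    """Best alignment of `melody` to a performance by dynamic programming.

    frame_loglik[t][p]: log-likelihood of pitch index p at frame t (hypothetical).
    melody: list of (duration_in_frames, pitch_index) pairs.
    tempos: grid of tempo factors; note i at tempo tau spans round(tau * d_i) frames.
    Assumes note 1 starts at frame 0 and note k ends at frame n.
    Returns (best log score, [(start, end, tempo), ...]).
    """
    n, k, m = len(frame_loglik), len(melody), len(tempos)
    # Prefix sums per pitch make every segment score an O(1) lookup.
    cum = {}
    for _, p in melody:
        if p not in cum:
            acc = [0.0]
            for t in range(n):
                acc.append(acc[-1] + frame_loglik[t][p])
            cum[p] = acc

    def trans(a, b):
        # Log-normal transition: log(tempos[b]) ~ N(log(tempos[a]), rho).
        diff = math.log(tempos[b]) - math.log(tempos[a])
        return -0.5 * math.log(2 * math.pi * rho) - diff * diff / (2 * rho)

    neg = -math.inf
    # best[i][e][j]: best log score with note i ending at frame e at tempo index j.
    best = [[[neg] * m for _ in range(n + 1)] for _ in range(k)]
    back = [[[None] * m for _ in range(n + 1)] for _ in range(k)]
    for i, (d, p) in enumerate(melody):
        for j, tau in enumerate(tempos):
            length = max(1, round(tau * d))
            for e in range(length, n + 1):
                s = e - length  # start of note i = end of note i - 1
                seg = cum[p][e] - cum[p][s]
                if i == 0:
                    if s == 0:
                        best[0][e][j] = seg  # flat prior on the first tempo
                    continue
                # Recurse over the previous note's tempo; its end time is s.
                cands = [best[i - 1][s][a] + trans(a, j) for a in range(m)]
                a_best = max(range(m), key=cands.__getitem__)
                if cands[a_best] > neg:
                    best[i][e][j] = cands[a_best] + seg
                    back[i][e][j] = a_best

    j = max(range(m), key=lambda x: best[k - 1][n][x])
    score = best[k - 1][n][j]
    if score == neg:
        return neg, []  # no tempo sequence on the grid fits the signal length
    path, e = [], n
    for i in range(k - 1, -1, -1):
        length = max(1, round(tempos[j] * melody[i][0]))
        path.append((e - length, e, tempos[j]))
        e, j = e - length, back[i][e][j]
    return score, path[::-1]
```

Prefix sums keep each segment score constant-time, which is what keeps the total cost at the quadratic-in-tempo bound rather than adding a factor for segment length.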


Experimental Results

• Queries: 50 melodies from opera arias (from MIDI files)
• Database: over 800 performances of opera arias by over 50 tenors with full orchestral accompaniment
• Compared our variable-tempo (VT) model with fixed-tempo (FT) and locally-fixed-tempo (LFT) models
• Compared our Harmonic with Scaled Noise (HSN) spectral model with the Harmonic with Independent Noise (HIN) model


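Evaluation Measures

Example: five performances ranked by likelihood value; + marks a performance of the queried melody, - marks any other performance.

Index in ranked list:   1   2   3   4   5
Performance of query:   +   -   +   -   -

• One-error (Oerr): 1 if the top-ranked performance is not a performance of the query, else 0. Here Oerr = 0.
• Coverage (Cov): rank of the lowest-ranked performance of the query, minus 1. Here Cov = 3 - 1 = 2.
• Average precision (AvgP): mean of the precision values at the ranks of the performances of the query. Here $\text{AvgP} = \frac{1}{2}\left(\frac{1}{1} + \frac{2}{3}\right) = \frac{5}{6}$.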
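The three measures can be computed from a ranked list of relevance flags. A minimal Python sketch, assuming the definitions that match the worked example on this slide and a list containing at least one performance of the query; the function names are illustrative.

```python
def one_error(relevant):
    """1 if the top-ranked item is not a performance of the query, else 0."""
    return 0 if relevant[0] else 1


def coverage(relevant):
    """Rank (1-based) of the lowest-ranked relevant item, minus 1."""
    last = max(rank for rank, rel in enumerate(relevant, start=1) if rel)
    return last - 1


def average_precision(relevant):
    """Mean of the precision values at the ranks of the relevant items."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits


# The slide's example: + - + - -
ranked = [True, False, True, False, False]
```

Averaging each measure over all queries gives the per-model figures reported in the results.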


Summary of Results

One-error of VT+HSN: 8%

Average Precision of VT+HSN: 95%

Coverage of VT+HSN: 0.21


Results

Oerr = one-error, Cov = coverage, AvgP = average precision; HIN and HSN are the two spectral distribution models.

Query    Tempo    HIN                       HSN
length   model    Oerr    Cov     AvgP      Oerr    Cov     AvgP
5 sec    FT       0.75    21.67   0.35      0.69    22.96   0.38
5 sec    LFT      0.75    17.94   0.37      0.69    17.33   0.43
5 sec    VT       0.69    11.83   0.46      0.65    10.67   0.51
15 sec   FT       0.73    19.08   0.36      0.71    19.83   0.38
15 sec   LFT      0.42     8.15   0.66      0.44     8.10   0.66
15 sec   VT       0.19     3.02   0.83      0.19     1.75   0.86
25 sec   FT       0.79    22.46   0.33      0.77    20.69   0.34
25 sec   LFT      0.48     5.98   0.63      0.46     5.90   0.66
25 sec   VT       0.10     0.40   0.92      0.08     0.21   0.95


Precision-Recall

[Figure: precision vs. recall curves for FT/25, LFT/25 and VT/25]


Illustration of Segmentation


Future Work

• More data

• Other genres of music

• Alternative spectral distribution models using supervised learning methods.

• Use alignment results to separate a soloist from the accompaniment
