_________________________________________________________
ISSC 2003, Limerick, July 1-2

An Analytical Spectral Formulation of Glottal Flow
Jacqueline Walker and *Peter Murphy

Department of Electronic and Computer Engineering,
University of Limerick, Limerick, IRELAND
E-mail: Jacqueline.Walker@ul.ie

* Department of Electronic and Computer Engineering,
University of Limerick, Limerick, IRELAND
E-mail: Peter.Murphy@ul.ie
__________________________________________________________________________________________
Abstract -- Accurate voice source characterisation is an established goal in speech processing research. However, practical limitations prohibit the wide-scale use of a glottal source/vocal tract filter implementation in many speech processing applications. In coding applications, for example, the speech signal is transduced with non-specialist microphones under diverse and often adverse conditions, and the transmission path and decoding process introduce further phase distortion. In synthesis, the accurate recording of a phase-sensitive database is not overly problematic; however, extracting the flow waveform from such a database is still a non-trivial task and, as yet, no automatic inverse filtering technique is readily available. One possible way of overcoming the problem of extracting the timing events of the glottal flow is to adopt a frequency domain representation and parameterization of the glottal flow waveform. An analytical spectral formulation of an existing time domain glottal model is presented.
Keywords – Glottal flow, spectral formulation.
__________________________________________________________________________________________
I INTRODUCTION
An initial spectral representation of glottal flow was presented in [1]. A primary characteristic quoted from this work is that the glottal flow spectrum for modal voice register falls off at a rate of approximately –12 dB/octave. A subsequent study [2] supports this general finding, quoting an overall falloff of between –8 and –16 dB. Further studies [3],[4],[5] have focussed on developing spectral properties that go beyond a description of the overall rate of decrease of the harmonic structure. In particular, these studies relate specific events in time-domain glottal waveform models to specific characteristics of the resulting spectrum. In addition, [4] describes an approach to frequency domain inverse filtering and notes a practical benefit of frequency domain processing, namely that it lessens the strict phase requirements of time-domain inverse filtering. This point is also made in [6]. More recent work has provided a quantitative parameter [7] and analytical expressions [8] for the glottal flow in the frequency domain. The difficulty in extracting timing events in the glottal flow waveform is noted in [7] and frequency domain processing is promoted as an alternative. Analytical frequency domain expressions for the LF [9] and KLGLOTT88 [10] glottal models are presented in [8]. The present study provides an analytical examination of a commonly used pulse formulated in [11].
II THE GLOTTAL PULSE MODEL AND ITS SPECTRUM
The glottal pulse waveform adopted for analysis is the discrete-time glottal pulse model of Rosenberg [11] given by
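$$
g(n) = \begin{cases}
\dfrac{1}{2}\left[1 - \cos\left(\dfrac{\pi n}{N_1}\right)\right], & 0 \le n \le N_1 \\[4pt]
\cos\left(\dfrac{\pi (n - N_1)}{2 N_2}\right), & N_1 \le n \le N_1 + N_2 \\[4pt]
0, & \text{otherwise}
\end{cases}
\tag{1}
$$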
where N1:N2 ~ 4:1 and the combined length of the open and closed phases of the glottal pulse is T=2(N1+N2)ts, where ts is the sampling interval. This waveform is plotted in Figure 1(a). Figure 1(b) shows a continuous-time analog of the glottal pulse, which may be created by letting n=t/ts with T1=N1ts and T2=(N1+N2)ts.
$$
g(t) = \begin{cases}
\dfrac{1}{2}\left[1 - \cos\left(\dfrac{\pi t}{T_1}\right)\right], & 0 \le t \le T_1 \\[4pt]
\cos\left(\dfrac{\pi (t - T_1)}{2 (T_2 - T_1)}\right), & T_1 \le t \le T_2 \\[4pt]
0, & \text{otherwise}
\end{cases}
\tag{2}
$$

[Figure 1 panels: (a) "Discrete-time Rosenberg Glottal Pulse Model", g(n) versus n; (b) "Continuous-time analog of Rosenberg Glottal Pulse Model", g(t) versus t (secs).]

Figure 1: (a) Discrete-time Rosenberg model. (b) Continuous-time analog of Rosenberg model.
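The discrete-time pulse of equation (1) can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used to produce Figure 1: the function name `rosenberg_pulse` is hypothetical, and the values N1 = 40 and N2 = 10 are assumptions chosen to match the 4:1 ratio, the 10 kHz sampling frequency and the timings T1 = 4 ms and T2 = 5 ms quoted later in the paper.

```python
import math


def rosenberg_pulse(n, n1=40, n2=10):
    """Discrete-time Rosenberg glottal pulse g(n) of equation (1).

    n1 and n2 are assumed values (N1:N2 = 4:1), not taken from the paper.
    """
    if 0 <= n <= n1:
        # Opening phase: raised cosine rising from 0 to 1.
        return 0.5 * (1.0 - math.cos(math.pi * n / n1))
    if n1 < n <= n1 + n2:
        # Closing phase: quarter-cycle cosine falling from 1 to 0.
        return math.cos(math.pi * (n - n1) / (2.0 * n2))
    return 0.0  # closed phase


# One full period of N = 2 * (N1 + N2) = 100 samples.
period = [rosenberg_pulse(n) for n in range(100)]
```

The pulse peaks at n = N1 and returns to zero at n = N1 + N2, so the second half of the period is the closed phase.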
The Fourier Transform, G(f), of g(t) is given by
$$
G(f) = \int_{-\infty}^{\infty} g(t)\, e^{-j 2\pi f t}\, dt
= \int_{0}^{T_1} g_1(t)\, e^{-j 2\pi f t}\, dt + \int_{T_1}^{T_2} g_2(t)\, e^{-j 2\pi f t}\, dt
= G_1(f) + G_2(f)
\tag{3}
$$
For 0 ≤ t ≤ T1,

$$
g_1(t) = \frac{1}{2}\left[1 - \cos\left(\frac{\pi t}{T_1}\right)\right]
$$

and its Fourier Transform is given by the Fourier Transform of the constant term minus the Fourier Transform of the cosine term, which is shown in Appendix A to be

$$
G_1(f) = \frac{T_1}{2}\, \frac{\sin(\pi f T_1)}{\pi f T_1}\, e^{-j\pi f T_1}
- \frac{T_1}{4}\left[
\frac{\sin(\pi (f - f_1) T_1)}{\pi (f - f_1) T_1}\, e^{-j\pi (f - f_1) T_1}
+ \frac{\sin(\pi (f + f_1) T_1)}{\pi (f + f_1) T_1}\, e^{-j\pi (f + f_1) T_1}
\right]
\tag{4}
$$

For T1 ≤ t ≤ T2,

$$
g_2(t) = \cos\left(\frac{\pi (t - T_1)}{2 (T_2 - T_1)}\right)
$$

and it is shown in Appendix A that the Fourier Transform of this term is given by

$$
G_2(f) = \frac{\Delta T}{2}\, e^{-j\theta}\, e^{-j\pi (f - f_2)(T_1 + T_2)}\, \frac{\sin(\pi (f - f_2) \Delta T)}{\pi (f - f_2) \Delta T}
+ \frac{\Delta T}{2}\, e^{j\theta}\, e^{-j\pi (f + f_2)(T_1 + T_2)}\, \frac{\sin(\pi (f + f_2) \Delta T)}{\pi (f + f_2) \Delta T}
\tag{5}
$$

where ΔT = T2 − T1 and θ = πT1/(2ΔT). Note that g1(t) and g2(t) are limited in time, with Fourier Transforms G1(f) and G2(f) that are both present for all frequencies. The main components of the spectrum of a single glottal pulse are components at the two frequencies f1 = 1/(2T1) and f2 = 1/(4ΔT),
but these components are blurred and spread in the frequency domain by the sinc function. In addition, there is a sinc function component centred at DC due to the constant term. This theoretical spectrum for a single glottal pulse can be plotted approximately using Matlab (The MathWorks, Natick, Mass.), where its value has been calculated for frequencies between –5000 Hz and +5000 Hz at an increment of 1 Hz. The plot is then interpolated using the Matlab plot function. The plot in Figure 2 below was calculated for a single glottal pulse with T1 = 4 ms and T2 = 5 ms; hence the frequencies f1 and f2 are 125 Hz and 250 Hz, respectively. As a result of the blurring by the sinc function, individual spectral peaks cannot be seen and the most significant feature of the spectrum is that it is low pass, with most of the spectral content below 500 Hz (this is more obvious in a linear power plot). In fact, we expect the overall spectral energy in the low frequency region to be a composite of the DC, f1 and f2 peaks, as shown.
[Figure 2 plot: "Theoretical Spectrum of Single Glottal Pulse"; dB versus Frequency (Hz).]
Figure 2: Theoretical spectrum of a single glottal pulse.
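The closed-form spectrum can be cross-checked numerically. The following Python sketch evaluates G(f) = G1(f) + G2(f) following equations (4) and (5) for T1 = 4 ms and T2 = 5 ms, and compares it with a direct midpoint-rule approximation of the Fourier integral in equation (3). The function names, the phase constant `THETA` = πT1/(2ΔT), the test frequencies and the step count are assumptions made for illustration; this is not the Matlab code used for Figure 2.

```python
import cmath
import math

# Values quoted in the text: T1 = 4 ms, T2 = 5 ms.
T1, T2 = 4e-3, 5e-3
DT = T2 - T1                       # Delta T
F1 = 1.0 / (2.0 * T1)              # 125 Hz
F2 = 1.0 / (4.0 * DT)              # 250 Hz
THETA = math.pi * T1 / (2.0 * DT)  # phase offset of the closing-phase cosine


def sinc(x):
    """Return sin(pi x) / (pi x), taking the limit 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def gate_ft(f, width, centre):
    """Fourier transform of a unit-height gate with the given width and centre."""
    return width * sinc(f * width) * cmath.exp(-2j * math.pi * f * centre)


def g_analytic(f):
    """Closed-form G(f) = G1(f) + G2(f), following equations (4) and (5)."""
    c1 = T1 / 2.0
    g1 = 0.5 * gate_ft(f, T1, c1) - 0.25 * (
        gate_ft(f - F1, T1, c1) + gate_ft(f + F1, T1, c1))
    c2 = (T1 + T2) / 2.0
    g2 = 0.5 * (cmath.exp(-1j * THETA) * gate_ft(f - F2, DT, c2)
                + cmath.exp(1j * THETA) * gate_ft(f + F2, DT, c2))
    return g1 + g2


def g_numeric(f, steps=20000):
    """Midpoint-rule approximation of the Fourier integral in equation (3)."""
    h = T2 / steps
    total = 0j
    for i in range(steps):
        t = (i + 0.5) * h
        if t <= T1:
            g = 0.5 * (1.0 - math.cos(math.pi * t / T1))
        else:
            g = math.cos(math.pi * (t - T1) / (2.0 * DT))
        total += g * cmath.exp(-2j * math.pi * f * t)
    return total * h


# Largest disagreement between the two evaluations at a few test frequencies.
max_error = max(abs(g_analytic(f) - g_numeric(f))
                for f in (0.0, 125.0, 250.0, 1000.0, 2500.0))
```

Writing every term as a shifted gate transform keeps the sketch close to the derivation in Appendix A, where each cosine is windowed by a gate function.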
a) Glottal Pulse Train Spectra
In order to generate a glottal pulse train, the glottal pulse is convolved with an impulse train of period T=Nts where N=2(N1+N2) resulting in the periodic repetition of the glottal pulse with that period. In the frequency domain, the continuous spectrum is ideally sampled according to the sampling theorem.
$$
g(t) * \sum_{k=-\infty}^{\infty} \delta(t - kT)
\;\longleftrightarrow\;
G(f)\, F \sum_{k=-\infty}^{\infty} \delta(f - kF)
\tag{6}
$$

where * indicates convolution (not periodic convolution) and F=1/T.
The effect on the spectrum can be visualized as shown in Figure 3, where the plot for the single glottal pulse has been multiplied by F and sampled at kF.
Due to the frequency domain sampling imposed by the time domain convolution, impulses at multiples of F=100 Hz appear in the spectrum weighted by the value of the single glottal pulse spectrum at that point.

[Figure 3 plot: "Spectrum of Infinite Glottal Pulse Train"; dB versus Frequency (Hz).]

Figure 3: Theoretical spectrum of an infinite glottal pulse train.
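The line weights of equation (6) can be approximated from one period of the discrete-time pulse. The sketch below computes the Fourier-series coefficient at each harmonic kF, which approximates F·G(kF) up to discretisation and aliasing error. The names `g` and `harmonic_weight`, and the values N1 = 40 and N2 = 10 (T1 = 4 ms, T2 = 5 ms at 10 kHz), are assumptions for illustration rather than the procedure used for Figure 3.

```python
import cmath
import math

N1, N2 = 40, 10        # assumed: T1 = 4 ms and T2 = 5 ms at 10 kHz
N = 2 * (N1 + N2)      # samples per period, T = N * ts = 10 ms
F = 100.0              # repetition frequency in Hz, F = 1 / T


def g(n):
    """One Rosenberg pulse, equation (1); zero outside the open phase."""
    if 0 <= n <= N1:
        return 0.5 * (1.0 - math.cos(math.pi * n / N1))
    if N1 < n <= N1 + N2:
        return math.cos(math.pi * (n - N1) / (2.0 * N2))
    return 0.0


def harmonic_weight(k):
    """Fourier-series coefficient of the periodic pulse train at f = k * F."""
    acc = sum(g(n) * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
    return acc / N


# Harmonics from 0 Hz to 2500 Hz, in dB relative to the DC component.
dc = abs(harmonic_weight(0))
lines_db = [
    (k * F, 20.0 * math.log10(max(abs(harmonic_weight(k)), 1e-300) / dc))
    for k in range(26)
]
```

Each entry of `lines_db` pairs a harmonic frequency with its level, which is the information carried by the impulses of the line spectrum.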
The spectrum in Figure 3 is for a glottal pulse train of infinite duration. A more realistic glottal pulse train would consist of a finite number of periods and the truncation would produce effects in the frequency domain. In general, truncation to m periods of length T would be equivalent to multiplication by the gate function
$$
h(t) = \Pi\left(\frac{t - mT/2}{mT}\right)
\tag{7}
$$

which would mean convolution of the spectrum of the glottal pulse train with

$$
H(f) = mT\, \frac{\sin(\pi f m T)}{\pi f m T}\, e^{-j\pi f m T}
\tag{8}
$$
The result would be a blurring of the spectrum as the sinc function is reproduced at each impulse location of the glottal pulse train spectrum. The theoretical spectrum has been predicted for the case of m=10 and the result, plotted using Matlab, is shown in Figure 4(a). Using the discrete-time glottal pulse model in equation (1), a simulation of a glottal pulse train of 10 pulses has also been made and a 1024 point FFT taken. As the sampling frequency of 10 kHz is an integer multiple of the pulse train repetition frequency of 100 Hz, a window is not used. The result is shown in Figure 4(b) where it may be compared with the predicted theoretical spectrum of Figure 4(a). To highlight the comparison with the discrete FFT plot in Figure 4(b), every component of the theoretical spectrum in Figure 4(a) has been plotted using the stem function in Matlab.
[Figure 4 plots: (a) "Spectrum for Truncated Train of 10 Glottal Pulses", dB versus f (Hz); (b) "1024 point FFT of 10 glottal pulses", dB versus Frequency (Hz).]
Figure 4: (a) Theoretical spectrum of truncated glottal pulse train; (b) FFT of truncated glottal pulse train.
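The simulation described above can be approximated as follows. This Python sketch builds a train of 10 Rosenberg pulses at a 10 kHz sampling frequency and takes a 1024-point FFT with no window. It rests on two assumptions not stated in the paper: N1 = 40 and N2 = 10 (T1 = 4 ms, T2 = 5 ms), and zero-padding of the 1000-sample train to 1024 points. The radix-2 `fft` helper is a stand-in for the Matlab FFT.

```python
import cmath
import math

FS = 10_000            # sampling frequency in Hz, as stated in the text
N1, N2 = 40, 10        # assumed: T1 = 4 ms and T2 = 5 ms at 10 kHz
N = 2 * (N1 + N2)      # 100 samples per period (100 Hz repetition rate)
M = 10                 # number of pulses in the truncated train
NFFT = 1024


def g(n):
    """One Rosenberg pulse, equation (1); zero outside the open phase."""
    if 0 <= n <= N1:
        return 0.5 * (1.0 - math.cos(math.pi * n / N1))
    if N1 < n <= N1 + N2:
        return math.cos(math.pi * (n - N1) / (2.0 * N2))
    return 0.0


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        term = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + term
        out[k + n // 2] = even[k] - term
    return out


train = [g(n % N) for n in range(M * N)]        # 10 periods, 1000 samples
padded = train + [0.0] * (NFFT - len(train))    # assumed zero-padding to 1024
spectrum = fft(padded)

# Magnitudes for 0 Hz up to FS / 2, with the matching frequency axis.
freqs = [k * FS / NFFT for k in range(NFFT // 2 + 1)]
mags = [abs(spectrum[k]) for k in range(NFFT // 2 + 1)]
```

With these settings the FFT bins are spaced FS/1024 ≈ 9.77 Hz apart, so the harmonics at multiples of 100 Hz fall between bins and each appears as a sampled sinc lobe rather than a single line.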
III DISCUSSION
Understandably, frequency domain analyses have focused on correlations with their time domain counterparts. Essentially we have followed such an approach in the present paper. The results from the analytical expressions are verified through taking the Fourier transform of the glottal pulse(s).
Following an analytical approach provides a frequency domain model which can be investigated with or without recourse to its time domain counterpart (open quotient, T1, T2, etc.); i.e., modeling in the frequency domain can be achieved through direct manipulation of f1 and f2 in the analytical expressions.
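The correspondence between the spectral parameters and the timing parameters follows from f1 = 1/(2T1) and f2 = 1/(4(T2 − T1)). A minimal Python sketch of the mapping in both directions is given below; the function names are hypothetical and are introduced only for illustration.

```python
def timing_from_frequencies(f1, f2):
    """Return (T1, T2) in seconds from f1 = 1/(2 T1) and f2 = 1/(4 (T2 - T1))."""
    t1 = 1.0 / (2.0 * f1)
    delta_t = 1.0 / (4.0 * f2)
    return t1, t1 + delta_t


def frequencies_from_timing(t1, t2):
    """Return (f1, f2) in Hz from the pulse timing parameters T1 and T2."""
    return 1.0 / (2.0 * t1), 1.0 / (4.0 * (t2 - t1))


# The example used in the paper: f1 = 125 Hz and f2 = 250 Hz.
t1, t2 = timing_from_frequencies(125.0, 250.0)
```

For the values used in Figure 2 this recovers T1 = 4 ms and T2 = 5 ms, so a change made to f1 or f2 in the analytical expressions can be read back as a change in the pulse timing.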
IV CONCLUSIONS
An analytical expression for the spectrum of the Rosenberg glottal pulse (1) is presented. Frequency domain modeling is attractive because it lessens the strict phase requirements of time domain inverse filtering. Future work will investigate time domain correlates of the current model and will focus on direct spectral source characterization in the absence of time-domain pre-processing.
V APPENDIX
Fourier Transform of the Glottal Pulse Waveform

To find the Fourier Transform of the first part of the glottal pulse waveform,

$$
g_1(t) = \frac{1}{2}\left[1 - \cos\left(\frac{\pi t}{T_1}\right)\right], \quad 0 \le t \le T_1,
$$

it is convenient to split it into the constant term and the cosinusoid term. The constant term represents ½ multiplied by a shifted gate function,

$$
h_1(t) = \Pi\left(\frac{t - T_1/2}{T_1}\right).
$$

The Fourier Transform of this term is then given by

$$
\frac{1}{2} H_1(f) = \frac{T_1}{2}\, \frac{\sin(\pi f T_1)}{\pi f T_1}\, e^{-j\pi f T_1}
\tag*{A(1)}
$$

The Fourier Transform of the cosinusoid −½cos(πt/T1) ∀t would be given by

$$
-\frac{1}{4}\left[\delta(f - f_1) + \delta(f + f_1)\right]
\tag*{A(2)}
$$

where f1 = 1/(2T1). In forming the glottal pulse, however, this term is also effectively multiplied by the shifted gate function, and the result is a convolution of the spectrum of the cosinusoid with the spectrum, H1(f), to give

$$
-\frac{T_1}{4}\left[
\frac{\sin(\pi (f - f_1) T_1)}{\pi (f - f_1) T_1}\, e^{-j\pi (f - f_1) T_1}
+ \frac{\sin(\pi (f + f_1) T_1)}{\pi (f + f_1) T_1}\, e^{-j\pi (f + f_1) T_1}
\right]
\tag*{A(3)}
$$
The overall contribution to the spectrum from the first term of the glottal pulse waveform is thus given by the sum of A(1) and A(3).
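The second term of the glottal pulse waveform is

$$
g_2(t) = \cos\left(\frac{\pi (t - T_1)}{2 (T_2 - T_1)}\right), \quad T_1 \le t \le T_2.
$$

Letting θ = πT1/(2ΔT), where ΔT = T2 − T1, the Fourier Transform of cos(πt/(2ΔT) − θ) ∀t is given by

$$
\frac{1}{2}\left[\delta(f - f_2)\, e^{-j\theta} + \delta(f + f_2)\, e^{j\theta}\right]
\tag*{A(4)}
$$

where f2 = 1/(4ΔT). In forming the glottal pulse, however, this term is effectively multiplied by a shifted gate function,

$$
h_2(t) = \Pi\left(\frac{t - (T_1 + T_2)/2}{\Delta T}\right),
$$

which has the Fourier Transform

$$
H_2(f) = \Delta T\, \frac{\sin(\pi f \Delta T)}{\pi f \Delta T}\, e^{-j\pi f (T_1 + T_2)}.
$$

Convolving the term in A(4) with H2(f) gives

$$
\frac{\Delta T}{2}\, e^{-j\theta}\, e^{-j\pi (f - f_2)(T_1 + T_2)}\, \frac{\sin(\pi (f - f_2) \Delta T)}{\pi (f - f_2) \Delta T}
+ \frac{\Delta T}{2}\, e^{j\theta}\, e^{-j\pi (f + f_2)(T_1 + T_2)}\, \frac{\sin(\pi (f + f_2) \Delta T)}{\pi (f + f_2) \Delta T}
\tag*{A(5)}
$$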
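As a consistency check on equations (4) and (5), both can be evaluated at f = 0, where the transform must equal the area under the pulse. The worked step below is a sketch added for illustration; it assumes θ = πT1/(2ΔT), f1 = 1/(2T1) and f2 = 1/(4ΔT), so that f1T1 = 1/2, f2ΔT = 1/4 and πf2(T1 + T2) − θ = π/4.

$$
G_1(0) = \frac{T_1}{2} - \frac{T_1}{4}\cdot\frac{2}{\pi}\left(e^{j\pi/2} + e^{-j\pi/2}\right) = \frac{T_1}{2},
\qquad
G_2(0) = \frac{\Delta T}{2}\cdot\frac{2\sqrt{2}}{\pi}\left(e^{j\pi/4} + e^{-j\pi/4}\right) = \frac{2\,\Delta T}{\pi}
$$

Both values agree with direct integration of g1(t) and g2(t) over their respective intervals, so G(0) = T1/2 + 2ΔT/π. For T1 = 4 ms and ΔT = 1 ms this is approximately 2.64 ms.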
VI ACKNOWLEDGEMENTS
This work is supported by an Enterprise Ireland research grant ST/2000/100/A.
VII REFERENCES
[1] J. L. Flanagan, “Some properties of the glottal sound source”, J. Speech Hear. Res., 1: 99-111, 1958.
[2] P. B. Carr and D. Trill, “Long term larynx excitation spectra”, J. Acoust. Soc. Am., 36(11): 2033-2040, 1964.
[3] T. V. Ananthapadmanabha, “Acoustic analysis of voice source dynamics”, STL-QPSR, 2-3: 1-24, 1984.
[4] G. Fant and Q. Lin, “Frequency domain interpretation and derivation of glottal flow parameters”, STL-QPSR, 2-3: 1-21, 1988.
[5] J. Gauffin and J. Sundberg, “Spectral correlates of glottal source waveform characteristics”, J. Speech Hear. Res., 32: 556-565, 1989.
[6] P. Ladefoged, I. Maddieson and M. Jackson, “Investigating phonation types in different languages”, in Vocal Physiology: Voice Production, Mechanisms and Functions, edited by Osamu Fujimura, pp. 297-318, Raven Press, New York, 1988.
[7] P. Alku, H. Strik and E. Vilkman, “Parabolic spectral parameter – a new method for quantification of the glottal flow”, Speech Commun., 22: 67-79, 1997.
[8] B. Doval and C. d’Alessandro, “Spectral correlates of glottal waveform models: An analytic study”, ICASSP, 1285-1298, 1997.
[9] G. Fant, J. Liljencrants and Q. Lin, “A four parameter model of glottal flow”, STL-QPSR, 4: 1-13, 1985.
[10] D. Klatt and L. Klatt, “Analysis, synthesis and perception of voice quality variations among female and male talkers”, J. Acoust. Soc. Am., 87: 820-857, 1990.
[11] A. E. Rosenberg, “Effect of glottal pulse shape on the quality of natural vowels”, J. Acoust. Soc. Am., 49: 583-588, 1971.