http://www.diva-portal.org

This is the published version of a paper published in Journal of Biomechanics.

Citation for the original published paper (version of record):

Schickhofer, L., Mihaescu, M. (2019)

Analysis of the aerodynamic sound of speech through static vocal tract models of various glottal shapes

Journal of Biomechanics

https://doi.org/10.1016/j.jbiomech.2019.109484

Access to the published version may require subscription.

N.B. When citing this work, cite the original published paper.

Permanent link to this version:

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-263772

Journal Pre-proofs

Analysis of the aerodynamic sound of speech through static vocal tract models of various glottal shapes

Lukas Schickhofer, Mihai Mihaescu

PII: S0021-9290(19)30734-1

DOI: https://doi.org/10.1016/j.jbiomech.2019.109484

Reference: BM 109484

To appear in:

Journal of Biomechanics

Accepted Date: 30 October 2019

Please cite this article as: L. Schickhofer, M. Mihaescu, Analysis of the aerodynamic sound of speech through static vocal tract models of various glottal shapes, Journal of Biomechanics (2019), doi: https://doi.org/10.1016/j.jbiomech.2019.109484

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier Ltd.

Analysis of the aerodynamic sound of speech through static vocal tract models of various glottal shapes

Lukas Schickhofer^a,∗, Mihai Mihaescu^a,∗

^a Department of Mechanics, Linné FLOW Centre, KTH Royal Institute of Technology, Stockholm, SE-10044, Sweden

Abstract

The acoustic spectrum of our voice can be divided into harmonic and inharmonic sound components. While the harmonic components, generated by the oscillatory motion of the vocal folds, are well described by reduced-order speech models, the accurate computation of the inharmonic components requires high-order flow simulations, which predict the vortex shedding and turbulent structures present in the shear layers of the glottal jet. This study characterizes the dominant frequencies in the unsteady flow of the intra- and supraglottal region.

A realistic vocal tract geometry obtained through magnetic resonance imaging (MRI) is applied for the numerical domain, which is locally modified to account for different convergent and divergent glottal angles. Both time-averaged and fluctuating values of the flow variables are computed and their distribution at various glottal shapes is compared. The impact of the registered modes in the unsteady flow on the acoustic far field is computed through direct compressible flow simulations. Furthermore, acoustic analogies are applied to localize the sources of the aerodynamically generated sound.

Keywords: Human Phonation, Large Eddy Simulation, Aeroacoustics, Magnetic Resonance Imaging

2010 MSC: 00-01, 99-00

∗Corresponding authors:

Email addresses: [email protected] (Lukas Schickhofer), [email protected] (Mihai Mihaescu)

1. Introduction

We initiate speech by applying an increased lung pressure to the adducted vocal folds. This excites a periodic oscillation of the vocal folds, which modulates the volumetric flow rate. The propagation of the resulting pressure waves through the vocal tract further modifies the sound spectrum in such a way that dominant peaks, the formants, appear in its envelope (Stevens, 2000; Titze and Alipour, 2006). However, above approximately 1–2 kHz an additional sound generation mechanism becomes important: the glottal jet produced through the opening between the vocal folds, the glottis, leads to the formation of flow instabilities in its shear layers and vortex shedding downstream of the glottal exit (Zhang et al., 2004; Zhang and Mongeau, 2006). The resulting flow structures and convective disturbances propagate at half of the mean glottal jet velocity (Schickhofer et al., 2019).

Through stroboscopic observations, Hirano (1981) identified approximate glottal shapes and angles during phonation, which are repeated periodically during each cycle. Depending on the particular form of the vocal folds during their motion, the glottal constriction is generally convergent during the opening phase of the glottal cycle and divergent during the closing phase, leading to different form and behavior of the resulting glottal jet (Titze and Alipour, 2006).

In voiced speech with its periodic vocal fold oscillations, near-field disturbances occur particularly in the supraglottal region. Experimental studies found the supraglottal coherent structures and vortex shedding to occur predominantly at frequencies above 1000 Hz (Zhang et al., 2004; Kucinschi et al., 2006). Mihaescu et al. (2010) found that at a divergent shape of the glottis with a glottal angle of 20°, vortices are shed already in the intraglottal region at dominant frequencies of roughly 100–200 Hz. Further downstream of the glottal exit, these frequencies were then overshadowed by the aforementioned supraglottal flow structures of around 1000 Hz. Zheng et al. (2011) and Kniesburges et al. (2013) showed through numerical and experimental studies that the orientation of the glottal jet and the asymmetric deflection of the airflow through the glottis are largely influenced by large-scale supraglottal flow structures arising particularly during the closing phase of phonation.

Furthermore, two-dimensional, simplified laryngeal geometries tend to overpredict vorticity in the flow, and only detailed, three-dimensional models correctly estimate the occurrence of supraglottal flow structures (Mattheus and Brücker, 2011). Moreover, Blandin et al. (2015) and Arnela et al. (2016) recently showed that vocal tract simplifications fail to accurately predict the behavior of higher-order cross-modes at frequencies above 4 kHz and that only plane-wave propagation is accurately captured in such cases. This demonstrates the necessity of realistic airway geometries for acoustic simulations.

However, the effect of intra- and supraglottal flow structures, as well as flow turbulence, on acoustics and general voice perception is still an open question (Mittal et al., 2013; Zhang, 2016). To answer this, acoustic analogies have been applied to computational laryngeal models with the goal to identify and localize the sources of acoustic waves. Furthermore, the quasisteady approximation of acoustics has been applied to phonation to quantify the contribution of the various source types. It considers the laryngeal geometry at discrete instances of the glottal cycle and thereby circumvents the complex modeling problem of self-sustained vocal fold motion.

The quasisteady approximation holds if the convective, spatial acceleration due to airway narrowing is considerably larger than the unsteady acceleration due to time variations in the flow. Following Hirschberg (1992), the ratio of the unsteady to the convective acceleration of glottal flow has the order of magnitude of the Strouhal number St = f₀L/U ∼ O(10⁻²) with the vibration frequency f₀, the glottal length L, and the velocity U at the minimum glottal area. It is therefore considered a reasonable assumption that time variations of glottal flow are negligible during most of the phonatory cycle. Indeed, Zhang et al. (2002) and Vilain et al. (2004) experimentally verified the quasisteady approach for unsteady jets in confined, ducted geometries, as is the case for the sound sources of phonation. Furthermore, Park and Mongeau (2007) concluded that the quasisteady assumption is valid for roughly 70% of the entire glottal cycle, with exceptions only at very small diameters approaching opening or closing. As noted by Krane and Wei (2006), a temporal regime exists during flow initiation and shutoff, at the beginning and end of the cycle, during which unsteady acceleration dominates. Subsequently, Krane et al. (2010) found through experiments that the quasisteady approach holds for the middle interval of the glottal cycle, but might fail during the highly unsteady initial and end phase, in agreement with Park and Mongeau (2007).
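The order-of-magnitude argument can be illustrated with a short calculation. The sketch below is not part of the original analysis: the vibration frequency, the glottal length, and the air density are assumed, representative values, and the jet velocity is only a Bernoulli estimate based on the lung pressure of 6 cmH2O that is imposed in this study.

```python
import math

# Assumed, illustrative values (not taken from the paper)
f0 = 125.0   # vocal fold vibration frequency in Hz
L = 3.0e-3   # glottal length in m
rho = 1.16   # air density in kg/m^3 at about 300 K

# Driving pressure of 6 cmH2O, as imposed at the inlet in this study
p_lung = 6 * 98.0665  # Pa

# Bernoulli estimate of the velocity at the minimum glottal area
U = math.sqrt(2.0 * p_lung / rho)

# Strouhal number: ratio of unsteady to convective acceleration
St = f0 * L / U
print(f"U = {U:.1f} m/s, St = {St:.3f}")
```

With these assumptions the Strouhal number comes out on the order of 10⁻², in line with the estimate attributed to Hirschberg (1992).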

Using the quasisteady approximation, Zhao et al. (2002) found for a simplified laryngeal geometry sources of dipole-type to be the dominant mechanism of sound production caused by the periodic modulation of the vocal folds. Howe and McGowan (2007) characterized this dipole source through small vortical structures impinging on the glottal airway walls. Furthermore, Zhang et al. (2004) studied broadband sound generation experimentally through glottal-shaped orifices with relevance to phonation and identified strong quadrupole sources along the shear layers of the glottal jet. More recently, Kaltenbacher et al. (2014) and Šidlof et al. (2015) applied acoustic analogies based on both the formulation by Lighthill (1952) and on acoustic perturbation equations by Ewert and Schröder (2003) to the unsteady flow field of a simplified computational laryngeal model. By computing the acoustic source terms of the wave equations, they were able to identify the sources of sound at frequencies of up to several kilohertz. Furthermore, Lodermeyer et al. (2018) applied an acoustic analogy on particle image velocimetry (PIV) data in the wake of a synthetic larynx model including self-oscillating vocal folds. The sources of both harmonic, tonal sound, as well as of inharmonic, broadband sound of frequencies up to 1000 Hz could be localized.

The goal of this study is the identification of flow-induced sound during voiced speech both in terms of its frequency content, as well as its acoustic sources. Therefore, the unsteady flow inside a realistic vocal tract model is computed for various shapes of the vocal folds during the phonatory cycle. As opposed to previous studies, a detailed, three-dimensional surface representation of the internal upper airways involved in the speech production process is applied for the generation of the computational grid. This way, it is ensured that the resulting source distribution of aerodynamically generated sound is representative of the scenario during voiced speech in an actual vocal tract.

2. Methods

2.1. Numerical method

Figure 1: Geometry and mesh of the vocal tract configuration of vowel [A] used in this study. The surface mesh obtained through segmentation of the MRI data is shown in (a), while the coronal and sagittal plane sections are indicated in (b). The coronal plane is slightly tilted to allow for better examination of the supraglottal flow field. The mesh refinement stages along those planes are shown in (c) for both plane sections indicated in (b). [Figure labels: 1, 2, far-field probe.]

The flow field and resulting sound pressures are computed in this study by the use of large eddy simulations (LES). The behavior of the airflow is defined by the compressible form of the continuity equation for mass conservation and the Navier-Stokes equation for momentum conservation. The discretization of these flow-governing equations is performed using the finite volume method (FVM) applied to the numerical grid outlined in Sec. 2.2. Velocity and pressure of the flow are calculated in a coupled manner. Spatial discretization of the derivatives is achieved through a second-order bounded central-difference scheme. An implicit scheme with temporal discretization of second order is applied for the integration in time. Convergence of the solution is ensured at every time step after 5 inner iterations. The time step of the numerical calculations is chosen as Δt_max = 1.92 × 10⁻⁵ s (Δt_max ≈ 20 µs). The simulations are run for a maximum of 5000 time steps in all cases. The values of the time step and the duration of the computations have been established as sufficiently small to capture all frequencies relevant for human voice production in the audible frequency range of 20–20000 Hz.
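The relation between the stated time step, the number of time steps, and the resolvable frequency range can be checked with a few lines. This is a minimal sketch based only on the two numbers given above; it applies the sampling theorem and assumes that a spectrum is computed from the full record of 5000 samples.

```python
dt = 1.92e-5      # time step in s (from the text)
n_steps = 5000    # maximum number of time steps (from the text)

f_sampling = 1.0 / dt          # sampling frequency in Hz
f_nyquist = 0.5 * f_sampling   # highest resolvable frequency in Hz
t_total = n_steps * dt         # simulated physical time in s
df = 1.0 / t_total             # frequency resolution of a full-length FFT in Hz

print(f"Nyquist frequency: {f_nyquist:.0f} Hz")
print(f"Simulated time:    {t_total * 1e3:.1f} ms")
print(f"Frequency spacing: {df:.2f} Hz")
```

The Nyquist frequency of about 26 kHz lies above the upper limit of the audible range, and the simulated time of 96 ms gives a frequency spacing of roughly 10.4 Hz. The Nyquist criterion is a necessary condition only; the accuracy at high frequencies also depends on the numerical schemes and the mesh.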

All scales smaller than the applied numerical mesh are modeled with the wall-adapting local eddy-viscosity (WALE) subgrid scale formulation that establishes the subgrid scale viscosity as a function of the grid filter width, the density, and the resolved velocity field (Nicoud and Ducros, 1999). The ideal gas law p = ρRT, relating gas pressure p and density ρ via the specific gas constant of air R = 287.06 J/(kg K) at room temperature T ≈ 300 K, is used in the simulations as equation of state (EOS) for the compressible airflow. The computations are carried out using the solver Star-CCM+ 13.04.011 (Siemens PLM Software).
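For orientation, the ideal gas law can be evaluated for the stated gas constant and temperature. The sketch below is illustrative only: the standard atmospheric pressure and the ratio of specific heats are assumed values that are not given in the text.

```python
import math

R = 287.06        # specific gas constant of air in J/(kg K) (from the text)
T = 300.0         # temperature in K (from the text)
p_atm = 101325.0  # assumed standard atmospheric pressure in Pa
gamma = 1.4       # assumed ratio of specific heats of air

rho = p_atm / (R * T)          # density from p = rho * R * T
c0 = math.sqrt(gamma * R * T)  # speed of sound of an ideal gas

print(f"rho = {rho:.3f} kg/m^3, c0 = {c0:.1f} m/s")
```

Under these assumptions the density is about 1.18 kg/m³ and the speed of sound about 347 m/s.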

2.2. Geometry and mesh

The applied surface geometry has been obtained through magnetic resonance imaging (MRI) data of a healthy male subject vocalizing the vowel [A] (cf. Fig. 1). The measurements and segmentation were performed by the Speech Modeling Group at the Department of Mathematics and Systems Analysis at Aalto University, Finland. Details on the measurement process with applied scanner settings and experimental arrangements are further described in recent publications (Aalto et al., 2011, 2014; Schickhofer et al., 2019). Because the MRI recording temporally averages the oscillatory motion of the true vocal folds, their shape is adjusted a posteriori through surface mesh modification. For this purpose, the glottal surface design equations introduced by Scherer et al. (2001), which describe the dimensions of the commonly applied M5 model of the vocal folds, are used. They were subsequently modified by Li et al. (2006) to account for various glottal angles of the vocal folds in the convergent and divergent state of a physical vocal fold model. These design parameters and equations are derived from average values of human vocal folds. In the current study, glottal angles of φ = 10°, 20°, 40°, corresponding to the half-angles of φ/2 = 5°, 10°, 20° at a glottal width of d ≈ 6 × 10⁻⁴ m (0.6 mm) in Fig. 2, are investigated. Based on the findings of Hirano (1981) and on the numerical model of Alipour and Scherer (2004), the glottal angles are predominantly small during the majority of the glottal cycle and the half-angles barely exceed 20°. Thus, the full angles φ = 10°, 20° considered here are rather small and lie within the middle part of the phonatory cycle, such that the quasisteady assumption holds and the ranges of large unsteady acceleration immediately after opening and just before closing are avoided (Krane and Wei, 2006; Krane et al., 2010). The angle of φ = 40° could potentially occur at an intermediate stage of the flow shutoff near vocal fold closure. However, as proposed by Deverge et al. (2003), the viscous effects in this range of shutoff exceed the effects due to flow unsteadiness, such that the quasisteady approach is found to provide a reasonable approximation also at this stage.

Figure 2: Convergent (a) and divergent glottal shapes (b) following the surface design equations of the M5 model introduced by Scherer et al. (2001) and Li et al. (2006). [Figure labels: R_in, R_out, T, 45°, d/2, axes x and y; half-angles -5°, -10°, -20° in (a) and 5°, 10°, 20° in (b).]

Apart from the intraglottal shape, all other details of the original MRI model reported in Aalto et al. (2014) and Schickhofer et al. (2019), such as ventricular folds or epiglottis, have been left unchanged. Additionally, no further changes have been applied to the anterior-posterior direction, leading to the same cross-sectional profile as in the reference studies by Scherer et al. (2001) and Li et al. (2006).

The numerical finite volume mesh of this study is three-dimensional, unstructured, and consists of polyhedral cells, which are locally refined towards the glottal constriction, as shown in Fig. 1. The applied numerical grids consist of about 7.5 × 10⁶ cells with an average cell size in the far field of 8 × 10⁻⁴ m (0.8 mm). The vocal tract walls surrounding the airways are assumed static and rigid. A Dirichlet boundary condition, u_i = 0, is applied for the components of the flow velocity at the walls. A stagnation inlet boundary condition with an imposed constant lung pressure of P₀ = P_s ≈ 588.399 Pa (6 cmH2O) is chosen for the subglottal end of the airways, while an acoustically non-reflecting pressure outlet at atmospheric pressure is applied to the far-field end of the domain.
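Two of the numbers above lend themselves to a quick plausibility check: the conversion of the lung pressure from cmH2O to Pa, and the number of far-field cells per acoustic wavelength. The following sketch is an illustration rather than part of the original setup; the speed of sound is an assumed value for air at about 300 K, and the helper `cells_per_wavelength` is a hypothetical name introduced here.

```python
CMH2O_TO_PA = 98.0665  # conversion factor quoted in the caption of Fig. 3

p_lung = 6 * CMH2O_TO_PA  # imposed lung pressure in Pa
cell_size = 8.0e-4        # average far-field cell size in m (from the text)
c0 = 347.0                # assumed speed of sound in m/s at about 300 K


def cells_per_wavelength(frequency_hz):
    """Number of far-field cells spanning one acoustic wavelength."""
    wavelength = c0 / frequency_hz
    return wavelength / cell_size


print(f"Lung pressure: {p_lung:.3f} Pa")
for f in (1000.0, 5000.0, 20000.0):
    print(f"{f:7.0f} Hz: {cells_per_wavelength(f):6.1f} cells per wavelength")
```

Even at 20 kHz, one wavelength is covered by roughly 20 far-field cells under this assumption, and the resolution increases proportionally towards lower frequencies.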

2.3. Validation

The chosen numerical solver setup (i.e. cell size distribution, numerical schemes, under-relaxation factors, turbulence treatment) has been validated in a previous study in terms of the computed surface and acoustic pressures resulting from the compressible flow solution (Schickhofer et al., 2019). In that work, the vocal tract geometries of several vowels were investigated, among which was also the geometry of vowel [A] applied in the current study. However, in order to validate the setup also against the localized changes of the transglottal geometry and transglottal angles, which are considered in the current setup, a direct comparison to the experimental midline pressure data from the study of Li et al. (2006) has been performed using the same dimensions and boundary conditions. The results can be seen in Fig. 3. There is an overall good agreement of the pressure values both for convergent and divergent shapes with only small deviations away from entrance and exit of the glottis. This could be due to the upstream and downstream region of the considered geometry being based on realistic MRI data and not on an idealized, simplified model as in the measurements of Li et al. (2006).

2.4. Computation of acoustic sources

The sources of sound in the supraglottal region can be identified by application of an analogy between the aerodynamic fluctuations in the near field and the signal in the acoustic far field. Far field in this context means the domain sufficiently far away from the aerodynamic source region for any pressure fluctuations to be considered purely acoustic and propagating at the speed of sound.

Figure 3: Transglottal midline pressure distribution based on the numerical data obtained through the current study and the experimental data from the study of Li et al. (2006) (1 cmH2O = 98.0665 Pa). Panels (a) and (b).

Ffowcs Williams-Hawkings equation

According to Lighthill (1952), the compressible Navier-Stokes equations, governing the motion of fluids and their velocity and pressure fields, can be reformulated to a type of wave equation. Thus, a connection is drawn between the near-field fluctuations in a fluid and the acoustic wave propagation as a result of it. Consequently, Williams and Hawkings (1969) offered a variation to Lighthill's equation by explicitly taking into account surfaces inside the source region. The Ffowcs Williams-Hawkings equation

$$
\left( \frac{\partial^2}{\partial t^2} - c_0^2 \nabla^2 \right) (\rho - \rho_0)
= \underbrace{\frac{\partial^2 T_{ij}}{\partial x_i \partial x_j}}_{\text{(I)}}
- \underbrace{\frac{\partial}{\partial x_i} \left[ (p - p_0)\, \delta_{ij}\, \delta(f)\, \frac{\partial f}{\partial x_j} \right]}_{\text{(II)}}
+ \underbrace{\frac{\partial}{\partial t} \left[ \rho_0 u_i\, \delta(f)\, \frac{\partial f}{\partial x_i} \right]}_{\text{(III)}}
\tag{1}
$$

consists of the wave equation operator (∂²/∂t²) − c₀²∇² applied on the density fluctuations ρ′ = ρ − ρ₀. The speed of sound is denoted as c₀, the mean pressure as p₀, and the mean density of the fluid as ρ₀. The Lighthill stress tensor,

$$
T_{ij} = \rho u_i u_j + \left[ (p - p_0) - c_0^2 (\rho - \rho_0) \right] \delta_{ij} - \sigma_{ij},
\tag{2}
$$

with the Reynolds stress ρu_iu_j, the viscous stress σ_ij and the contribution of the heat conduction (p′ − c₀²ρ′)δ_ij, can be considered as a distribution of acoustic sources in the near field. Eq. (1) introduces an explicit second term (II) on the right-hand side that accounts for unsteady surface loading by the fluid (its integral solution giving a dipole source) and a third term (III) that gives the pressure fluctuations due to accelerated wall motion (its integral solution giving a monopole source). The third and last term (III) on the right-hand side is rather small compared to the other terms for the case of sound generation during phonation and is therefore neglected in this study (Zhao et al., 2002; Howe and McGowan, 2007).

Lamb vector divergence

The Lamb vector is defined as the cross-product of vorticity and velocity of a flow,

$$
\vec{L} = \vec{\omega} \times \vec{u} = (\nabla \times \vec{u}) \times \vec{u}.
\tag{3}
$$

It can be considered as a vorticity-velocity interaction force. Its divergence

$$
\nabla \cdot \vec{L} = \vec{u} \cdot (\nabla \times \vec{\omega}) - \vec{\omega} \cdot \vec{\omega}
\tag{4}
$$

is the sum of the flexion product u · (∇ × ω) and the negative enstrophy −ω · ω. Negative values of ∇ · L can be interpreted as spatially localized motions that lead to a time rate of change of linear momentum in the flow (Hamman et al., 2008). Howe (1975) could show that the double-derivative of the Lighthill tensor, and therefore the acoustic source distribution, can be approximated by the divergence of the Lamb vector,

$$
\frac{\partial^2 T_{ij}}{\partial x_i \partial x_j} \approx \rho \nabla \cdot \vec{L} = \rho \nabla \cdot \left[ (\nabla \times \vec{u}) \times \vec{u} \right].
\tag{5}
$$

Consequently, McGowan (1988) used this approach for the purpose of aeroacoustics of phonation and applied it to the internal glottal flow for an acoustically compact region. In this study, this approximation is applied to investigate the localized sources in the supraglottal region.
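The definition of the Lamb vector and the identity for its divergence can be verified numerically on a simple analytic velocity field. The sketch below is an illustration and not the post-processing used in this study: it assumes a uniform Cartesian grid, second-order finite differences via `numpy.gradient`, and a two-dimensional Taylor-Green-type vortex as test field. The helper names `curl`, `cross`, `dot`, and `divergence` are introduced here for illustration.

```python
import numpy as np


def curl(a, dx):
    """Curl of a vector field a = (ax, ay, az) on a uniform grid indexed [x, y, z]."""
    ax, ay, az = a
    return (
        np.gradient(az, dx, axis=1) - np.gradient(ay, dx, axis=2),
        np.gradient(ax, dx, axis=2) - np.gradient(az, dx, axis=0),
        np.gradient(ay, dx, axis=0) - np.gradient(ax, dx, axis=1),
    )


def cross(a, b):
    """Pointwise cross product of two vector fields."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Pointwise dot product of two vector fields."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def divergence(a, dx):
    """Divergence of a vector field on a uniform grid indexed [x, y, z]."""
    return sum(np.gradient(a[i], dx, axis=i) for i in range(3))


# Uniform grid; the test field does not depend on z, so few z-planes suffice
n = 64
dx = 2.0 * np.pi / (n - 1)
x = np.arange(n) * dx
z = np.arange(8) * dx
X, Y, Z = np.meshgrid(x, x, z, indexing="ij")

# Assumed test field: two-dimensional Taylor-Green-type vortex
u = (np.sin(X) * np.cos(Y), -np.cos(X) * np.sin(Y), np.zeros_like(X))

omega = curl(u, dx)                   # vorticity
lamb = cross(omega, u)                # Lamb vector, Eq. (3)
div_lamb = divergence(lamb, dx)       # left-hand side of Eq. (4)
identity_rhs = dot(u, curl(omega, dx)) - dot(omega, omega)  # right-hand side of Eq. (4)

# Analytic divergence of the Lamb vector for this test field
exact = 2.0 * np.cos(2 * X) * np.sin(Y) ** 2 + 2.0 * np.sin(X) ** 2 * np.cos(2 * Y)

# Compare away from the boundaries, where one-sided differences are less accurate
inner = (slice(2, -2),) * 3
err_div = np.max(np.abs(div_lamb - exact)[inner])
err_id = np.max(np.abs(identity_rhs - exact)[inner])
print(f"max error, div(L): {err_div:.3e}; flexion minus enstrophy: {err_id:.3e}")
```

Both evaluations agree with the analytic result up to the discretization error of the finite differences. For data on an unstructured polyhedral mesh, as used in this study, the derivatives would have to be evaluated with the gradient reconstruction of the solver or after interpolation to a structured grid.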

3. Results

3.1. Flow fluctuations

The flow field is shown for the cases of a convergent angle of -10° and a divergent angle of 10° in Fig. 4. A jet of long potential core well into the supraglottal region is established for the convergent case due to focusing of the airflow to a smaller diameter of large mean velocity, also referred to as vena contracta effect. This leads to high shear between the jet and the resting fluid, potentially increasing the fluctuations in the jet shear layers, especially for large convergent angles. In the divergent case on the other hand, the glottal constriction diffuses the jet flow, leading to earlier separation already in the intraglottal domain.

Figure 4: Instantaneous velocity magnitude of the intra- and supraglottal region for the -10° convergent and 10° divergent case in mid-sagittal and transversal (a), as well as coronal plane sections (b). The positions of the line probes L 1–L 5 and the point probes P 1–P 3 are indicated.

Fig. 5 shows that there is an overall higher mean velocity of the glottal jet at convergent angles than divergent ones, with an additional slight increase towards higher convergent angles. This is particularly apparent at positions in the vicinity of the glottal exit, such as L 1 and L 2, where the time-averaged velocity is up to three times higher for convergent than for divergent configurations. Additional asymmetries in the flow are detected both in the mean (cf. Fig. 5) and fluctuating parts of velocity (cf. Fig. 6) for probes L 1 and L 2. These are caused by a deflection of the glottal jet triggered by instabilities, making the flow attach to either side of the channel. As demonstrated by Fig. 6, the root-mean-square (RMS) of velocity at position L 1 close to the intraglottal domain is higher for divergent angles due to the aforementioned early flow separation and intraglottal vortex shedding. However, the strong shear layers occurring at convergent configurations introduce higher amplitudes of fluctuations further downstream at sections L 2–L 5.

The generation of flow-induced sound is attributed to the near-field fluctuations in the flow, which could occur due to regular vorticity production, but also due to turbulent structures at different scales. Fig. 7 shows the turbulence kinetic energy (TKE) at various glottal configurations. Higher values are detected in the coronal plane at high convergent angles, as well as in the center of the supraglottal region in the sagittal plane. Larger amplitudes of fluctuations and thus higher kinetic energy occur due to the glottal jet shear layers introduced by the vena contracta effect at convergent angles. At divergent constrictions, the TKE rises significantly at larger angles of 20° and 40° (cf. Fig. 7(e)-(f)) as compared to the 10° case (cf. Fig. 7(d)). Most importantly, it peaks close to the intraglottal domain. This is a crucial difference to the TKE distribution of convergent glottal shapes, which shows its maximum in the supraglottal region. For the 40° divergent angle the maximum TKE is about the same as for the convergent configurations. The focused glottal jet at convergent angles promotes flow fluctuations mostly located in the center of the epilaryngeal domain. On the contrary, the diffusion at divergent angles leads to a higher distribution close to the walls, in particular in anterior direction.

Figure 5: Time-averaged velocity profiles in the supraglottal region for the convergent (a) and divergent cases (b). The line probe positions L 1–L 5 are shown in Fig. 4.

Figure 6: Velocity root-mean-square (RMS) profiles in the supraglottal region for the convergent (a) and divergent cases (b) at line probe positions L 1–L 5 (cf. Fig. 4).

Figure 7: Turbulence kinetic energy (TKE) in mid-sagittal and coronal plane sections for the convergent cases of -10° (a), -20° (b), -40° (c), as well as the divergent cases of 10° (d), 20° (e), 40° (f).
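The time-averaged velocity, the RMS of the fluctuations, and the turbulence kinetic energy discussed in this section are related through the decomposition of the velocity into a mean and a fluctuating part. The following sketch shows these relations for a synthetic velocity record at a single point. The signal, its mean, and its fluctuation levels are assumed values for illustration, and the definition of the TKE as half the sum of the variances of the velocity components is the common one, which the text does not state explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 5000  # same length as the maximum number of time steps

# Synthetic velocity samples in m/s: mean flow plus random fluctuations (assumed values)
mean_flow = np.array([30.0, 0.0, 0.0])
sigma = np.array([4.0, 2.0, 2.0])
velocity = mean_flow + sigma * rng.standard_normal((n_samples, 3))

u_mean = velocity.mean(axis=0)                # time-averaged velocity components
u_prime = velocity - u_mean                   # fluctuating part of the velocity
u_rms = np.sqrt((u_prime ** 2).mean(axis=0))  # RMS of each fluctuating component
tke = 0.5 * np.sum(u_rms ** 2)                # TKE per unit mass in m^2/s^2

print("mean:", np.round(u_mean, 2))
print("RMS: ", np.round(u_rms, 2))
print(f"TKE:  {tke:.2f} m^2/s^2")
```

In an LES, this procedure yields the resolved part of the TKE only, since the subgrid scale contribution is modeled.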

3.2. Broadband sound generation

The inharmonic frequency distribution of voiced speech is investigated by the use of Fourier spectra of the time series of streamwise axial velocity and pressure fluctuations extracted at various point probes in the near field. The spectral content of velocity fluctuations in the intraglottal constriction can be seen for divergent cases in Fig. 8. Peaks of low frequency are clearly visible at 118 Hz and 177 Hz, similar to the observations made by Mihaescu et al. (2010) for a simplified glottal geometry. However, these dominant frequencies are considerably smaller in amplitude for the smaller divergent glottal angles of φ = 10°, 20°. This becomes obvious when considering the double-logarithmic plot of Fig. 8(b): the produced energy through the shedding of vortical structures in the intraglottal region steadily increases from 10° to 40°. The power spectral density down to the frequencies associated with the Taylor microscale in the inertial subrange thus depends on the glottal angle.
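The extraction of dominant frequencies from a probe signal can be sketched with a windowed periodogram. The example below is a minimal illustration on synthetic data and does not reproduce the post-processing of this study: the two tones at 118 Hz and 177 Hz, their amplitudes, the noise level, and the choice of a Hann window are assumptions, while the time step and record length follow Sec. 2.1.

```python
import numpy as np

dt = 1.92e-5  # time step in s (from Sec. 2.1)
n = 5000      # number of samples (maximum number of time steps)
t = np.arange(n) * dt

rng = np.random.default_rng(1)
# Synthetic probe signal: two tones (assumed amplitudes) plus weak broadband noise
signal = (1.0 * np.sin(2 * np.pi * 118.0 * t)
          + 0.6 * np.sin(2 * np.pi * 177.0 * t)
          + 0.05 * rng.standard_normal(n))

fluct = signal - signal.mean()  # remove the time average
window = np.hanning(n)          # Hann window to reduce spectral leakage
spectrum = np.fft.rfft(fluct * window)
freqs = np.fft.rfftfreq(n, d=dt)

# One-sided power spectral density estimate
# (the factor 2 is not exact for the zero-frequency and Nyquist bins)
psd = 2.0 * np.abs(spectrum) ** 2 * dt / np.sum(window ** 2)

df = freqs[1] - freqs[0]            # frequency spacing in Hz
f_peak_1 = freqs[np.argmax(psd)]    # strongest peak
mask = freqs > 150.0
f_peak_2 = freqs[mask][np.argmax(psd[mask])]  # strongest peak above 150 Hz

print(f"spacing {df:.2f} Hz, peaks near {f_peak_1:.1f} Hz and {f_peak_2:.1f} Hz")
```

With a record of 5000 samples the frequency spacing is about 10.4 Hz, so peak frequencies estimated this way are only resolved to within roughly one spacing.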

The velocity spectra of the supraglottal flow at position P 2 for all investigated cases are shown in Fig. 9(a)-(b). Both convergent and divergent angles lead to a considerable power spectral density across large parts of the spectrum up to 5 kHz. In the convergent cases, however, the most energy is contained in frequencies of up to 2 kHz with the highest peaks below 1 kHz.

Both convergent and divergent angles show clear peaks at frequencies around 430 Hz, as well as 830 Hz. These modes are most likely due to supraglottal vortex shedding, since they are visible in the spectra at P 2, but not at P 1 in the intraglottal region. Furthermore, the spectrum in Fig. 9(b) contains the frequency peaks attributed to intraglottal vortical structures (cf. Fig. 8(a)).

The spectrum of pressure fluctuations in Fig. 9(c)-(d) shows the broadband character of disturbances above approximately 2500 Hz. It also reveals a dominant peak at about 2367 Hz for the divergent shapes in Fig. 9(d), which relates to a Strouhal number of St = f · d/U ≈ 0.13 at the given glottal width d for the glottal velocity U of the time-averaged profiles depicted in Fig. 5. Mihaescu et al. (2010) attributed frequencies of this magnitude to vortex shedding arising due to shear layer instabilities in the supraglottal region. Similar results have been reported both in an experimental study by Kucinschi et al. (2006) and in a numerical study by de Luzan et al. (2015) using simplified glottal geometries and leading to Strouhal numbers of 0.12–0.145.

Figure 8: Power spectral density of axial velocity in the intraglottal region at probe P 1 for the cases of divergent glottal angles (cf. φ/2 = 5°, 10°, 20° in Fig. 2(b)). The double-logarithmic plot of the spectrum in (a) is shown in (b).

Figure 9: Power spectral density of axial velocity in the supraglottal region for the convergent and divergent cases at location P 2 (a)-(b) and double-logarithmic plot of the spectrum of pressure fluctuations at P 3 (c)-(d) (cf. |φ/2| = 5°, 10°, 20° in Fig. 2). The probe positions are depicted in Fig. 4.
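The Strouhal relation quoted above links the peak frequency, the glottal width, and a velocity scale. As a small worked example, the relation can be rearranged for the velocity that is consistent with the quoted numbers. This velocity is back-computed here for illustration and is not a value stated in the text; the helper `strouhal` is a hypothetical name.

```python
f_peak = 2367.0  # dominant frequency in Hz (from the text)
d = 6.0e-4       # glottal width in m (from Sec. 2.2)
St = 0.13        # Strouhal number quoted in the text


def strouhal(frequency_hz, length_m, velocity_ms):
    """Strouhal number St = f * d / U."""
    return frequency_hz * length_m / velocity_ms


# Velocity scale implied by St = f * d / U for the quoted values
U_implied = f_peak * d / St
print(f"Implied velocity scale: {U_implied:.2f} m/s")
print(f"Check: St = {strouhal(f_peak, d, U_implied):.2f}")
```

The same function can be used to compare the result with the range of 0.12–0.145 reported in the cited studies once a measured or computed velocity is available.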

3.3. Sound source distribution

The distribution of sound sources in the flow is computed in the near field through the Lamb vector divergence, which is introduced in Sec. 2.4. Fig. 10 shows its spatial distribution in the coronal plane section. The Lamb vector divergence marks a rate of change of momentum from areas of negative values to the ones of positive values. Acoustic sources of aerodynamic sound appear to be stronger in the convergent cases of the opening phase of the glottal cycle, with an additional increase towards higher angles. Fig. 11 gives the acoustic sources in the flow in form of isosurfaces together with the surface source distribution at frequency ranges of 1.0–1.5 kHz and 1.5–2.0 kHz. These frequency bands are chosen since they contain the majority of energy of aerodynamic fluctuations through flow instabilities in the supraglottal region, as demonstrated in the spectra of Fig. 9. The contribution through the Lighthill source term of Eq. (1), giving the sources through unsteadiness in the flow, is estimated through the Lamb vector divergence. The surface pressure fluctuations on the other hand, emitting sound through dipole-type sources, are shown by discrete Fourier transformation. These surface pressure fluctuations are the result of the integral solution of term (II) in Eq. (1). Both source types are more prominent for the convergent configuration and the considered frequencies in Fig. 11, with an additional increase towards higher frequencies in the range of 1.5–2.0 kHz. Furthermore, surface sources are located predominantly at the anterior walls for the divergent case, and more evenly distributed across the epilaryngeal walls in the convergent case.

Figure 10: Lamb vector divergence in the coronal plane ((a)-(c): -10°, -20°, -40° convergent, (d)-(f): 10°, 20°, 40° divergent). The Lamb vector divergence is normalized by its maximum absolute value.

3.4. Sound amplitudes

Finally, the average values of sound amplitudes are calculated for the acoustic pressure fluctuations in the bulk of the flow as well as at the internal airway surface. Tabs. 1 and 2 show that the amplitude of frequencies 1–3 kHz is the highest for all cases. Additionally, the convergent cases lead to consistently higher sound pressure levels (SPL) in the frequency band of 0.5–1 kHz.

Figure 11: Acoustic sources in the flow and at the surfaces. The source distribution of the -20° convergent and the 20° divergent case is shown. Flow sources are depicted through isosurfaces of the Lamb vector divergence at -0.5 and 0.5. Surface sources are computed by spatial FFT of the surface pressure fluctuations at the frequency ranges 1.0–1.5 kHz and 1.5–2.0 kHz.

Furthermore, there is a notable increase of sound pressure levels at divergent angles of 20° and 40° for high frequencies, particularly in the ranges of 1–3 kHz and 3–5 kHz. This trend is in accordance with the increase of turbulence energy at higher divergent angles, which is depicted in Fig. 7 and manifests itself in the SPL data in the far-field domain of Tab. 3. In these high-frequency bands dominated by broadband fluctuations, the divergent shapes lead to the largest amplitudes.
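The exact averaging procedure behind the band levels is not restated here, so the following Python sketch only illustrates one plausible way to obtain a band-limited SPL from a sampled pressure fluctuation signal. It assumes uniform sampling, a plain (unwindowed) FFT periodogram, and the standard reference pressure of 20 µPa for air; the function name `band_spl` and the synthetic 2 kHz test tone are hypothetical.

```python
import numpy as np

P_REF = 20e-6  # reference pressure in Pa (standard value for air, assumed here)


def band_spl(p, fs, f_lo, f_hi):
    """Sound pressure level in dB of signal p within the band [f_lo, f_hi).

    p  : pressure fluctuation samples in Pa
    fs : sampling frequency in Hz
    """
    p = np.asarray(p, dtype=float)
    p = p - p.mean()                      # keep only the fluctuating part
    n = p.size
    spec = np.fft.rfft(p)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # One-sided mean-square contribution of each frequency bin.
    ms = 2.0 * np.abs(spec) ** 2 / n ** 2
    ms[0] = np.abs(spec[0]) ** 2 / n ** 2
    if n % 2 == 0:
        ms[-1] = np.abs(spec[-1]) ** 2 / n ** 2

    in_band = (freqs >= f_lo) & (freqs < f_hi)
    p_ms = max(ms[in_band].sum(), np.finfo(float).tiny)  # avoid log10(0)
    return 10.0 * np.log10(p_ms / P_REF ** 2)


# Synthetic example: a 1 Pa amplitude tone at 2 kHz sampled for one second.
fs = 20000.0
t = np.arange(20000) / fs
p_demo = np.sin(2.0 * np.pi * 2000.0 * t)
for lo, hi in [(500.0, 1000.0), (1000.0, 3000.0), (3000.0, 5000.0)]:
    print(f"{lo:6.0f}-{hi:6.0f} Hz: {band_spl(p_demo, fs, lo, hi):8.1f} dB")
```

Summing the one-sided periodogram over a band yields the mean-square pressure in that band, which is why a tone of amplitude 1 Pa contributes a mean-square value of 0.5 Pa² to the band that contains it.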

Table 1: Averaged sound pressure levels (SPL) in dB at various frequency ranges in the supraglottal flow.

Glottal angle (°)   0.5–1 kHz   1–3 kHz   3–5 kHz   5–10 kHz
-10° conv.          55.0        87.6      72.6      39.3
-20° conv.          55.4        87.5      72.8      39.4
-40° conv.          55.6        87.6      73.1      39.2
10° div.            53.0        87.9      73.2      39.0
20° div.            54.7        88.8      73.7      39.5
40° div.            54.2        89.0      75.0      39.6

Table 2: Averaged sound pressure levels (SPL) in dB at various frequency ranges at the internal airway surface.

Glottal angle (°)   0.5–1 kHz   1–3 kHz   3–5 kHz   5–10 kHz
-10° conv.          84.3        93.7      74.4      46.5
-20° conv.          85.3        93.6      74.6      46.2
-40° conv.          85.5        94.0      75.1      46.6
10° div.            82.7        93.8      74.6      45.5
20° div.            83.7        94.6      75.3      46.5
40° div.            83.9        94.8      76.6      46.4

Table 3: Sound pressure levels (SPL) in dB in the acoustic far field. The monitoring point for the data acquisition is located about 7 cm from the mouth opening for all cases. Its position is indicated in Fig. 1.

Glottal angle (°)   0.5–1 kHz   1–3 kHz   3–5 kHz   5–10 kHz
-10° conv.          43.4        88.5      76.6      37.5
-20° conv.          43.6        88.2      76.7      38.8
-40° conv.          44.0        88.4      77.2      37.9
10° div.            41.3        88.8      76.7      37.8
20° div.            43.0        89.6      77.2      38.1
40° div.            42.6        90.0      78.8      38.9
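The trends stated for Tabs. 1–3 can be checked directly against the tabulated values. The short Python sketch below transcribes the SPL data and evaluates, for each table, the loudest band per case, the case with the highest level in the 1–3 kHz and 3–5 kHz bands, and the margin between the lowest convergent and the highest divergent level in the 0.5–1 kHz band. The column order of Tabs. 2 and 3 is assumed to match that of Tab. 1, and all names (`TABLES`, `loudest_band`, `loudest_case`, `low_band_margin`) are hypothetical helpers introduced for this check.

```python
# SPL values in dB transcribed from Tabs. 1-3 (rows follow CASES, columns BANDS).
BANDS = ("0.5-1 kHz", "1-3 kHz", "3-5 kHz", "5-10 kHz")
CASES = ("-10 conv.", "-20 conv.", "-40 conv.", "10 div.", "20 div.", "40 div.")
TABLES = {
    "supraglottal flow (Tab. 1)": [
        (55.0, 87.6, 72.6, 39.3),
        (55.4, 87.5, 72.8, 39.4),
        (55.6, 87.6, 73.1, 39.2),
        (53.0, 87.9, 73.2, 39.0),
        (54.7, 88.8, 73.7, 39.5),
        (54.2, 89.0, 75.0, 39.6),
    ],
    "airway surface (Tab. 2)": [
        (84.3, 93.7, 74.4, 46.5),
        (85.3, 93.6, 74.6, 46.2),
        (85.5, 94.0, 75.1, 46.6),
        (82.7, 93.8, 74.6, 45.5),
        (83.7, 94.6, 75.3, 46.5),
        (83.9, 94.8, 76.6, 46.4),
    ],
    "far field (Tab. 3)": [
        (43.4, 88.5, 76.6, 37.5),
        (43.6, 88.2, 76.7, 38.8),
        (44.0, 88.4, 77.2, 37.9),
        (41.3, 88.8, 76.7, 37.8),
        (43.0, 89.6, 77.2, 38.1),
        (42.6, 90.0, 78.8, 38.9),
    ],
}


def loudest_band(row):
    """Name of the frequency band with the highest SPL for one case."""
    return BANDS[row.index(max(row))]


def loudest_case(rows, band):
    """Name of the glottal configuration with the highest SPL in a band."""
    col = BANDS.index(band)
    values = [row[col] for row in rows]
    return CASES[values.index(max(values))]


def low_band_margin(rows):
    """Lowest convergent minus highest divergent SPL in the 0.5-1 kHz band."""
    convergent = [row[0] for row in rows[:3]]
    divergent = [row[0] for row in rows[3:]]
    return min(convergent) - max(divergent)


for name, rows in TABLES.items():
    print(name)
    print("  loudest band per case:", sorted({loudest_band(r) for r in rows}))
    print("  loudest case, 1-3 kHz:", loudest_case(rows, "1-3 kHz"))
    print("  loudest case, 3-5 kHz:", loudest_case(rows, "3-5 kHz"))
    print("  0.5-1 kHz margin conv. over div. (dB):", round(low_band_margin(rows), 1))
```

A positive margin means that every convergent case exceeds every divergent case in the 0.5–1 kHz band, which is a stricter check than comparing averages of decibel values.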

4. Discussion and Conclusions

Historically, numerical approaches in speech acoustics have often been limited to the application of one-dimensional wave equations to the vocal tract area function of a given airway volume, or the use of the Bernoulli equation for the simulation of the transglottal flow, neglecting unsteadiness of the flow (Stevens, 2000; Titze and Alipour, 2006). However, in experimental studies involving flow through various orifices resembling the human glottis, Zhang et al. (2004) and Zhang and Mongeau (2006) demonstrated how the turbulent, unsteady supraglottal flow affects broadband sound generation. They showed that broadband sound increases with transglottal pressure drop and flow rate. They also confirmed the validity of the quasisteady approximation for broadband sound, particularly at well-established turbulence and at high turbulence kinetic energy (TKE). Furthermore, Krane et al. (2007) showed, in addition to the study by Deverge et al. (2003), that the transient flow during shutoff, towards the closing of the vocal folds, has only a weak dependence on the cycle frequency. As a result, both studies conclude that viscous effects dominate over flow unsteadiness due to wall motion and that the fluctuations in the glottal jet lead to important contributions to voice acoustics. Consequently, Kreiman and Gerratt (2012) stated that there is a strong correlation between the broadband noise-to-harmonics ratio (NHR) and the perceived breathiness of a voice, with voice quality being strongly impacted by the NHR and the relative energy content of broadband sound. Additionally, individual characteristics of a voice other than pitch or loudness are largely influenced by high-frequency parts of the spectrum (Sundberg, 1994). The individual quality of the human voice, also referred to as timbre, is thus likely impacted by the sound production of near-field flow fluctuations that contribute to the inharmonic broadband part of the spectrum.

This study investigates the effect of changes of the glottal shape, as they occur during voiced speech, on the supraglottal flow field and its fluctuations in realistic vocal tract domains extracted through magnetic resonance imaging (MRI) of a healthy subject. It is shown that the larger the glottal angle in a convergent configuration, the smaller the minimum cross-sectional area to which the flow is contracted. This can explain larger flow velocity gradients in the shear layers between the issued jet and the resting fluid and causes further fluctuations due to shear layer instabilities (Schickhofer et al., 2019). Thus, the TKE produced in the glottal jet shear layers shows a noticeable increase of fluctuation amplitudes and generated vorticity at convergent shapes of large angles due to this vena contracta effect (cf. Sec. 3.1). In the cases of divergent glottal angles, an increase of the magnitude of TKE is visible towards higher angles from Fig. 7(d) to Fig. 7(f), both in the sagittal and coronal plane sections. The reason for this effect might be the rise in the shedding of vortical structures already in the intraglottal constriction, which contributes to the further dissolution into smaller turbulent scales. Dominant intraglottal frequencies of the order of 100 Hz are detected for the divergent shapes with a significant increase towards higher angles from 10° to 40°.
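For readers who want to reproduce a TKE estimate from time-resolved velocity data, the following Python sketch evaluates k = ½(⟨u'²⟩ + ⟨v'²⟩ + ⟨w'²⟩) from the fluctuations about the temporal mean. It is a minimal illustration under the assumption that only the resolved velocity fluctuations are considered and that samples are stored with time along the first array axis; the function name and the synthetic signals are hypothetical and not taken from the simulations.

```python
import numpy as np


def turbulence_kinetic_energy(u, v, w):
    """Resolved TKE per unit mass from velocity samples.

    u, v, w are arrays whose first axis is time; any remaining axes
    (e.g. probe index or grid coordinates) are preserved in the result.
    """
    components = [np.asarray(c, dtype=float) for c in (u, v, w)]
    # Variance of each component about its temporal mean.
    variances = [((c - c.mean(axis=0)) ** 2).mean(axis=0) for c in components]
    return 0.5 * sum(variances)


# Synthetic example: mean flow of 10 m/s with sinusoidal fluctuations.
t = np.arange(1000) / 1000.0
u_demo = 10.0 + 2.0 * np.sin(2.0 * np.pi * 5.0 * t)   # variance 2.0
v_demo = 1.0 * np.cos(2.0 * np.pi * 5.0 * t)          # variance 0.5
w_demo = np.zeros_like(t)                             # no fluctuation
print("TKE of the synthetic signal:", turbulence_kinetic_energy(u_demo, v_demo, w_demo))
```

Applied to arrays of shape (time, ny, nz), the same function returns a planar TKE map of the kind compared between glottal angles in Fig. 7.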

On the other hand, higher energy density of the axial velocity in the supraglottal domain is observed for the convergent angles, where the formation of flow structures is enhanced through the large gradients in the glottal jet shear layers, as opposed to the diffuse flow through constrictions of divergent form (cf. Sec. 3.2). Dominant vortex shedding frequencies are found to be in the range of 59–432 Hz, which is consistent with the experimental study of Kucinschi et al. (2006) for low flow rates.
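A dominant shedding frequency of this kind is typically read off as the peak of a velocity spectrum. The Python sketch below shows one simple way to do this for a single probe signal, assuming uniform sampling and a Hann-windowed periodogram; the function name, the optional search band, and the synthetic signal are hypothetical, and the 59 Hz and 432 Hz tones are used only because they bracket the range quoted above.

```python
import numpy as np


def dominant_frequency(signal, fs, f_min=0.0, f_max=None):
    """Frequency in Hz of the largest spectral peak within (f_min, f_max]."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                          # remove the mean flow
    window = np.hanning(x.size)               # reduce spectral leakage
    power = np.abs(np.fft.rfft(x * window)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    if f_max is None:
        f_max = fs / 2.0
    mask = (freqs > f_min) & (freqs <= f_max)
    return freqs[mask][np.argmax(power[mask])]


# Synthetic probe signal: two tones plus weak broadband noise.
fs = 10000.0
t = np.arange(10000) / fs
rng = np.random.default_rng(0)
probe = (np.sin(2.0 * np.pi * 432.0 * t)
         + 0.3 * np.sin(2.0 * np.pi * 59.0 * t)
         + 0.1 * rng.standard_normal(t.size))
print("global peak (Hz):", dominant_frequency(probe, fs))
print("peak below 100 Hz:", dominant_frequency(probe, fs, f_max=100.0))
```

Restricting the search band is useful when several shedding modes coexist, since the global maximum only reports the most energetic one.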

A higher acoustic source strength due to velocity variations in the supraglottal shear layers is noted for convergent configurations and at frequencies of up to 2 kHz (cf. Sec. 3.3). This is demonstrated both through the Lighthill sources, by computation of the Lamb vector divergence and its isosurfaces, and through the surface Fourier transformation of pressure fluctuations at the surrounding airway walls.

Finally, sound pressure amplitudes both in the flow and at the internal airway surfaces reveal that the largest output of aerodynamically generated sound occurs at frequencies of 1–3 kHz (cf. Sec. 3.4). Since the human ear shows highest sensitivity at frequencies around 2–5 kHz, this information is particularly important for assessing the acoustic impact of flow-induced sound. For the divergent cases and at high frequencies of 1–3 kHz and 3–5 kHz, the largest sound pressure levels are noted for the glottal angles of 20° and 40° (cf. Tabs. 1–3), which suggests an important role of these highly divergent shapes in the production of broadband flow fluctuations.

Acknowledgements

This work is supported by the Swedish Research Council (Vetenskapsrådet) through the grant VR 621-2012-4256. Computational resources were provided by the Swedish National Infrastructure for Computing (SNIC). Jarmo Malinen and the Speech Research Group from the Department of Mathematics and Systems Analysis are acknowledged for providing their MRI geometry data of different vowel configurations.

Conflict of interest statement

The authors have no conflict of interest to disclose.


References

Aalto D, Aaltonen O, Happonen RP, Jääsaari P, Kivelä A, Kuortti J, Luukinen JM, Malinen J, Murtola T, Parkkola R, et al. Large scale data acquisition of simultaneous MRI and speech. Applied Acoustics 2014;83:64–75.

Aalto D, Malinen J, Palo P, Aaltonen O, Vainio M, Happonen RP, Parkkola R, Saunavaara J. Recording speech sound and articulation in MRI. In: Biodevices. 2011. p. 168–73.

Alipour F, Scherer RC. Flow separation in a computational oscillating vocal fold model. The Journal of the Acoustical Society of America 2004;116(3):1710–9.

Arnela M, Dabbaghchian S, Blandin R, Guasch O, Engwall O, Van Hirtum A, Pelorson X. Influence of vocal tract geometry simplifications on the numerical simulation of vowel sounds. The Journal of the Acoustical Society of America 2016;140(3):1707–18.

Blandin R, Arnela M, Laboissière R, Pelorson X, Guasch O, Hirtum AV, Laval X. Effects of higher order propagation modes in vocal tract like geometries. The Journal of the Acoustical Society of America 2015;137(2):832–43.

Deverge M, Pelorson X, Vilain C, Lagrée PY, Chentouf F, Willems J, Hirschberg A. Influence of collision on the flow through in-vitro rigid models of the vocal folds. The Journal of the Acoustical Society of America 2003;114(6):3354–62.

Ewert R, Schröder W. Acoustic perturbation equations based on flow decomposition via source filtering. Journal of Computational Physics 2003;188(2):365–98.

Hamman CW, Klewicki JC, Kirby RM. On the Lamb vector divergence in Navier–Stokes flows. Journal of Fluid Mechanics 2008;610:261–84.

Hirano M. Clinical examination of voice. Disorders of human communication 1981;5:1–99.

Hirschberg A. Some fluid dynamic aspects of speech. Bulletin de la Communication Parlée 1992;2:7–30.

Howe M. Contributions to the theory of aerodynamic sound, with application to excess jet noise and the theory of the flute. Journal of Fluid Mechanics 1975;71(4):625–73.

Howe M, McGowan R. Sound generated by aerodynamic sources near a deformable body, with application to voiced speech. Journal of Fluid Mechanics 2007;592:367–92.

Kaltenbacher M, Zörner S, Hüppe A, Sidlof P. 3D numerical simulations of human phonation. In: Proceedings of the 11th World Congress on Computational Mechanics - WCCM XI. 2014.

Kniesburges S, Hesselmann C, Becker S, Schlücker E, Döllinger M. Influence of vortical flow structures on the glottal jet location in the supraglottal region. Journal of Voice 2013;27(5):531–44.

Krane M, Barry M, Wei T. Unsteady behavior of flow in a scaled-up vocal folds model. The Journal of the Acoustical Society of America 2007;122(6):3659–70.

Krane MH, Barry M, Wei T. Dynamics of temporal variations in phonatory flow. The Journal of the Acoustical Society of America 2010;128(1):372–83.

Krane MH, Wei T. Theoretical assessment of unsteady aerodynamic effects in phonation. The Journal of the Acoustical Society of America 2006;120(3):1578–88.

Kreiman J, Gerratt BR. Perceptual interaction of the harmonic source and noise in voice. The Journal of the Acoustical Society of America 2012;131(1):492–500.

Kucinschi BR, Scherer RC, DeWitt KJ, Ng TT. Flow visualization and acoustic consequences of the air moving through a static model of the human larynx. Journal of Biomechanical Engineering 2006;128(3):380–90.

Li S, Scherer RC, Wan M, Wang S, Wu H. The effect of glottal angle on intraglottal pressure. The Journal of the Acoustical Society of America 2006;119(1):539–48.

Lighthill MJ. On sound generated aerodynamically I. General theory. Proc R Soc Lond A 1952;211(1107):564–87.

Lodermeyer A, Tautz M, Becker S, Döllinger M, Birk V, Kniesburges S. Aeroacoustic analysis of the human phonation process based on a hybrid acoustic PIV approach. Experiments in Fluids 2018;59(1):13.

de Luzan CF, Chen J, Mihaescu M, Khosla SM, Gutmark E. Computational study of false vocal folds effects on unsteady airflows through static models of the human larynx. Journal of Biomechanics 2015;48(7):1248–57.

Mattheus W, Brücker C. Asymmetric glottal jet deflection: differences of two- and three-dimensional models. The Journal of the Acoustical Society of America 2011;130(6):EL373–9.

McGowan RS. An aeroacoustic approach to phonation. The Journal of the Acoustical Society of America 1988;83(2):696–704.

Mihaescu M, Khosla SM, Murugappan S, Gutmark EJ. Unsteady laryngeal airflow simulations of the intra-glottal vortical structures. The Journal of the Acoustical Society of America 2010;127(1):435–44.

Mittal R, Erath BD, Plesniak MW. Fluid dynamics of human phonation and speech. Annual Review of Fluid Mechanics 2013;45:437–67.

Nicoud F, Ducros F. Subgrid-scale stress modelling based on the square of the velocity gradient tensor. Flow, Turbulence and Combustion 1999;62(3):183–200.

Park JB, Mongeau L. Instantaneous orifice discharge coefficient of a physical, driven model of the human larynx. The Journal of the Acoustical Society of America 2007;121(1):442–55.

Scherer RC, Shinwari D, De Witt KJ, Zhang C, Kucinschi BR, Afjeh AA. Intraglottal pressure profiles for a symmetric and oblique glottis with a divergence angle of 10 degrees. The Journal of the Acoustical Society of America 2001;109(4):1616–30.

Schickhofer L, Malinen J, Mihaescu M. Compressible flow simulations of voiced speech using rigid vocal tract geometries acquired by MRI. The Journal of the Acoustical Society of America 2019;145(4):2049–61.

Šidlof P, Zörner S, Hüppe A. A hybrid approach to the computational aeroacoustics of human voice production. Biomechanics and Modeling in Mechanobiology 2015;14(3):473–88.

Stevens KN. Acoustic Phonetics. volume 30. MIT Press, 2000.

Sundberg J. Perceptual aspects of singing. Journal of Voice 1994;8(2):106–22.

Titze IR, Alipour F. The Myoelastic Aerodynamic Theory of Phonation. National Center for Voice and Speech, 2006.

Vilain C, Pelorson X, Fraysse C, Deverge M, Hirschberg A, Willems J. Experimental validation of a quasi-steady theory for the flow through the glottis. Journal of Sound and Vibration 2004;276(3-5):475–90.

Williams JF, Hawkings DL. Sound generation by turbulence and surfaces in arbitrary motion. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 1969;264(1151):321–42.

Zhang Z. Mechanics of human voice production and control. The Journal of the Acoustical Society of America 2016;140(4):2614–35.

Zhang Z, Mongeau L, Frankel SH. Experimental verification of the quasi-steady approximation for aerodynamic sound generation by pulsating jets in tubes. The Journal of the Acoustical Society of America 2002;112(4):1652–63.

Zhang Z, Mongeau L, Frankel SH, Thomson S, Park JB. Sound generation by steady flow through glottis-shaped orifices. The Journal of the Acoustical Society of America 2004;116(3):1720–8.

Zhang Z, Mongeau LG. Broadband sound generation by confined pulsating jets in a mechanical model of the human larynx. The Journal of the Acoustical Society of America 2006;119(6):3995–4005.

Zhao W, Zhang C, Frankel SH, Mongeau L. Computational aeroacoustics of phonation, Part I: Computational methods and sound generation mechanisms. The Journal of the Acoustical Society of America 2002;112(5):2134–46.

Zheng X, Mittal R, Bielamowicz S. A computational study of asymmetric glottal jet deflection during phonation. The Journal of the Acoustical Society of America 2011;129(4):2133–43.