
7.4 Acoustic parameter estimation from ML decay phases

7.4.1 Method a – minimum Rt

The parameters obtained from the maximum likelihood method describe the envelope of the impulse response rather than the decay curve itself. Substituting these parameters back into the model and performing Schroeder’s backward integration [10] yields the estimated decay curve. Reverberation parameters can then be extracted from the decay curve using the standard definitions given in ISO 3382 [8]. A short section (90 s) of reverberant speech can yield a large number of decay curve estimates. Figure 7-19 plots the frequency distribution (histogram) of the Rt and EDT estimates obtained from every decay phase identified by the envelope segmentation method (Chapter 7.3) in a 90 s speech sample convolved with a simulated impulse response and filtered into the 1 kHz octave band.
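The step from envelope to decay curve to reverberation parameters can be illustrated with a minimal sketch. It assumes a single exponential envelope in place of the fitted ML model (whose exact form is not reproduced here), and it uses the common ISO 3382 evaluation ranges of -5 to -35 dB for Rt and 0 to -10 dB for EDT. The function names, sample rate and test envelope are hypothetical choices made for illustration only.

```python
import numpy as np

def schroeder_decay_db(envelope):
    """Backward-integrate the squared envelope; return the decay curve in dB."""
    energy = np.asarray(envelope, dtype=float) ** 2
    # Reverse cumulative sum: energy remaining from each sample to the end.
    remaining = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(remaining / remaining[0])

def decay_time(decay_db, fs, upper_db, lower_db):
    """Fit a line between upper_db and lower_db, extrapolate to a 60 dB decay."""
    t = np.arange(len(decay_db)) / fs
    mask = (decay_db <= upper_db) & (decay_db >= lower_db)
    slope, _ = np.polyfit(t[mask], decay_db[mask], 1)  # slope in dB per second
    return -60.0 / slope

# Hypothetical envelope: amplitude falls by 60 dB in true_rt seconds.
fs = 3000          # assumed sample rate in Hz
true_rt = 1.0      # assumed reverberation time in seconds
t = np.arange(3 * fs) / fs
envelope = 10.0 ** (-3.0 * t / true_rt)

decay_db = schroeder_decay_db(envelope)
rt = decay_time(decay_db, fs, upper_db=-5.0, lower_db=-35.0)
edt = decay_time(decay_db, fs, upper_db=0.0, lower_db=-10.0)
```

For this idealised single-slope envelope both values should come out close to the chosen 1 s. With a real ML fit the two can differ because the early and late parts of the decay need not share a slope.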

Chapter 7 : Improved Maximum Likelihood Estimation of Acoustic Parameters 135

Figure 7-19. Probability distribution of maximum likelihood estimates of (a) Rt and (b) EDT, compared with actual Rt and EDT values. Minimum ML estimates RTEST and EDTEST are shown on the plots. 90 s speech stimulus for a single artificial room response.

What becomes apparent is that performing the estimation using all available decay phases can cause large underestimation of the decay time. This is because the decay phase that yields the lowest estimate may be very short and may not represent a reasonable length of the decay curve. As previously mentioned, the ML estimates provide a reliable estimate of the dynamic range. Using this information, by performing ML estimation with only those decay phases that have at least 25 dB of dynamic range, yields the frequency distributions shown in Figure 7-20.
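The selection rule described here, discarding decay phases with less than 25 dB of dynamic range and then taking the minimum Rt, might be sketched as follows. The `DecayPhaseEstimate` container, the `minimum_rt` helper and the numerical values are hypothetical and invented purely to illustrate the idea; they are not results from this work.

```python
from dataclasses import dataclass

@dataclass
class DecayPhaseEstimate:
    rt: float                # ML reverberation time estimate in seconds
    dynamic_range_db: float  # ML dynamic range estimate in dB

def minimum_rt(estimates, min_dynamic_range_db=25.0):
    """Return the smallest Rt among phases with enough dynamic range."""
    usable = [e for e in estimates if e.dynamic_range_db >= min_dynamic_range_db]
    if not usable:
        return None  # no decay phase met the dynamic range requirement
    return min(e.rt for e in usable)

# Made-up estimates: two short phases with little dynamic range give low Rt.
phases = [
    DecayPhaseEstimate(rt=0.31, dynamic_range_db=9.0),
    DecayPhaseEstimate(rt=0.98, dynamic_range_db=31.0),
    DecayPhaseEstimate(rt=1.12, dynamic_range_db=27.5),
    DecayPhaseEstimate(rt=0.55, dynamic_range_db=14.0),
]

rt_all = minimum_rt(phases, min_dynamic_range_db=0.0)  # uses every phase
rt_filtered = minimum_rt(phases)                       # 25 dB threshold
```

In this toy data the unfiltered minimum is taken from a 9 dB decay phase, whereas the filtered minimum comes from a phase that spans a useful length of the decay curve.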

Figure 7-20. Probability distribution of maximum likelihood estimates with >25 dB dynamic range of (a) Rt and (b) EDT, compared with actual Rt and EDT values. Minimum ML estimates RTEST and EDTEST are shown on the plots. 90 s speech stimulus for a single artificial room response.


For comparison purposes, the same speech sample was also convolved with a real measured room response and the algorithm was run again. A similar result is found for real room impulse responses.

To further investigate the accuracy of the system, a large number of RIRs were convolved with the same 90 s speech sample. For each RIR the true acoustic parameter was calculated using the standard method. Each signal was filtered into octave bands; the results shown here are for the 1 kHz octave band. Figure 7-21 and Figure 7-22 compare the estimated and true decay parameters. For the simulated impulse responses, in Figure 7-21, the EDT estimation is less accurate than the reverberation time estimation. For the real rooms, Figure 7-22, there appears to be a tendency towards slight underestimation of both parameters. The reverberation time estimation is also more reliable than the EDT estimation for real rooms. One reason for this is that speech utterances have a decay rate of their own, so there is uncertainty as to where the true beginning of the room impulse response is. Slight underestimation of Rt and EDT can also be due to the overall variability in the dynamic range estimation. Decays with too little dynamic range will almost certainly give underestimation, as the early reflections more often than not decay faster than the late decay. This can be taken into account by performing multiple measurements using this procedure and averaging the resulting parameter values.


Figure 7-21. Comparison of (a) reverberation time and (b) EDT, estimated using multi-decay maximum likelihood estimation and the true values. Artificially reverberated


Figure 7-22. Comparison of (a) reverberation time and (b) EDT, estimated using multi-decay maximum likelihood estimation and the true value. Speech received in real rooms.

Figure 7-23 shows the estimates of C80 and ts obtained using this method. It should be noted that for clarity (C80), the largest value within a region is used rather than the smallest (as for Rt).
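For reference, the standard ISO 3382 definitions of C80 (ratio of the energy arriving in the first 80 ms to the energy arriving after it, in dB) and centre time ts (the energy-weighted mean arrival time) can be sketched as below. The sketch applies them to a hypothetical exponential envelope rather than to the ML model output; the function names, sample rate and envelope are assumptions made for illustration.

```python
import numpy as np

def clarity_c80(envelope, fs):
    """Early-to-late energy ratio in dB with an 80 ms boundary."""
    energy = np.asarray(envelope, dtype=float) ** 2
    split = int(round(0.080 * fs))  # sample index at 80 ms
    return 10.0 * np.log10(energy[:split].sum() / energy[split:].sum())

def centre_time(envelope, fs):
    """Centre time in seconds: first moment of the squared envelope."""
    energy = np.asarray(envelope, dtype=float) ** 2
    t = np.arange(len(energy)) / fs
    return float((t * energy).sum() / energy.sum())

# Hypothetical envelope: amplitude falls by 60 dB in rt seconds.
fs = 3000   # assumed sample rate in Hz
rt = 1.0    # assumed reverberation time in seconds
t = np.arange(3 * fs) / fs
envelope = 10.0 ** (-3.0 * t / rt)

c80 = clarity_c80(envelope, fs)
ts = centre_time(envelope, fs)
```

A faster decay concentrates more energy in the first 80 ms, so C80 rises as Rt falls. This is consistent with taking the largest C80 value where the smallest Rt value is taken.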


Figure 7-23. Comparison of (a) Centre time and (b) C80, estimated using multi-decay maximum likelihood estimation and the true value. Speech received in simulated rooms.

Results that utilise this method have been published in [9]. It was found that accuracy improved when a longer speech signal was used, estimates were obtained from multiple segments, and the resulting parameters were averaged. Using the median rather than the mean as the estimate of the average avoided bias in the result from any outliers. It was also found that the distribution of estimates was not normal, in which case the median is the more appropriate choice.
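The effect of a single outlier on the two averages can be seen in a small numerical sketch. The per-segment Rt values below are invented for illustration and are not measurements from this work.

```python
import numpy as np

# Made-up Rt estimates (seconds) from seven speech segments; one is an outlier.
segment_rt = np.array([0.95, 1.02, 0.98, 1.05, 0.40, 1.01, 0.99])

mean_rt = float(segment_rt.mean())        # pulled down by the 0.40 s outlier
median_rt = float(np.median(segment_rt))  # middle value, unaffected by it
```

Here the mean falls to roughly 0.91 s, while the median stays at 0.99 s, close to the bulk of the estimates.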


A disadvantage of this method is that, while good accuracy is gained for each of the parameters individually, it is likely that the estimates for different parameters were gleaned from different decay phases. This means there is no single optimal decay curve estimate. A single optimal decay curve estimate is desirable, as it would facilitate the estimation of other parameters such as binaural parameters. It may also be helpful in the blind estimation of a room impulse response for auralisation purposes. Methods b and c have therefore been developed to enable the estimation of a single optimal decay curve.

In document Blind estimation of room acoustic parameters from speech and music signals (Page 156-163)