6.3.3 Combining MMSE estimation based regularized NMF with post-smoothing

Post-smoothing can also be applied as a post-process to the regularized NMF using MMSE estimates described in Chapter 5. That is, we applied the regularized NMF approach of Chapter 5 to solve for the gains matrices, and then post-smoothed the gains matrix solution within the spectral mask using the 2D smoothing filters. Since the median filter gives better SIR values and the Hamming filter gives better SNR values, as shown in Table 6.6, we tried the combination of both methods (regularized NMF using MMSE estimates and NMF with post-smoothing) using just these two filters, as shown in Table 6.7. Comparing Table 6.6 with Table 6.7 shows that post-smoothing with the MMSE estimates based regularized NMF gives a remarkable improvement in SIR and a good improvement in SNR over post-smoothing with NMF without the MMSE regularization.
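The smoothing step described above can be illustrated with a minimal sketch. It assumes that the filter size a = 1 refers to the row direction and b to the number of time frames, that the mask is a squared-magnitude ratio, and that the filters are applied directly to the gains matrices; the names `B_s`, `G_s`, `B_m`, `G_m` (basis and gains matrices of the speech and music sources) and all function names are hypothetical and are not taken from the thesis.

```python
import numpy as np

def median_smooth(G, b):
    """Median-filter each row of G along time with an odd window of b frames."""
    pad = b // 2
    Gp = np.pad(G, ((0, 0), (pad, pad)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(Gp, b, axis=1)
    return np.median(windows, axis=-1)

def hamming_smooth(G, b):
    """Average each row of G along time with a unit-sum Hamming window of b frames."""
    w = np.hamming(b)
    w = w / w.sum()
    pad = b // 2
    Gp = np.pad(G, ((0, 0), (pad, pad)), mode="edge")
    return np.stack([np.convolve(row, w, mode="valid") for row in Gp])

def smoothed_mask(B_s, G_s, B_m, G_m, smooth, b, eps=1e-12):
    """Speech mask built from time-smoothed gains (squared-magnitude ratio is assumed)."""
    S = (B_s @ smooth(G_s, b)) ** 2
    M = (B_m @ smooth(G_m, b)) ** 2
    return S / (S + M + eps)

# Toy example with random nonnegative factors (K = 16 bases per source).
rng = np.random.default_rng(0)
F, K, T = 65, 16, 40
B_s, B_m = rng.random((F, K)), rng.random((F, K))
G_s, G_m = rng.random((K, T)), rng.random((K, T))
H = smoothed_mask(B_s, G_s, B_m, G_m, median_smooth, b=7)
print(H.shape, float(H.min()), float(H.max()))
```

In this sketch the speech estimate would be obtained by multiplying `H` element-wise with the mixture spectrogram; passing `hamming_smooth` instead of `median_smooth` switches the filter type.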

Table 6.7: SNR and SIR in dB for the estimated speech signal using MMSE estimates based regularized NMF and smoothed masks for different filter types and different filter sizes, with a = 1, K = 16, λ = 1 and different values of b.

SMR     No smoothing     Median Filter    Hamming Filter
(dB)                     b = 7            b = 19           b = 7
        SNR     SIR      SNR     SIR      SNR     SIR      SNR     SIR
-5      2.88    4.86     5.88    15.23    6.04    9.92     5.67    10.33
 0      5.50    8.70     7.18    16.90    7.54    12.81    7.26    13.28
 5      8.37    12.20    9.26    18.65    9.69    15.15    9.47    15.73

Comparing the results in Table 5.1 with Table 6.7, we can see that post-smoothing with median filters after the MMSE estimates based regularized NMF improves both the SIR and SNR values compared to using regularized NMF only, without post-smoothing. When the Hamming filter is used for smoothing after the regularized NMF, we obtained better SNR values, with slightly better SIR values when b = 7. The improvement achieved by combining the MMSE estimates based regularized NMF with post-smoothing, compared with using just NMF (first column in Tables 6.6 and 6.7), is remarkable.

Table 6.8 shows the “oracle” results, where we combine the correct magnitude of the speech signal with the phase of the mixed signal. These results represent the gold standard that can be achieved when the magnitude spectra are recovered exactly. As can be seen from Tables 6.7 and 6.8, the SIR results achieved by using MMSE estimation in the regularized NMF followed by smoothed masks are very close to the SIR in the oracle experiment. The SNR results in Table 6.7 are also good, but there is still room for improvement in SNR.
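The oracle construction can be sketched as follows. This is only an illustration on random complex STFT-like arrays: the SNR here is computed in the STFT domain as a simplification, and the function names are hypothetical; the exact metric definitions used in the thesis are not given in this section.

```python
import numpy as np

def oracle_estimate(S_true, Y):
    """Combine the true speech magnitude with the mixture phase (complex STFTs)."""
    return np.abs(S_true) * np.exp(1j * np.angle(Y))

def snr_db(reference, estimate):
    """SNR in dB: reference energy over error energy."""
    err = reference - estimate
    return 10.0 * np.log10(np.sum(np.abs(reference) ** 2) / np.sum(np.abs(err) ** 2))

# Toy data standing in for the STFTs of speech (S), music (N) and their mixture (Y).
rng = np.random.default_rng(0)
shape = (129, 50)
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
N = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
Y = S + N

S_hat = oracle_estimate(S, Y)
print(snr_db(S, S_hat))
```

Even with the exact magnitude, the estimate differs from the reference because of the mixture phase, which is why the oracle SNR is finite.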

Table 6.8: SNR and SIR in dB for the oracle experiment.

SMR     Oracle
(dB)    SNR      SIR
-5      9.25     15.21
 0      11.62    16.90
 5      14.46    19.41
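To make the comparison between Tables 6.7 and 6.8 concrete, the short sketch below computes the gap between the oracle results and the first smoothed column of Table 6.7 (median filter, b = 7). All numbers are copied from the two tables; the variable names are only illustrative.

```python
# Values copied from Table 6.7 (median filter, b = 7) and Table 6.8 (oracle).
smr_db = [-5, 0, 5]
median = {"SNR": [5.88, 7.18, 9.26], "SIR": [15.23, 16.90, 18.65]}
oracle = {"SNR": [9.25, 11.62, 14.46], "SIR": [15.21, 16.90, 19.41]}

# Gap in dB between the oracle and the median-filter result for each metric.
gap = {
    m: [round(o - e, 2) for o, e in zip(oracle[m], median[m])]
    for m in ("SNR", "SIR")
}

for i, smr in enumerate(smr_db):
    print(f"SMR {smr:>2} dB: SNR gap {gap['SNR'][i]:.2f} dB, SIR gap {gap['SIR'][i]:.2f} dB")
```

The SIR gap stays below 1 dB at all three SMR values, while the SNR gap ranges from about 3.4 dB to 5.2 dB.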

6.4 Conclusion

In this chapter, we studied new methods to enforce smoothness on the NMF solutions rather than using regularized NMF with the continuity prior. The new methods are based on post-smoothing the NMF decomposition results. We also studied the case where the MMSE estimates based regularized NMF introduced in Chapter 5 is followed by the post-smoothing process presented in this chapter. Post-smoothing yielded considerable improvements for NMF both with and without the MMSE estimates based regularization.

Spectro-temporal post-enhancement using MMSE estimation

7.1 Motivations and overview

In Chapter 5, minimum mean squared error (MMSE) estimation was used to improve/correct the gains matrix solution of the NMF. The MMSE estimate based correction of the gains matrices was performed using a regularized NMF cost function. In this chapter, MMSE estimation is used to improve/correct the NMF separated spectrograms. The MMSE estimate based correction of the separated spectrograms is embedded in the Wiener filter to guarantee that the sum of the estimated sources equals the mixed signal. In Chapter 5, we tried to improve only the IS-NMF solution for the gains matrices, since the trained basis matrices were assumed to represent the training data well. However, the trained basis matrix used to represent each source's training data is usually not sufficient to capture all the characteristics of that source. The representation may be limited because the dynamic information between the frames is missing and there is no analytical approach for choosing a suitable number of bases for a given source signal. More information about the sources, beyond their trained basis matrices, is usually needed.
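The sum-to-mixture property of the Wiener filter can be written out in a short sketch. The notation is assumed here rather than taken from the thesis: Y is the mixture STFT, \tilde{S} and \tilde{M} are the enhanced magnitude spectrograms of the two sources, \hat{S} and \hat{M} are the final estimates, and all operations are element-wise.

```latex
\hat{S} = \frac{\tilde{S}^{2}}{\tilde{S}^{2} + \tilde{M}^{2}} \odot Y,
\qquad
\hat{M} = \frac{\tilde{M}^{2}}{\tilde{S}^{2} + \tilde{M}^{2}} \odot Y
\quad\Longrightarrow\quad
\hat{S} + \hat{M}
  = \frac{\tilde{S}^{2} + \tilde{M}^{2}}{\tilde{S}^{2} + \tilde{M}^{2}} \odot Y
  = Y .
```

Because the two masks sum to one in every time-frequency bin, any correction applied to \tilde{S} and \tilde{M} before the filter still leaves the estimates consistent with the mixture.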

In this chapter, besides training a basis matrix for each source, the spectrogram of each source's training data is used directly to train a Gaussian mixture model (GMM) in the logarithm domain. The trained basis matrices are used with NMF to compute a spectrogram for each source from the mixed signal. The computed spectrogram of each source is then treated as a 2D distorted signal. The trained GMM and the expectation maximization (EM) algorithm [102] are used to learn the distortion in each separated signal spectrogram. The trained GMMs, the learned distortions, the MMSE estimates, and the Wiener filters are used to find enhanced versions of the separated spectrograms. To take the dynamic information between the spectrogram frames into account, we apply the enhancement approach to multiple consecutive frames at once instead of applying it frame by frame.
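The idea of processing multiple consecutive frames at once can be illustrated with a small sketch that stacks neighbouring log-spectrogram frames into one long vector per position. The overlapping windows with a hop of one frame, the stacking order, and the name `stack_frames` are assumptions made for illustration; the thesis may organize the frames differently.

```python
import numpy as np

def stack_frames(log_spec, n_frames):
    """Stack n_frames consecutive columns of an (F, T) log-spectrogram.

    Returns an array of shape (F * n_frames, T - n_frames + 1); column t holds
    frames t, t+1, ..., t+n_frames-1 concatenated one after another.
    """
    F, T = log_spec.shape
    cols = [
        log_spec[:, t:t + n_frames].T.reshape(-1)
        for t in range(T - n_frames + 1)
    ]
    return np.stack(cols, axis=1)

# Toy magnitude spectrogram moved to the logarithm domain.
rng = np.random.default_rng(0)
V = rng.random((5, 8)) + 1e-3
X = stack_frames(np.log(V), n_frames=3)
print(X.shape)
```

Each column of `X` could then serve as one observation for GMM training or enhancement, so that the model sees the temporal context of a frame rather than the frame alone.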

In document Incorporating prior information in nonnegative matrix factorization for audio source separation (Page 97-101)