FastIVA with Student’s t Source Prior - Learning Algorithm: Newton Method-FastIVA

4.4 Learning Algorithm: Newton Method-FastIVA

4.4.1 FastIVA with Student’s t Source Prior

As discussed earlier, the source prior is crucial to the performance of the IVA algorithm, so choosing an appropriate source prior can enhance the separation performance of the IVA method. It was also noted earlier that speech signals are highly non-stationary and can contain many useful samples with high amplitudes.

Because of its heavy tails, the Student’s t distribution can better model the high amplitude data points in speech signals. The multivariate Student’s t distribution is therefore adopted as the source prior for the FastIVA method [76]. The multivariate form preserves the dependency within the source vectors, while the heavy tails improve the modelling of high amplitude information in different speech sources and thereby improve the separation performance of the FastIVA method.


When the multivariate Student’s t distribution described in Equation (4.8) is adopted as the source prior in the FastIVA method, the non-linear function can be found as follows. The source prior is

$$p(s_i) \propto \left(1 + \frac{(s_i - \mu_i)^{T} \Lambda\, (s_i - \mu_i)}{\nu}\right)^{-\frac{\nu+K}{2}} \tag{4.21}$$

and by using Equation (4.21), the non-linear function can be calculated as:

$$F'\!\left(\sum_{k'=1}^{K} |\hat{s}_i(k')|^2\right) = \frac{\nu+K}{\nu}\left(1 + \frac{\sum_{k=1}^{K} |\hat{s}_i(k)|^2}{\nu}\right)^{-\frac{\nu+K}{2}} \tag{4.22}$$

The leading coefficient (ν+K)/ν can be absorbed into the step size of the update equation. Therefore, by normalising it to unity and assuming zero mean and unit variance, Equation (4.22) can be written as:

$$F''\!\left(\sum_{k'=1}^{K} |\hat{s}_i(k')|^2\right) = \frac{1 - \sum_{k'=1}^{K} |\hat{s}_i(k')|^2}{\left(1 + \sum_{k'=1}^{K} |\hat{s}_i(k')|^2\right)^{2}} \tag{4.23}$$

The non-linear function above is multivariate: it depends on all frequency bins of a source jointly, so the inter-frequency dependency is retained during the learning process. In addition, lowering the degrees of freedom parameter ν makes the tails of the distribution heavier, which enhances the modelling of the information in the high amplitude data points in speech measurements. The separation performance of the proposed FastIVA algorithm is evaluated in simulated and real room environments in the next section, and the results are compared with the FastIVA method using the original super Gaussian source prior.
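As a rough illustration of these two properties, the sketch below evaluates the unnormalised prior of Equation (4.21) and the non-linearity of Equation (4.22) for one source frame. It assumes zero mean and an identity matrix Λ, and the function names and the example frames are hypothetical, chosen only for this illustration.

```python
def sum_power(s_hat):
    """Sum of squared magnitudes over all K frequency bins of one source."""
    return sum(abs(x) ** 2 for x in s_hat)


def student_t_prior(s_hat, nu):
    """Unnormalised prior of Eq. (4.21); assumes zero mean and identity Lambda."""
    K = len(s_hat)
    return (1.0 + sum_power(s_hat) / nu) ** (-(nu + K) / 2.0)


def f_prime(s_hat, nu):
    """Non-linearity of Eq. (4.22): leading coefficient times the power term."""
    K = len(s_hat)
    return (nu + K) / nu * student_t_prior(s_hat, nu)


# Hypothetical frames of one source with K = 4 frequency bins.
silent = [0j] * 4
loud = [3.0 - 4.0j] * 4  # magnitude 5 in every bin

for nu in (2.0, 20.0):
    # A smaller nu leaves more probability mass at the loud frame (heavier tail).
    print(nu, student_t_prior(loud, nu), f_prime(silent, nu))
```

Because every bin enters through the same sum, changing the amplitude in one bin changes the value applied to all bins of that source, which is how the inter-frequency dependency shows up in practice.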

4.4.2 Experimental Results

The separation performance of the FastIVA method with the new Student’s t source prior is evaluated in this section. The proposed FastIVA algorithm is first tested with the image method; then, to evaluate its performance in realistic scenarios, it is tested with real room impulse responses.

Evaluation of FastIVA with Image Method

The proposed FastIVA method with the multivariate Student’s t source prior is first tested in a simulated environment generated with the image method [84]. The experimental settings are mostly similar to those used in the experiments with the original IVA method. A 2 x 2 case is considered and the speech signals for the experiments are randomly selected from the whole of the TIMIT dataset [79]. The length of each speech signal is approximately four seconds. The STFT length is 1024 and the sampling frequency is 8 kHz. The size of the room is 7 x 5 x 3 m³ and the microphones are located at [3.42 2.50 1.50] m and [3.48 2.50 1.50] m respectively. The RT60 for these experimental settings is 200 ms.

The mixed signals are separated by using the FastIVA method with both the proposed Student’s t source prior and the original super Gaussian source prior. The separation performance is measured in SDR and the results are shown in Table 4.4.

Table 4.4 shows the separation performance of the FastIVA method with both the Student’s t and the original super Gaussian source priors for ten different sets of mixtures. All the SDR values shown in Table 4.4 are the average of the two separated signals. The Student’s t source prior based algorithm performs better than the one using the original super Gaussian source prior for every set of mixtures, which is evident from the table. The average improvement in SDR for the multivariate Student’s t source prior is approximately 0.9 dB.

Table 4.4: SDR (dB) values for FastIVA method with both source priors. Student’s t source prior for the FastIVA method improves the separation performance for all the mixtures.

          Original (dB)   Student’s t (dB)   Improvement (dB)
Set-1     10.88           12.02              1.14
Set-2     10.49           11.31              0.82
Set-3     12.76           13.73              0.97
Set-4     13.02           13.93              0.91
Set-5     11.84           12.59              0.75
Set-6     13.38           14.42              1.04
Set-7     13.47           14.28              0.81
Set-8     12.15           12.97              0.82
Set-9     10.66           11.42              0.76
Set-10    11.38           12.44              1.06

The room impulse responses generated by the image method are helpful in comparing different methods, but they cannot evaluate the separation performance of BSS methods in realistic scenarios. Therefore, the separation performance of the proposed FastIVA method with the multivariate Student’s t source prior in realistic scenarios is discussed in the next section.
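The average improvement quoted for Table 4.4 can be reproduced directly from the tabulated values. The short check below copies the SDR figures from the table; the variable names are only illustrative.

```python
# SDR values (dB) copied from Table 4.4: (original prior, Student's t prior).
sdr = {
    "Set-1": (10.88, 12.02),
    "Set-2": (10.49, 11.31),
    "Set-3": (12.76, 13.73),
    "Set-4": (13.02, 13.93),
    "Set-5": (11.84, 12.59),
    "Set-6": (13.38, 14.42),
    "Set-7": (13.47, 14.28),
    "Set-8": (12.15, 12.97),
    "Set-9": (10.66, 11.42),
    "Set-10": (11.38, 12.44),
}

# Per-set improvement, rounded to the two decimals used in the table.
improvements = {
    name: round(student_t - original, 2)
    for name, (original, student_t) in sdr.items()
}
mean_improvement = sum(improvements.values()) / len(improvements)

print(improvements)
print(f"mean improvement: {mean_improvement:.2f} dB")
```

The per-set differences match the Improvement column, and their mean is consistent with the figure of approximately 0.9 dB stated in the text.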

Evaluation of FastIVA with the Real Room Impulse Responses

In this section the proposed FastIVA algorithm is evaluated in a real classroom environment by using the binaural room impulse responses (BRIRs) generated by Shinn-Cunningham [87]. The experimental settings are kept the same as for the original IVA method. Again, the centre location of the room is considered for these experiments and the RT60 is 565 ms. Because the RT60 is high for this set of experiments, it provides a good evaluation of the proposed algorithm in a highly reverberant real room environment. Speech signals are randomly chosen from the whole of the TIMIT dataset [79]. A 2 x 2 case is considered and, to account for the changing position of sources in a real room environment, six different source locations with azimuths of 15°, 30°, 45°, 60°, 75° and 90° relative to the second source are considered. Furthermore, to improve the reliability of the results, all the simulations are repeated three times. A summary of the parameters used for this set of experiments is given in Table 4.5.

Table 4.5: Summary of parameters used in experiments.

Sampling rate            8 kHz
STFT frame length        1024
Velocity of sound        343 m/s
Reverberation time       565 ms (BRIRs)
Room dimensions          9 m x 5 m x 3.5 m
Source signal duration   4 s (TIMIT)
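A few derived quantities help to put the parameters of Table 4.5 in context. The sketch below is only arithmetic on the tabulated values; the count of frequency bins assumes a one-sided FFT of the 1024-sample frame, which is an assumption rather than something stated in the table.

```python
fs_hz = 8000        # sampling rate (Table 4.5)
frame_len = 1024    # STFT frame length in samples (Table 4.5)
rt60_s = 0.565      # reverberation time of the BRIRs (Table 4.5)
source_s = 4        # approximate source signal duration (Table 4.5)

frame_ms = 1000 * frame_len / fs_hz        # duration of one STFT frame
num_bins = frame_len // 2 + 1              # assumes a one-sided FFT
bin_hz = fs_hz / frame_len                 # spacing between frequency bins
rt60_samples = round(rt60_s * fs_hz)       # reverberation time in samples
frames_per_rt60 = rt60_samples / frame_len
source_samples = source_s * fs_hz

print(f"frame: {frame_ms} ms, bins: {num_bins}, spacing: {bin_hz} Hz")
print(f"RT60: {rt60_samples} samples = {frames_per_rt60:.2f} frames")
print(f"source length: {source_samples} samples")
```

With these settings the reverberation time spans several STFT frames, which is in line with the description of this environment as highly reverberant.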

In the first set of experiments, mixtures are created from speech signals from the TIMIT dataset and the BRIRs with an RT60 of 565 ms. These mixtures are then separated by using the proposed FastIVA method with the multivariate Student’s t source prior; its separation performance is measured in SDR (dB) and compared with that of the FastIVA method with the original super Gaussian source prior. As benchmarks, the basic FastICA [57] and the intelligently initialised FastICA [89] are also included in the comparison, and the results are shown in Figure 4.4. It is evident from Figure 4.4 that the FastIVA algorithm with the proposed multivariate Student’s t source prior performs better than with the original super Gaussian source prior at all the separation angles. The FastICA and the intelligently initialised FastICA show poor separation performance in these experiments because of the permutation problem and because no pre- or post-processing, which is generally needed in FastICA methods, is used for them. All the SDR values shown in Figure 4.4 are averaged over eighteen random speech mixtures at all the separation angles, which establishes the improved separation performance of the proposed multivariate Student’s t source prior for the FastIVA method. Overall, the proposed source prior improves the separation performance of the FastIVA method by approximately 0.9 dB.

Figure 4.4: The graph indicates the separation performance of the FastIVA and FastICA algorithms. All the SDR (dB) values are averaged over eighteen random speech mixtures. The Student’s t source prior enhances the separation performance of the FastIVA algorithm at all separation angles.

Objective measures such as SDR are very useful for comparing the performance of different methods on real mixtures, but they cannot portray the true quality of the separated speech signals. Therefore, in addition to SDR, the separation performance of the proposed Student’s t source prior for the FastIVA method is also evaluated by using the perceptual evaluation of speech quality (PESQ) [78], a measure designed to predict subjective listening quality. The same experimental settings were used as before and the mixtures were created by using speech signals from the TIMIT dataset and the BRIRs. The mixtures were then separated by using the FastIVA method with the original super Gaussian source prior and with the proposed Student’s t source prior. The PESQ score is then calculated for the separated signals from both methods by comparing them with the original speech signals. The PESQ score generally lies between 0 and 4.5, where 0 is a poor score and 4.5 is assigned to signals that are almost identical. The PESQ scores for both source priors for five different sets of mixtures are shown in Table 4.6. For each set, the PESQ score is averaged over six source locations in the room, with source azimuth angles of 15°, 30°, 45°, 60°, 75° and 90° relative to the second source, which improves the reliability of the results. Table 4.6 indicates that, even in a highly reverberant real room environment, the proposed multivariate Student’s t source prior consistently achieves a better PESQ score than the original super Gaussian source prior for the FastIVA algorithm.

Table 4.6: PESQ scores for the Student’s t source prior and the original super Gaussian source prior for the FastIVA algorithm. All PESQ values are averaged over six different source locations, and for all sets of mixtures the Student’s t source prior has the better PESQ score.

         Original super Gaussian source prior   Student’s t source prior
Set-1    1.65                                   1.81
Set-2    2.03                                   2.25
Set-3    2.14                                   2.29
Set-4    1.92                                   2.09


Furthermore, the convergence speed of the FastIVA method was measured with the new Student’s t source prior. Since the main purpose of introducing the FastIVA method was to improve the convergence speed of the original IVA method, it is vital to test the convergence speed with the proposed source prior. The same set of experiments was repeated, and the convergence speed was measured by counting the number of iterations that the FastIVA method needed to converge, as measured by the change in the likelihood of the algorithm. The algorithm is taken to have converged when the change of the norm of the weight matrix is less than 10⁻⁶. This was measured for the FastIVA with both the new Student’s t and the original super Gaussian source priors, and the results are shown in Figure 4.5.
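The iteration count described above can be sketched as a generic stopping rule. The code below is a minimal illustration, not the FastIVA update itself: `toy_update` is a placeholder contraction, and the criterion is interpreted here as the Frobenius norm of the change in the weight matrix falling below the tolerance, which is one possible reading of the text.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(abs(x) ** 2 for row in matrix for x in row))


def iterations_to_converge(update, w0, tol=1e-6, max_iter=1000):
    """Apply `update` until the change in W has Frobenius norm below `tol`.

    Returns the number of iterations used and the final weight matrix.
    """
    w = w0
    for n in range(1, max_iter + 1):
        w_new = update(w)
        diff = [[a - b for a, b in zip(row_new, row_old)]
                for row_new, row_old in zip(w_new, w)]
        w = w_new
        if frobenius_norm(diff) < tol:
            return n, w
    return max_iter, w


def toy_update(w):
    """Placeholder update (NOT the FastIVA rule): move halfway to the identity."""
    eye = [[1.0, 0.0], [0.0, 1.0]]
    return [[0.5 * a + 0.5 * e for a, e in zip(row, row_eye)]
            for row, row_eye in zip(w, eye)]


n_iter, w_final = iterations_to_converge(toy_update, [[0.0, 0.0], [0.0, 0.0]])
print(n_iter, w_final)
```

In an actual comparison the same counter would wrap the update rule for each source prior, so that the iteration counts are obtained under an identical tolerance.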

It is clear from Figure 4.5 that the FastIVA with the new Student’s t source prior converges faster than the FastIVA algorithm based on the original super Gaussian source prior. For most of the angles, the new Student’s t based FastIVA method needs almost half the number of iterations needed by the original super Gaussian based FastIVA method. The main purpose of the FastIVA method was to make the algorithm converge faster, and the new Student’s t source prior further improves its convergence speed, which is vital when using the algorithm in real time applications.

Figure 4.5: The number of iterations needed for the FastIVA algorithm to converge using both the original super Gaussian [21] and Student’s t source priors in realistic RIRs. At most angles, the Student’s t source prior needs almost half the number of iterations.

4.5 Summary

In this chapter, a new multivariate Student’s t source prior was introduced for the IVA and the FastIVA algorithms. The source prior is crucial to the performance of the IVA method, because the non-linear score function that retains the inter-frequency dependency is derived from the PDF of the source. The multivariate Student’s t distribution, which belongs to the family of multivariate super Gaussian distributions, is used in this work to model the high amplitude data points in speech signals. The multivariate Student’s t distribution has heavier tails, so it can make use of the information lying in high amplitudes. Speech signals can have significant high amplitude data points, such as voiced sounds, so the multivariate Student’s t distribution is well suited to model them. In addition, highly reverberant mixtures, which are more challenging to separate than those in previous studies, were used to evaluate the performance of the proposed source prior. The new experimental results in highly reverberant real room environments confirm that the proposed Student’s t source prior consistently improves the separation performance of both the IVA and the FastIVA algorithms.

ENERGY DRIVEN MIXED SOURCE PRIOR FOR THE INDEPENDENT VECTOR ANALYSIS ALGORITHM

The independent vector analysis algorithm preserves the dependency within each source vector to solve the permutation problem. Statistical models that can improve the dependency structure within each source vector are still needed to further improve the separation performance of the IVA method. As discussed in Chapter 4, in the past various statistical models have been proposed to improve the statistical dependence within the IVA method [94–96]. The multivariate source prior is important in all versions of the IVA algorithm, since it is used to derive the nonlinear score function and retain the dependency between different frequency bins [22].

In this chapter, a new enhanced multivariate source prior is introduced for the IVA algorithm. Instead of a conventional single distribution source prior, the proposed source prior is a mixture of the original multivariate super Gaussian distribution as in [21] and the multivariate Student’s t distribution. The Student’s t distribution is a super Gaussian distribution with heavier tails, which can be an advantage when modelling speech signals. It is also stated in [72] that the Student’s t distribution is well suited to model certain types of speech signals. Human speech is highly random in nature and can have many high and low amplitude components [12]. Therefore, the Student’s t distribution, due to its heavy tails, can capture and model the information in high amplitude components efficiently [72], while the original super Gaussian distribution can be used to model the other data points in the speech signal.

The contribution in this chapter is that the weight of both distributions can also be adjusted in the mixed source prior, which enables the source prior to adapt to different types of speech signals. The ratio of the two distributions in the mixed source prior is adjusted according to the energy of the observed mixtures. Importantly, this method is found to be successful when only the observed mixtures, and not the original sources, are available. Moreover, to further enhance the separation performance of the proposed IVA algorithm, the fully connected frequency bin structure is decomposed into smaller groups, because neighbouring frequency bins generally have much stronger dependency than distant frequency bins [23, 92]. The strong dependency between neighbouring frequency bins is therefore exploited by dividing them into smaller cliques whilst retaining considerable overlap between adjacent cliques. Finally, the new energy driven mixed source prior with the clique based dependency structure is evaluated in real room environments, where it consistently improves the separation performance of the IVA algorithm.
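The idea of overlapping cliques of neighbouring frequency bins can be pictured with a small helper. This is only a sketch: the function name, the clique size and the amount of overlap are hypothetical values chosen for illustration and are not taken from this chapter.

```python
def make_cliques(num_bins, clique_size, overlap):
    """Return (start, stop) index ranges of overlapping cliques covering all bins.

    Each clique holds `clique_size` neighbouring bins (the last one may be
    shorter) and shares `overlap` bins with the next clique.
    """
    if not 0 <= overlap < clique_size:
        raise ValueError("overlap must satisfy 0 <= overlap < clique_size")
    step = clique_size - overlap
    cliques = []
    start = 0
    while True:
        stop = min(start + clique_size, num_bins)
        cliques.append((start, stop))
        if stop == num_bins:
            break
        start += step
    return cliques


# Hypothetical example: 16 bins, cliques of 8 bins, 4 bins shared by neighbours.
cliques = make_cliques(num_bins=16, clique_size=8, overlap=4)
print(cliques)
```

A clique based source prior would then be evaluated over each index range separately, so that only neighbouring bins are tied together while the overlap links adjacent cliques.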

5.1 Source Prior for the IVA method

A new multivariate source prior that can better preserve the dependency structure within different frequency bins is needed to improve the separation performance of the IVA algorithm. Instead of a single distribution source prior, a mixture of the original multivariate super Gaussian source prior and the multivariate Student’s t distribution is found to be a suitable source prior for the IVA method.

The cost function for the original IVA algorithm is minimised only when the vector sources are independent while the dependency within the components of each vector is still preserved. Thus the cost function retains the inherent frequency dependency within each source vector, whilst removing the dependency among the sources [22]. When the cost function for the IVA method is minimised by using the gradient descent algorithm, the nonlinear function $\phi^{(k)}$ for source $\hat{s}_i$ is given as [21]:

$$\phi^{(k)}\big(\hat{s}_i(1) \cdots \hat{s}_i(k) \cdots \hat{s}_i(K)\big) = -\frac{\partial \log q\big(\hat{s}_i(1) \cdots \hat{s}_i(k) \cdots \hat{s}_i(K)\big)}{\partial \hat{s}_i(k)} \tag{5.1}$$

where $\phi^{(k)}\big(\hat{s}_i(1) \cdots \hat{s}_i(k) \cdots \hat{s}_i(K)\big)$ is a multivariate score function used to preserve the dependency across the frequency bins, denoted by index k. This nonlinearity represents the core idea of the IVA algorithm: because it is a multivariate function, it can preserve the dependency between different frequency bins. Since this multivariate score function is obtained from the source prior, it is vital to choose an appropriate multivariate source prior to retain the dependency structure for a better separation performance of the IVA algorithm.
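To make Equation (5.1) concrete, the sketch below evaluates the score function numerically for a spherical super Gaussian log-prior, log q = -‖ŝ‖ up to a constant, which is the shape Equation (5.5) takes under zero mean and identity covariance. It uses real-valued stand-ins for the frequency bins to keep the derivative simple; the function names and the example vector are assumptions made for this illustration.

```python
import math


def log_prior(s_hat):
    """Spherical super Gaussian log-prior, log q = -||s||, up to a constant."""
    return -math.sqrt(sum(x * x for x in s_hat))


def score(s_hat, k, eps=1e-6):
    """Eq. (5.1): phi^(k) = -d log q / d s(k), by central finite differences."""
    up = list(s_hat)
    down = list(s_hat)
    up[k] += eps
    down[k] -= eps
    return -(log_prior(up) - log_prior(down)) / (2 * eps)


# Real-valued stand-in for one source with K = 3 frequency bins.
s_hat = [3.0, 0.0, 4.0]
phi = [score(s_hat, k) for k in range(len(s_hat))]

# For this prior the derivative has the closed form s(k) / ||s||.
norm = math.sqrt(sum(x * x for x in s_hat))
closed_form = [x / norm for x in s_hat]

print(phi)
print(closed_form)
```

The value for bin 1 depends on the amplitude in bin 3 through the shared norm: with the third bin set to zero, the same first bin would give a score of 1 instead of 0.6. This coupling is what is meant by a multivariate score function.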

In the original IVA method [21], the source prior representing the inter-frequency dependencies is a dependent multivariate super Gaussian distribution, which can be derived as follows. Suppose a K-dimensional random variable is given by:

$$s_i = \sqrt{v}\, z_i + \mu_i \tag{5.2}$$

where v is a scalar random variable, $\mu_i$ is a K-dimensional deterministic variable and $z_i$ is a K-dimensional random variable that has a Gaussian distribution with covariance matrix $\Sigma_i$ and zero mean, that is

$$p(z_i) = \alpha_z \exp\left(-\frac{z_i^{\dagger} \Sigma_i^{-1} z_i}{2}\right) \tag{5.3}$$

where $(\cdot)^{\dagger}$ denotes the Hermitian transpose and $\alpha_z$ is a normalization term. Suppose that v has a Gamma distribution, that is:

$$p(v) = \alpha_v\, v^{\frac{K-1}{2}} \exp\left(-\frac{v}{2}\right) \tag{5.4}$$

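where $\alpha_v$ is also a normalization term and the conditional distribution $p(s_i|v)$ is a Gaussian with mean $\mu_i$ and covariance $v\Sigma_i$. Therefore the original source prior can be obtained [21]:

$$\begin{aligned}
p(s_i) &= \int_0^{\infty} p(s_i|v)\, p(v)\, dv \\
&= \alpha_1 \int_0^{\infty} \frac{1}{\sqrt{v}} \exp\left(-\frac{1}{2}\left(\frac{(s_i - \mu_i)^{\dagger} \Sigma_i^{-1} (s_i - \mu_i)}{v} + v\right)\right) dv \\
&= \alpha_2 \exp\left(-\sqrt{(s_i - \mu_i)^{\dagger} \Sigma_i^{-1} (s_i - \mu_i)}\right)
\end{aligned} \tag{5.5}$$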
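The last step of Equation (5.5) can be checked numerically. The sketch below assumes that the Gaussian normaliser contributes a factor v^(-K/2), which combines with the factor v^((K-1)/2) of Equation (5.4) to leave v^(-1/2) in the integrand; the symbol a stands for the quadratic form in the exponent, and the function name is hypothetical.

```python
import math


def mixture_integral(a, upper=12.0, steps=20000):
    """Trapezoid estimate of the integral of v^(-1/2) exp(-(a/v + v)/2) over v > 0.

    The substitution v = u^2 removes the v^(-1/2) factor, so the integrand
    becomes 2 * exp(-(a/u^2 + u^2)/2), which tends to zero at both ends.
    """
    h = upper / steps
    total = 0.0
    for n in range(1, steps + 1):
        u = n * h
        f = 2.0 * math.exp(-0.5 * (a / (u * u) + u * u))
        # The integrand is 0 at u = 0; the last point gets the half weight.
        total += f if n < steps else 0.5 * f
    return total * h


for a in (0.5, 2.0, 9.0):
    numeric = mixture_integral(a)
    closed = math.sqrt(2.0 * math.pi) * math.exp(-math.sqrt(a))
    print(a, numeric, closed)
```

The numerical and closed-form values agree, with the constant sqrt(2π) absorbed into the normalization term, so the Gaussian scale mixture reduces to an exponential of the square root of the quadratic form.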

Equation (5.5) shows there is a variance dependency between the frequency bins, which means when the variance of one frequency component is large then the

In document Enhanced independent vector analysis for speech separation in room environments (Page 84-98)