Feature selection algorithm - Proposed algorithm to improve the intelligibility of speech in no

4.4 Proposed algorithm to improve the intelligibility of speech in noise for monaural

4.4.4 Feature selection algorithm

In the enhancement stage, the estimation of the time-frequency soft mask only involves the operation $\mathrm{logsig}(\mathbf{v}^T\mathbf{Q})$, using the weights calculated in the training stage. The implementation of the proposed estimator is relatively simple, and its computational cost is directly related to the number of features. Considering that the MAC operation is executed in a single instruction, the number of instructions required by the estimator can be reduced to 2P, P being the number of input features included in Q (i.e. P = 2KD). Assuming that the output of the time-frequency analysis is $|X(k,l)|^2$, and according to the standard assembler language used in this type of DSPs, the number of instructions required for the computation of the input features is 14 + 4(D − 1), as shown in Figure 4.4. Accordingly, the number of instructions necessary to process each frequency band is IPF = 4KD + 14 + 4(D − 1). Considering K = 32 and D = 5, the IPF is 670, a value that exceeds the reference value calculated in section 4.3.
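The instruction count can be checked with a few lines of Python. This is only a worked version of the arithmetic in the text; the function name `instructions_per_band` is a hypothetical helper, not part of the original work.

```python
def instructions_per_band(K: int, D: int) -> int:
    """Instructions per frequency band (IPF), using the counts quoted in the text."""
    P = 2 * K * D                # number of input features in Q
    estimator = 2 * P            # 2P instructions for the estimator (figure taken from the text)
    features = 14 + 4 * (D - 1)  # cost of computing the input features (Figure 4.4)
    return estimator + features


print(instructions_per_band(K=32, D=5))  # 640 + 30 = 670
```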

The computational cost of the algorithm (i.e. the IPF) can be reduced by decreasing the number of features used for the estimation of the mask. For this purpose, a feature selection algorithm is proposed. Considering K = 32 and D = 5, the number of features available to estimate the mask of each time-frequency point is P = 2KD = 320, which leads to a huge number of possible combinations. Consequently, an exhaustive search is not affordable, and a feature selection algorithm based on evolutionary computation is proposed.
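To give a sense of the size of the search space, the short sketch below counts the candidate subsets for P = 320. The subset size of 50 is a purely illustrative assumption; the text does not state the value of Nfeatures at this point.

```python
import math

P = 2 * 32 * 5             # 320 available features (K = 32, D = 5)
total_subsets = 2 ** P     # every feature is either selected or not

n_selected = 50            # hypothetical subset size, for illustration only
subsets_of_size = math.comb(P, n_selected)

print(f"all subsets: about 10^{len(str(total_subsets)) - 1}")
print(f"subsets of size {n_selected}: about 10^{len(str(subsets_of_size)) - 1}")
```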

[Figure 4.4, block diagram: x(n) → T-F Analysis → |X|^2 → LOG (+12 inst.) and LOG2 (+2 inst.); pm(LOG) (+2(D−1) inst.) and pm(LOG2) (+2(D−1) inst.)]

Figure 4.4: Number of instructions associated with the computation of the proposed features. The output of the time-frequency analysis block is the squared modulus (energy) of the STFT.

Evolutionary algorithms (EA) exhibit a great potential to solve certain problems that would otherwise be intractable. These algorithms are inspired by the laws of natural evolution, such as selection, mutation and crossover, and iteratively search for the optimum solution starting from the solutions obtained in previous iterations [Haupt and Haupt, 2004]. EA are commonly used in engineering to solve optimization and search problems in a wide range of applications, for instance, automatic speech/music discrimination [Ruiz-Reyes et al., 2010], antenna array design [Chabuk et al., 2012], mobile robot localization [Kwok et al., 2006], or feature selection for sound classification in hearing aids [Alexandre et al., 2007]. The three main parts of an evolutionary algorithm are the generation of the candidate solutions of the population, the evaluation of a fitness function, and the evolution of the population [Alexandre et al., 2007]. The candidate solutions are defined for each specific problem, and they are composed of a set of elements that may be binary or continuous values. The fitness function is defined as the cost function to optimize, and it also depends on the specific problem to solve. The evolution is based on the crossover and mutation operators, whose characteristics can also be adapted to each specific problem. The common steps of the evolutionary algorithms implemented in this thesis are listed below.

1. Generation of an initial population of candidate solutions. The size of the population NP is a crucial issue for the EA performance. On the one hand, a large population provides more genetic diversity (and thus a wider exploration of the search space) but may consequently suffer from slower convergence. On the other hand, with a very small population, only a reduced part of the search space can be explored, thus increasing the risk of prematurely converging to a local extremum. In each specific problem, the population size should be chosen as a tradeoff between computational complexity and performance. The candidate solutions of the initial population can be either generated at random or initialized with a predetermined set of values, for instance, to accelerate the convergence of the algorithm.

2. In some cases, it is necessary to validate the new population in order to check that all candidate solutions fulfill the constraints applied to the possible solutions. Those candidates that do not fulfill the constraints are iteratively regenerated until they are valid.

3. Evaluation of the fitness (cost) function for each candidate in the population.

4. A selection process is performed, using the results of the evaluation of the fitness function as ranking. It consists in selecting a subpopulation of NSP candidate individuals that

generation. In some cases, not all of the worst individuals in the current generation are replaced; in other cases they are. The first replacement strategy, called the “steady state approach”, prevents the algorithm from prematurely converging to a local minimum.

5. Breed the new generation by recombining the parents using a crossover operator. A number of NP − NSP novel candidates for the next generation are generated by random crossover of the previously selected NSP best candidates. The probability that the crossover operator is applied to each individual is called the crossover probability, PC < 1. The crossover operator implemented in the EAs included in this thesis is a uniform crossover (UX) operator with PC = 0.5. The offspring has approximately half of the elements from the first parent and the other half from the second parent, and these elements are randomly selected.

6. Mutate, i.e. randomly change, the offspring. The main purpose of a mutation operator is to maintain diversity within the population and inhibit premature convergence to a local extremum. Not all the offspring elements are mutated: the probability that the mutation operator is applied is called the mutation probability, PM < 1, and its value is usually found empirically. The mutation operator depends on the characteristics of each candidate solution.

7. Iterate steps 2 to 6 until a maximum number of generations is reached or until the best value of the fitness function remains unchanged for a given number of iterations.
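The crossover and mutation operators of steps 5 and 6 can be sketched for binary candidate solutions as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: PC = 0.5 is read here as a per-element probability of taking the element from the second parent, bit flipping is only one possible mutation operator, and the function names and the value of `p_m` are hypothetical.

```python
import random


def uniform_crossover(parent_a, parent_b, rng, p_c=0.5):
    """Build one offspring: each element comes from parent_b with probability p_c,
    otherwise from parent_a (about half from each parent when p_c = 0.5)."""
    return [b if rng.random() < p_c else a for a, b in zip(parent_a, parent_b)]


def bit_flip_mutation(candidate, rng, p_m=0.01):
    """Flip each binary element independently with probability p_m (assumed operator)."""
    return [1 - gene if rng.random() < p_m else gene for gene in candidate]


rng = random.Random(0)  # fixed seed so that the example is repeatable
child = uniform_crossover([0] * 10, [1] * 10, rng)
mutant = bit_flip_mutation(child, rng, p_m=0.1)
print(child, mutant)
```

With parents of all zeros and all ones, the ones in `child` mark the positions inherited from the second parent.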

The values of the parameters of the EA (population size, crossover rate, mutation probability, mutation scheme, and number of generations) should be chosen in each specific problem to obtain a good tradeoff between design time and performance.

The goal of the EA proposed in this section is to select a given number of features (Nfeatures) from the whole set so as to obtain the maximum output PESQ value, trying to approximate the value obtained by the system when the classifier uses all features. Each candidate solution of the EA contains the indexes of the selected input features. The ideal feature selection algorithm would first compute the weights with the GLSE ($\mathbf{v}_{MSE}$) to estimate the Wiener mask using the selected subset of features (i.e. the candidate solution), then optimize those weights to maximize the output PESQ value ($\mathbf{v}_{opt}$), and finally use that value as the cost function to select the best candidate solution. This process would have to be repeated for each candidate solution and iteration. Unfortunately, the evaluation of the PESQ function is computationally expensive. Using a powerful computer with a 2 x 2.96 GHz 6-core processor and 32 GB of RAM, the evaluation of the PESQ function for the design set takes around 19 s. Assuming a population of 100 candidate solutions and 1000 iterations, the time required by the EA is approximately 528 hours. This implementation is not affordable, so an efficient alternative is proposed.
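The 528-hour figure follows directly from the numbers quoted above, assuming one PESQ evaluation of the design set per candidate solution and per iteration. The variable names below are illustrative only.

```python
population_size = 100             # candidate solutions per generation
generations = 1000                # iterations of the EA
seconds_per_pesq_evaluation = 19  # one evaluation over the design set

total_seconds = population_size * generations * seconds_per_pesq_evaluation
total_hours = total_seconds / 3600

print(round(total_hours))  # 1,900,000 s is about 528 hours
```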

Rather than searching for the subset of features that maximizes the output PESQ, the proposed feature selection algorithm searches for the subset of features that best approximates the time-frequency mask estimated when all the input features are used, i.e. $y = \mathrm{logsig}(\mathbf{v}_{opt}^T\mathbf{Q})$. In this case, the weights $\mathbf{v}_{opt}$ are taken as input values and they are not recalculated for each candidate solution. Instead, only the weights corresponding to the selected features of each candidate solution are considered, setting the remaining weights to zero. The selected weight vector is labeled as $\mathbf{v}_{sel}$, and each candidate solution is a vector of the same size as $\mathbf{v}$ containing ones in the positions of the selected features and zeros in the remaining positions. The cost function considered by the feature selection algorithm is the MSE between the mask estimated using all features, $y = \mathrm{logsig}(\mathbf{v}_{opt}^T\mathbf{Q})$, and the mask estimated using only the weights of the selected features, $\mathrm{logsig}(\mathbf{v}_{sel}^T\mathbf{Q})$. Hence, dropping the logsig function, the cost function is the mean square of the difference $\mathbf{v}_{opt}^T\mathbf{Q} - \mathbf{v}_{sel}^T\mathbf{Q}$.
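The cost of one candidate solution can be sketched as below. The example assumes that Q is available as a list of feature vectors, one per time-frequency point, and that the error is averaged over those points; the function names and the toy values are hypothetical.

```python
def linear_output(weights, feature_vectors):
    """Compute v^T q for every feature vector q (the logsig function is dropped)."""
    return [sum(w * x for w, x in zip(weights, q)) for q in feature_vectors]


def selection_cost(v_opt, candidate, feature_vectors):
    """Mean squared difference between v_opt^T Q and v_sel^T Q."""
    # v_sel keeps the optimized weights of the selected features and zeros elsewhere.
    v_sel = [w if keep else 0.0 for w, keep in zip(v_opt, candidate)]
    full = linear_output(v_opt, feature_vectors)
    reduced = linear_output(v_sel, feature_vectors)
    return sum((f - r) ** 2 for f, r in zip(full, reduced)) / len(full)


v_opt = [0.5, -1.0, 2.0]                 # toy weights, not taken from the thesis
Q = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]   # two toy feature vectors
print(selection_cost(v_opt, [1, 0, 1], Q))  # dropping the second feature gives (4 + 1) / 2 = 2.5
```

Since the weights are not recalculated, the difference only depends on the discarded features, which is what makes this cost far cheaper to evaluate than the PESQ.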

The complete steps of the feature selection algorithm are as follows:

1. An initial population of 100 candidate solutions is generated. Each candidate solution contains P random binary values (0 or 1).

2. The candidates of the population are validated to fulfill the constraint of total number of features. If a candidate solution exceeds the maximum number of features (Nfeatures), random positions are decreased by one (avoiding negative values). The process iterates until the candidate solution fulfills the requirement.

3. The cost function is evaluated for each candidate solution of the population. It consists of the computation of the MSE between the mask estimated using all features and the mask obtained using the weights of the features selected by each candidate solution.

4. A selection process is applied, using the MSE of each solution as ranking. It consists in selecting the best 10% of the solutions of the population, removing the remaining solutions.

5. The remaining 90% of the solutions of the new generation are then generated by uniform crossover of the best candidates.

6. Mutations are applied to the candidate solutions of the new population that are repeated, excluding the best solution. Mutations consist of changing the values of random positions of the candidate solution.

7. The process is repeated from step 2 to 6 until 1000 generations are evaluated.
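The seven steps above can be sketched as a single loop. This is a minimal illustration and not the thesis implementation: the number of positions changed by a mutation (`n_flips`), the way parents are paired, and all function names are assumptions, and the repair step simply switches off randomly chosen selected positions until the limit is met.

```python
import random


def repair(candidate, n_features, rng):
    """Step 2: switch off random selected positions until at most n_features remain."""
    selected = [i for i, bit in enumerate(candidate) if bit]
    rng.shuffle(selected)
    for i in selected[n_features:]:
        candidate[i] = 0
    return candidate


def cost(candidate, v_opt, features):
    """Step 3: mean square of v_opt^T q - v_sel^T q, i.e. of the discarded terms."""
    total = 0.0
    for q in features:
        diff = sum(w * x for w, x, bit in zip(v_opt, q, candidate) if not bit)
        total += diff * diff
    return total / len(features)


def select_features(v_opt, features, n_features, pop_size=100,
                    generations=1000, n_flips=1, seed=0):
    rng = random.Random(seed)
    P = len(v_opt)
    n_best = max(2, pop_size // 10)  # best 10% survive (at least two parents)
    # Step 1: random binary initial population.
    population = [[rng.randint(0, 1) for _ in range(P)] for _ in range(pop_size)]
    for _ in range(generations):
        population = [repair(c, n_features, rng) for c in population]  # step 2
        population.sort(key=lambda c: cost(c, v_opt, features))        # steps 3 and 4
        best = population[:n_best]
        children = []
        for _ in range(pop_size - n_best):                             # step 5
            a, b = rng.sample(best, 2)
            children.append([x if rng.random() < 0.5 else y for x, y in zip(a, b)])
        population = [list(c) for c in best] + children
        seen = {tuple(population[0])}
        for c in population[1:]:                                       # step 6
            if tuple(c) in seen:  # only repeated candidates are mutated
                for i in rng.sample(range(P), n_flips):
                    c[i] = 1 - c[i]
            seen.add(tuple(c))
    return population[0]  # best solution evaluated in the last iteration


# Toy problem with hypothetical sizes, much smaller than P = 320.
data_rng = random.Random(1)
v_opt = [data_rng.uniform(-1, 1) for _ in range(12)]
features = [[data_rng.gauss(0, 1) for _ in range(12)] for _ in range(50)]
chosen = select_features(v_opt, features, n_features=4, pop_size=20, generations=30)
print(chosen, "selected:", sum(chosen))
```

Because the best 10% are copied unchanged into the next generation and the best solution is never mutated, the best cost cannot increase from one generation to the next.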

The features contained in the best solution obtained in the last iteration are considered to be the optimized solution. The weights corresponding to those features are labeled as $\mathbf{v}_{sel}$. The values of the parameters of the evolutionary algorithm (population size, crossover rate, mutation scheme and number of generations) were found to give a good tradeoff between design time and performance for the experiments carried out.

Finally, the weights obtained in this step, $\mathbf{v}_{sel}$, are optimized again to increase the output PESQ, using the algorithm described in the previous section. The optimized selected weights are labeled as $\mathbf{v}_{selopt}$, and they are the output weights of the training stage.

In document Speech enhancement algorithms for audiological applications (Page 116-119)