Histogram equalisation - A Voice and Face Biometric System using HE-SVM


NMF does not use information about how the facial images are separated into different facial classes. The most straightforward way to exploit discriminative information in NMF is to look for discriminative projections of the facial image vectors after they have been projected onto the basis image matrix. Let the facial image database contain K different classes (persons), with each class r containing N_r images. Now let the database matrix be organised so that its j-th column is the ρ-th image of the r-th class. Thus, the vector h_j that corresponds to the j-th column of the matrix H is the coefficient vector for the ρ-th facial image of the r-th class.

Figure 4.1: A set of 25 basis images for (a) NMF and (b) NMF-faces.

NMF decomposition places no upper limit on the number of bases that can be constructed in the update rule for the matrix Z, and unless the number of NMF bases is limited, the within-class scatter matrix for the coefficient vectors h_j is singular (i.e. the matrix is not invertible).

In order to solve this problem, the Fisherfaces approach (Belhumeur et al., 1997) is used. The face recognition scores used in these experiments have been calculated in this way with the NMF-faces method (Zafeiriou et al., 2005a,b), in which the final basis images are closer to facial parts. A set of several basis images for both NMF and NMF-faces methods is illustrated in Figure 4.1.

4.2 Histogram equalisation

Histogram equalisation (HE) is a non-linear transformation that converts one probability distribution into another. The aim of this transformation is to match the statistical properties (mean, variance, skew, kurtosis, etc.) of two probability distributions.


Histogram equalisation is a widely used non-linear method designed for the enhancement of images. It employs a monotonic, non-linear mapping that reassigns the intensity values of pixels in the input image. The aim is to control the shape of the output image intensity histogram in order to achieve a uniform distribution of intensities or to highlight certain intensity levels.
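The image-enhancement use of the mapping can be illustrated with a minimal sketch. It assumes 8-bit integer intensities and the simplest variant, in which the normalised cumulative histogram is scaled back to the intensity range; the function name `equalise_intensities` and the toy pixel values are hypothetical and are not taken from the thesis.

```python
def equalise_intensities(pixels, levels=256):
    """Map integer intensities in [0, levels - 1] through their cumulative histogram."""
    # Histogram of the input intensities.
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1

    # Cumulative histogram, normalised to [0, 1].
    cdf = []
    running = 0
    for c in counts:
        running += c
        cdf.append(running / len(pixels))

    # Monotonic look-up table: scale the cumulative values back to intensities.
    lut = [round(v * (levels - 1)) for v in cdf]
    return [lut[p] for p in pixels]


# Toy example: a dark, low-contrast strip of pixels (illustrative values only).
dark = [10, 10, 11, 12, 12, 12, 13, 14]
stretched = equalise_intensities(dark)
```

For these toy values the narrow input range 10 to 14 is spread over 64 to 255, and the ordering of the intensities is preserved because the mapping is monotonic.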

This technique has also been developed for speech recognition adaptation approaches and for the correction of non-linear effects typically introduced by speech systems such as microphones, amplifiers, clipping and boosting circuits, and automatic gain control circuits (Balchandran and Mammone, 1998; Hilger and Ney, 2001).

The objective of HE is to find a non-linear transformation that reduces the mismatch between the statistics of two signals. Pelecanos and Sridharan (2001) and Skosan and Mashao (2006) applied this concept to the acoustic features in order to improve the robustness of a speaker verification system by reducing the mismatch between training and test conditions and the effects of additive noise, the channel and the transducer.

In this thesis, histogram equalisation is applied to the score distributions. To this end, a matching of the cumulative distribution function (CDF) of a reference distribution and the CDF of the variable to be transformed is performed as follows (de la Torre et al., 2005; Skosan and Mashao, 2006):

Let x be a random variable with probability distribution p_x(x), and let y = T(x) be a single-valued, monotonically increasing transformation function that converts p_x(x) into a reference probability distribution p_ref(y). The transformation T(x) then makes the probability of finding x in the differential range dx equal to the probability of finding y in the differential range dy, i.e.:

$$p_{\mathrm{ref}}(y)\,dy = p_x(x)\,dx , \tag{4.3}$$

and modifies the original probability distribution p_x(x) according to the expression

$$p_{\mathrm{ref}}(y) = p_x(x)\,\frac{dx}{dy} = p_x(G(y))\,\frac{dG(y)}{dy} , \tag{4.4}$$

where G(y) = x is the inverse of T(x). Using Equation 4.4, the cumulative distribution functions associated with p_x(x) and p_ref(y) are related as follows:

$$C_x(x) = \int_{-\infty}^{x} p_x(x')\,dx' = \int_{-\infty}^{T(x)} p_x(G(y'))\,\frac{dG(y')}{dy'}\,dy' = \int_{-\infty}^{y} p_{\mathrm{ref}}(y')\,dy' = C_{\mathrm{ref}}(y) = C_{\mathrm{ref}}(T(x)). \tag{4.5}$$

Thus, the transformation T(x) is given by:

$$T(x) = C_{\mathrm{ref}}^{-1}(C_x(x)) , \tag{4.6}$$

where C_ref⁻¹ is the inverse of the CDF of the reference probability distribution.
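When both distributions are known in closed form, Equation 4.6 can be evaluated directly. The sketch below is only an illustration under that assumption: the two Gaussian distributions and the helper name `make_transform` are hypothetical choices, not part of the thesis, and `statistics.NormalDist` from the Python standard library supplies the CDF and its inverse.

```python
from statistics import NormalDist


def make_transform(source, reference):
    """Return T(x) = C_ref^{-1}(C_x(x)) for two distributions with cdf/inv_cdf."""
    def transform(x):
        # C_x(x), then the inverse reference CDF, as in Equation 4.6.
        return reference.inv_cdf(source.cdf(x))
    return transform


# Assumed distributions, chosen only for illustration.
source = NormalDist(mu=2.0, sigma=3.0)      # plays the role of p_x
reference = NormalDist(mu=0.0, sigma=1.0)   # plays the role of p_ref
T = make_transform(source, reference)

one_sigma = T(5.0)   # a value one standard deviation above the source mean
centre = T(2.0)      # the source mean
```

For two Gaussians the transformation reduces to the familiar standardisation (x - 2) / 3, so `one_sigma` is approximately 1 and `centre` approximately 0; for other pairs of distributions T(x) is in general non-linear.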

Nevertheless, only a finite number of observations are usually available in practical implementations, and cumulative histograms are used instead of cumulative probabilities; for this reason, the transformation is called histogram equalisation. A schematic diagram in Figure 4.2 shows the matching process of the cumulative histogram of the original variable x and the reference cumulative histogram.

[Figure 4.2 labels: cumulative histogram of the original variable; reference cumulative histogram; points x and y; level C_x(x) = C_ref(y); cumulative maximum 1.0]

Figure 4.2: Histogram equalisation: matching of the cumulative distribution.

Practical implementation

The aim of histogram equalisation is to transform the score distributions obtained for each modality, so that their statistics are equal to those of a reference distribution. Therefore, the first step is to select a suitable reference distribution p_ref(y) corresponding to one of the modalities and compute its cumulative histogram C_ref(y). Next, HE can be applied to the score distribution of each modality as follows:

1. Determine the maximum (x_max) and minimum (x_min) values across the whole set of observations (scores, in this case) of a particular modality.

2. Divide the range [x_min, x_max] into M equally spaced, non-overlapping intervals B_i satisfying the following conditions:

$$\begin{aligned} x_{\min} &= b_1 < b_2 < \dots < b_{M+1} \\ B_i &= [b_i, b_{i+1}) \end{aligned} \tag{4.7}$$

3. Construct a histogram of the scores in the set using the intervals B_i; i.e., scan the set and count the number of scores (observations) falling into each interval B_i.

4. Compute the normalised version of the histogram by using the following expression:

$$p_x(x \in B_i) = \frac{n_i}{N_x} \tag{4.8}$$

where n_i is the number of observations in the interval B_i and N_x is the total number of observations in the set. Equation 4.8 is an approximation of the probability of x lying in the interval B_i.

5. Compute the cumulative histogram of the set, using the normalised histogram constructed in the previous step, as follows:

$$C_x(x : x \in B_i) = \sum_{j=1}^{i} \frac{n_j}{N_x} \tag{4.9}$$

which is an approximation of the true cumulative distribution function.

6. Replace each value of x by the value of y that corresponds to the same point in the reference and computed cumulative histograms, so that C_x(x) = C_ref(y).
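The six steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the thesis: the function names, the toy scores, the choice to place the maximum score in the last bin, and the choice to report the centre of the matching reference bin as y are all assumptions made here.

```python
from bisect import bisect_left


def bin_index(value, x_min, width, num_bins):
    """Index of the half-open bin [b_i, b_{i+1}) that contains value."""
    # Assumption: the maximum score is placed in the last bin.
    return min(int((value - x_min) / width), num_bins - 1)


def cumulative_histogram(scores, num_bins):
    """Steps 1-5: return (x_min, bin width, cumulative histogram)."""
    x_min, x_max = min(scores), max(scores)            # step 1
    if x_max == x_min:
        raise ValueError("scores must not all be identical")
    width = (x_max - x_min) / num_bins                 # step 2
    counts = [0] * num_bins
    for s in scores:                                   # step 3
        counts[bin_index(s, x_min, width, num_bins)] += 1
    cumulative, running = [], 0
    for n in counts:                                   # steps 4 and 5
        running += n
        cumulative.append(running / len(scores))
    return x_min, width, cumulative


def equalise(scores, reference, num_bins=10):
    """Step 6: map each score to the reference bin with matching cumulative value."""
    x_min, x_width, c_x = cumulative_histogram(scores, num_bins)
    y_min, y_width, c_ref = cumulative_histogram(reference, num_bins)
    out = []
    for s in scores:
        i = bin_index(s, x_min, x_width, num_bins)
        # First reference bin whose cumulative value reaches C_x(x).
        k = min(bisect_left(c_ref, c_x[i]), num_bins - 1)
        # Assumption: report the centre of that reference bin as y.
        out.append(y_min + (k + 0.5) * y_width)
    return out


# Toy scores for two hypothetical modalities (illustrative values only).
scores = [0.0, 1.0, 2.0, 3.0, 4.0]
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
equalised = equalise(scores, reference, num_bins=5)
```

With these evenly spaced toy scores, each score is mapped to the centre of the corresponding reference bin (14, 22, 30, 38, 46). The number of bins M controls the resolution of the mapping: scores that fall in the same bin receive the same equalised value.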

In document Fusing prosodic and acoustic information for speaker recognition (Page 98-101)