• No results found

Integration of multiple cues for a human authentication system

N/A
N/A
Protected

Academic year: 2021

Share "Integration of multiple cues for a human authentication system"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

Science

Procedia Computer Science 00 (2010) 000–000

www.elsevier.com/locate/procedia

ICEBT 2010

Integration of Multiple Cues for a Human Authentication

System

Mahesh P.K.*, M.N. Shanmukha Swamy

Department of Electronics and Communication, J.S.S. research foundation, Mysore University, Mysore-6,India Abstract

In the field of biometrics, personal authentication and identification are regarded as effective methods for automatic recognition of a person. Using multimodal biometric systems, we typically get better recognition performance compared to using a single biometric modality. We propose a multimodal biometric system using two modalities: palmprint and speech. Integrating the palmprint and speech features increases the recognition performance. We extract the discriminant features using a modified canonical form method for the Palmprint and the Mel Frequency Cepstral Coefficients (MFCC) technique for speech. The final decision is then made by fusion at the matching score level. Using a large database as the test set, the experimental results show significant improvement in reliability of recognition systems and demonstrate increases in the recognition rate.

"Keywords: Multimodal Biometrics; Palmprint; Speech; Fusion;"

1. Introduction

Multimodal biometric systems consolidate the evidence presented by multiple biometric modalities and typically provide better recognition performance compare to single biometric modality. Due to its promising applications as well as theoretical challenges, multimodal biometric has drawn more and more attention in recent years [1]. Although information fusion in a multimodal system can be performed at various levels, integration at the matching score level is the most common approach due to the case in accessing and combining the score generates by different matchers. Since the matching scores output by the various modalities are heterogeneous, score normalization is needed to transform these scores into a common domain, prior to combining them.

We proposes, multimodal biometric system for identify verification using two modalities, i.e. Palmprint and speech. The proposed system is designed for applications where the training data contains palmprint and speech. Integrating the palmprint and speech features increases recognition performance of person authentication.

The final decision is made by fusion at matching score level. Multimodal system is developed through fusion of palmprint verification and speaker verification. We extract the features using modified canonical form method for p palmprint and MFCC technique for speech. Integrating these two features at fusion level, which gives better performance and better accuracy.

The rest of this paper is organized as fallows. Section 2 presents the System overview, which is used to increase recognition quality. Section 3 presents algorithms for calculation of palmprint and speech features using modified canonical form method and MFCC technique. Section 4, the individual traits are fused at matching score level based on Equal Error Rate(EER) technique. Finally, the experimental results are given in section 5. Conclusions are given in the last section.

* Corresponding author. Tel.: +91-9739364181. E-mail address: [email protected].

c

⃝2010 Published by Elsevier Ltd

Procedia Computer Science 2 (2010) 188–194

www.elsevier.com/locate/procedia

1877-0509 c⃝2010 Published by Elsevier Ltd

doi:10.1016/j.procs.2010.11.024

Open access under CC BY-NC-ND license.

(2)

1

1

n i

I i

N

=

Ψ =

2. System Overview

The block diagram of a multimodal biometric system using two (palm and speech) modalities for human recognition system is shown in Figure 1. It consists of three main blocks, that of Preprocessing, Feature extraction and Fusion. Preprocessing and feature extraction are performed in parallel for the two modalities. The preprocessing of the audio signal under noisy conditions includes signal enhancement, tracking environment and channel noise, feature estimation and smoothing[2]. The preprocessing of the palmprint typically consists of the challenging problems of detecting and tracking of the palm and the important palm features.

The preprocessing should be coupled with the choice and extraction of speech and palm features as depicted in Figure 1. For the palmprint and speech signal, the input image is recognized using Modified Canonical method and MFCC with VQ respectively. The matching score for speech is calculated by Euclidean distance using code book. Similarly the matching score for palmprint is calculated by Euclidean distance. The modules based on individual modality returns an integer value after matching the templates and query feature vectors. The final score is generated by using the matcher weighting based on EER and weighted product technique at fusion level, which is the passed to the decision module. The final decision is made by comparing the final score with a threshold value at the decision module.

Figure 1: Block diagram of the proposed multimodal biometric verification system

3. Algorithms for Palmprint and Speech Signal Feature Extraction

3.1. Modified Canonical Form Method

The “Eigenpalm” method proposed by Turk and Pentland [3][4] is based on Karhunen-Loeve Expression and is motivated by the earlier work of Sirovitch and Kirby [5][6] for efficiently representing picture of images. The Eigen method presented by Turk and Pentland finds the principal components (Karhunen-Loeve Expression) of the image distribution or the eigenvectors of the covariance matrix of the set of images. These eigenvectors can be thought as set of features, which together characterized between images.

Let a image I (x, y) be a two dimensional array of intensity values or a vector of dimension n. Let the training set of images be I1, I2, I3,…….In. The average image of the set is defined by

(1)

Each image differed from the average by the vector.

Scanner Microphone Pre-processing Feature Extraction Matching Palmprint Recognition Pre-processing Feature Extraction Matching Speaker Recognition Palmprint Database Speech Signal Fusion and

(3)

ˆ

P

1

ˆ

ˆ

P

I P

=

D

1

ˆ

ˆ

I

=

P D P

I

I i

φ

=

− Ψ

1

1

N T T i i i

C

A A

N

=

φ φ

=

=

1 1 n n T ij i j i j

Q

C I C

a c c

= −

=

=

∑∑

( )

( )

( )

1 1

ˆ

ˆ

ˆ

ˆ

T T T

Q C I C C PDP C

=

=

=

C P

D P C

[

1

,

2

...

N

]

A

=

φ φ

φ

1 2

[ ( ),

( )... ( )]

n

X

=

a x a x

a x

2 1

( ,

)

( (

)

(

))

n i j r i r j r

d x x

a X

a X

=

=

(2)

This set of very large vectors is subjected to principal component analysis which seeks a set of K orthonormal vectors Vk, K=1,…...., K and their associated eigenvalues λk which best describe the distribution of data. The vectors Vk and scalars λk are the eigenvectors and eigenvalues of the covariance matrix:

(3)

Where the matrix finding the eigenvectors of matrix Cnxn is computationally intensive.

However, the eigenvectors of C can determine by first finding the eigenvectors of much smaller matrix of size NxN and taking a linear combination of the resulting vectors [4].

The modified canonical method proposed in this paper is based on Eigen values and Eigen vectors. These Eigen valves can be thought a set of features which together characterized between images.

Let Q be a quadratic form given by

(4) Therefore “n” set of eigen vectors corresponding “n” eigen values.

Let be the normalized modal matrix of I, the diagonal matrix is given by

(5)

Where Then

(6) The above equation is known as a canonical form or sum of squares form or principal axes form.

The following steps are considered for the feature extraction: • Select the palm image for the input

• Pre-process the image

• Determine the eigen values and eigen vectors of the image • Use the canonical form for the feature extraction. 3.2. Euclidean Distance

Let an arbitrary instance X be described by the feature vector

(7)

Where ar(x) denotes the value of the rth attribute of instance x. Then the distance between two instances xi and xj is defined to

be

d x x

( ,

i j

)

;

(4)

c

S

w w w

t

=

μ αδ

+

0.4

=0.2

w

α

δ

w

δ

w

μ

1,

( )

,

( )

0,

( )

s w s w

W m

t

EPD m

W m

t

=

<

3.3. Algorithm of calculation of Speech features using MFCC

The MFCC cepstrals in speech recognition system is used for calculating speech features. Calculation of the speech features algorithm is defined by several process as in the following form.

Pre-processing. The amplitude spectrum of a speech signal is dominant at “low frequencies. The speech signals is passed through a first-order FIR high pass filter:

in in

( )

s ( )

.s (

1)

p

s n

=

n

α

n

(9)

where

α

is the filter coefficient (

α

(0.95;1)),

s

in(n) is the input signal.

End Point Detection (EPD). This helps to locate the endpoints of an utterance in a speech signal. An inaccurate endpoint detection will decrease the performance of the speech recognizer. Some commonly used measurements for finding speech are short-term energy estimate Es , or short term power estimate Ps , and short term zero crossing

rate Zs .For the speech signals

s n

p

( )

these measures are calculated as follows:

(10) (11) (12) where (13)

these measures calculate the values for each block of L samples. The short term zero crossing rate gives a measure of how many times the signal,

s n

p

( )

, changes sign. This short term zero crossing rate tends to be larger during unvoiced regions. This needs some decision to find the begin and end point. Some assumption is made to remove the background noise, i.e., removing first 5 blocks. To make this a comfortable approach, the following function is used:

(14) where is a scale factor.

The trigger for this function can be described as:

(15)

Where and is the mean and variance. where

The end point detection function, EPD(m) , can be found as:

(16)

By using this function we can detect the endpoints of an utterance. 2 1

( )

( )

m s p n m L

E m

s n

= − +

=

2 1

1

( )

( )

m s p n m L

P m

s n

L

= − +

=

1

sgn(

( )) sgn( (

1))

1

( )

2

m p p s n m L

s n

s n

Z m

L

= − +

=

1,

( )

0,

sgn(

( ))

1,

( )

0.

p p p

s n

s n

s n

⎧⎪

=

<

⎪⎩

( )

( ).(1

( )).

s s s c

W m

=

P m

Z m S

(5)

( )

( ). ( ),

frame p

s

n

=

s n w n

1,

.

.

,

0,1, 2,...,

1,

( )

0,

K r

n

K r

N r

M

w n

Otherwise

< ≤

+

=

=

frame

t

s

f

.

s frame

N

=

f t

2 ( 1) 1

( )

,

0,1, 2,...,

1

N i n k N k w n

bin

s n e

k

N

π − − =

=

=

{ }

{

}

{ }

1

/ 2

,

i s start c start

Mel f

Mel f

f

Mel

Mel f

i

NF

=

+

2595

( )

25951

1

,

700. 10

1 ,

700

mel

x

Mel x

=

g

+

x

=

2

i c i

f

cbin

=

round

N

1, 2, 3,...,

1

i

=

NF

1 1 1

1

1

k k cbin k k i i cbin k k

i cbin

fbank

bin

cbin

cbin

− − − −

+

=

+

+

1 1 1 1

1

,

1

k k cbin k i i cbin k k

i cbin

bin

cbin

cbin

+ + − − −

+

1, 2,...,

1

k

=

NF

ln(

),

1, 2,...,

1

i i

f

=

fbank

i

=

NF

Framing. The input signal is divided into overlapping frames of N samples.

(17)

(18)

where M is the number of frames, is the sampling frequency, is the frame length measured in time, and K is the frame step.

(19)

Windowing. There are different types of window functions to minimize the signal discontinuities. One of the most commonly used for windowing is the Hamming window:

(20)

Calculating of MFCC features:

Fast Fourier transform(FFT): Applying by FFT to windowing frames to calculate spectrum of frames.

(21)

Mel filtering: The low-frequency components of the magnitude spectrum are ignored. The centre frequencies of the channels in terms of FFT bin indices (

cbin

i for the i -th channel) are calculated as follows:

(21)

(22)

(23)

where NF is the number of channels of filter.

The output of the mel filter is the weighted sum of the FFT magnitude spectrum values (

bin

i) in each band. Triangular, half-overlapped windowing is used as follows:

(24) Non-linear transformation:. The output of mel filtering is subjected to a logarithm function (natural logarithm)

(25) Cepstral coefficients: 20 cepstral coefficients are calculated from the output of the non-linear transformation block.

2 (

1)

( )

0.54 0.46cos

( ), 1

1

w frame

n

s n

s

n

n N

N

π

=

≤ ≤

(6)

1 1

.

.cos

(

0.5) ,

1, 2,..., 20

1

NF i i j

i

C

f

j

i

NF

π

− −

=

=

1

1

( )

( )

( ),

1, 2,..., 20

M j j i i

mc q

C q

C q

q

M

=

=

=

1 1 1

,

0

1

m M m m m m E W W E

where

=

=

1 1 ( log( ) (

) ,

log

)

M i M i i i W i i F O F W O = = = ∏ =∑ (26) Cepstral Mean Subtraction (CMS): The channel effect is eliminated by subtracting the mel-cepstrum coefficients with the mean mel-cepstrum coefficients:

(27)

4. Fusion

The different biometric system can be integrated to improve the performance of the verification system. The matcher weights are calculated based on Equal Error Rate(EER) of the intended matchers for fusion. The EER’s are computed using FAR and FRR. EER of matcher m is represented as

E

m,m=1,2,…,M and the weight Wm associated with matcher is computed as

(28)

The final score is calculated by weighted product approach. Logarithms were used to turn it into a weighted sum and being able to use the same type of perception in the coefficients determination.

(29)

(30)

where log( )F is the final matching score.

5. Experimental Results

This section shows the experimental results of our approach with Modified Cannonical method and MFCC method for palmprint and Speech respectively.

We evaluate the proposed multimodal system on a data set including 720 pairs of images from 120 subjects. The training database contains a palmprint images and speech signal for each individual for each subject.

The comparison of both unimodal systems (palm and speech modality) and a bimodal system is given in Table 1 & 2. From the results it is clear that the verification based on the palmprint and verification based on the speech is almost same. It can also be seen that the fusion of palmprint and speech features improves the verification score. The experiments show that EER is reduced to 3.54%.

6. Conclusion

We have developed a prototype of a biometric verification system based on the fusion of palmprint and speech features. The experimental results show that the performance of palmprint-based unimodal system and speech-based unimodal system is almost same. Fusion at the matching-score level is used to improve the performance of the system. The psychological effects of such multimodal system should also not be disregarded and it is likely that a system using multiple modalities would seem harder to cheat to any potential impostors.

In the future we plan to test whether setting the user specific weights to different modalities can be used to improve a system’s performance.

(7)

Table 1. The EER, Accuracy of palmprint & speech

Modality Algorithm used EER Accuracy

Palmprint Modified

Canonical form 14% 85.9%

Speech MFCC 9.82% 90%

Table 2. The EER, Accuracy after fusion

Study Algorithm used EER after

Integrating Accuracy after Integrating Proposed method Palm: Modified Canonical Form Speech: MFCC & VQ 3.54% 96.4% References

[1] Ross, K. Nandakumar, and A. K. Jain. Handbook of Multibiometrics. Springer-Verlag, 2006. [2] Rabiner and B.H.Juang, Fundamentals of speech Recognition. Englewood cliffs, NJ:Prentice Hall,1993

[3] Turk and A. Pentland, “Face Recognition using Eigenfaces”, in Proceeding of International Conference on Pattern Recognition, pp. 591-1991. [4] Turk and A. Pentland, “Face Recognition using Eigenfaces”, Journals of Cognitive Neuroscience, March 1991.

[5] L. Sirovitch and M. Kirby, “Low-dimensional Procedure for the Characterization of Human Faces”, Journals of the Optical Society of America, vol.4, pp. 519-524, March 1987.

[6] Kirby.M, Sirovitch.L. “Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces”, IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 12, pp. 103-108, January 1990.

[7] Mikael Nilsson,Marcus Ejnarsson. “Speech Recognition using Hidden Markov Model”.Department of Telecommunications and Speech Processing, Blekinge Institute of Technology. 2002.

[8] Group 11. Tejaswini Hebalkar, Lee Hotraphinyo, Richard Tseng.“Voice Recognition and Identification System”. Digital communications and Signal Processing Systems Design. June 2000.

www.elsevier.com/locate/procedia doi:10.1016/j.procs.2010.11.024 Open access under CC BY-NC-ND license.

References

Related documents

[ 40 ] “Speech Processing Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-end Feature Extraction Algorithm; Compression algorithms,”

Speech recognition in reverberant and noisy environments employing multiple feature extractors and i vector speaker adaptation Alam et al EURASIP Journal on Advances in Signal

Recognized using Euclidean Distance Recognized using Neural Network অ 30 22 22 23 Recognized Speech Reference Speech Speech Unknown Pre- processing Feature

The system goes through various phases such as pre-processing, feature extraction, object recognition, edge detection, image segmentation and Text To Speech (TTS)

Speaker recognition is divided into two main components: Feature extraction and Feature matching which are attached with each other, [1] of them feature extraction is the process

The basic steps involved in human activity recognition system comprises the following stages/steps: gathering data, pre-processing data, feature extraction,

b. Feature Extraction And Optimization : Core part of the system is feature extraction. In pattern recognition and in image processing, feature extraction is a special

Literature Review (Chapter 2) - in this chapter, reviewed literature about the Amharic language, human speech production, speaker recognition systems, feature