Identifying Concealed Information Using Wavelet Feature Extraction and Support Vector Machine


Procedia Environmental Sciences 8 (2011) 337 – 343

doi:10.1016/j.proenv.2011.10.053

Available online at www.sciencedirect.com

ICESB 2011: 25-26 November 2011, Maldives


Min Zhao^{a,b,∗}, Chunlin Zhao^{a,b}, Chongxun Zheng^{a}

^{a} Biomedical Information Engineering Institution, Xi’an Jiaotong University, Xi’an 710049, China

^{b} Communication Engineering Department, Army Police Engineering College, Xi’an 710086, China

Abstract

In this paper, a new approach based on wavelet feature extraction and a support vector machine (SVM) is proposed to identify concealed information. First, the wavelet coefficients of the event-related potential (ERP) in the delta, theta, alpha and beta bands are extracted as features of brain activity in response to different stimulus information. Next, the Fisher discriminant criterion is applied to reduce the dimension of the feature vector. Finally, an SVM classifier is employed to classify the data, and leave-one-out cross validation is used to assess accuracy. To evaluate the method, 16 subjects went through the designed concealed information test (CIT) paradigm while their brain signals were recorded. The experimental results show that the SVM classifier can effectively differentiate between concealed information and irrelevant information, achieving a maximum classification accuracy of 90.63%. The investigation also suggests that the wavelet decomposition coefficients reflect more comprehensive time-frequency information correlated with deception, which can effectively distinguish concealed information from irrelevant information.

Keywords: Lie detection; Event related potential; Wavelet Transform; Support vector machine;

1.Introduction

The identification of concealed information, or lie detection, is of great significance for criminal investigation, antiterrorism, security protection and clinical applications, among others. It has become an important and necessary technique in many fields. Currently, the most widely used method for quantitative discrimination between deceptive and truthful responses is the polygraph, which relies on measures of autonomic nervous system response. However, the limited specificity of the polygraph has prompted the exploration of alternative methods based on measures of central nervous system activity, such as electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI).

∗ Corresponding author. Tel.: +86-13259427130. E-mail address: [email protected].

Selection and/or peer-review under responsibility of the Asia-Pacific Chemical, Biological & Environmental Engineering Society (APCBEES)

Open access under CC BY-NC-ND license.

Until recently, studies of deception detection have primarily focused on time-domain analysis of event-related potentials (ERPs) [1-3], where only the amplitudes and latencies of some prominent peaks and valleys are taken into consideration. In general, the time-domain and frequency-domain representations of ERPs give complementary information on the functional component structure. Therefore, efficient algorithms for analyzing the signal in the time-frequency plane are extremely important for extracting and relating distinct functional components. Wavelet analysis, ‘particularly the multiresolution representation, respects the overlapping component of composition ERPs, providing a natural way to partition them among several orthogonal functions with parallel time courses at different time scales’ [4], and enables one to capture the time-dependent frequency-related information in ERPs [5].

This study provides a preliminary investigation of whether time-frequency features of the ERP, extracted by the wavelet transform, can differentiate concealed information from irrelevant information in a typical concealed information test (CIT) paradigm [5].

2.Materials and Method

2.1.Participants

Sixteen subjects (average age: 23; 12 male) participated voluntarily in the experiment. They were recruited through the school Bulletin Board System. All were university students, and none had any neurological disease. On arrival for the experimental session, all participants signed consent forms indicating that participation was voluntary and that they could withdraw from the experiment at any time. Approval for this study was granted by the Ethical Committee.

2.2.Experiment Design and Procedure

Each subject participated in two test blocks, each of which tested for recognition of one of two stimulus types: the subject’s name or the subject’s birth date. There was a delay of about five minutes between blocks, so that we could explain to participants that a new block was about to begin and the subjects could rest for a moment. In a typical CIT paradigm [6], the information items under test are called probe items. Within a block, four irrelevant items and one target item were drawn from the same category as the probe. Thus, for the name block, the probe was the subject’s name and the target and irrelevant items were five strangers’ names; for the birth date block, the probe was the subject’s birth date and the target and irrelevant items were five other dates, in the form ‘9th of May’. Within a block, stimuli were presented in random order, with no stimulus repeated in two consecutive trials, and there were 120 irrelevant trials, 30 probe trials and 30 target trials. Visual stimuli were presented once every 3 s for a duration of 1 s on a display screen about 70 cm from the subject’s eyes. Subjects were told to pay attention to the display screen and to press a “yes” button for the assigned target items, indicating target recognition, and a “no” button for all other stimuli, indicating non-recognition. In each block, targets were assigned by the experimenter’s explicit instruction before the block began.
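The trial ordering described above (random order, no stimulus repeated on consecutive trials) can be sketched as follows. This is only an illustration, not the authors' presentation software: the function name `make_block_sequence`, the item labels, and the equal split of the 120 irrelevant trials over the four irrelevant items (30 each) are assumptions.

```python
import random

def make_block_sequence(counts, seed=None, max_tries=1000):
    """Return a random stimulus order with no item repeated on consecutive trials."""
    rng = random.Random(seed)
    n_trials = sum(counts.values())
    for _ in range(max_tries):
        remaining = dict(counts)
        sequence = []
        while len(sequence) < n_trials:
            # Candidates: items still available that differ from the previous trial.
            candidates = [s for s, n in remaining.items()
                          if n > 0 and (not sequence or s != sequence[-1])]
            if not candidates:
                break  # dead end: only the previous item is left, so start over
            # Weight by remaining count so that all items run out at a similar pace.
            weights = [remaining[s] for s in candidates]
            choice = rng.choices(candidates, weights=weights, k=1)[0]
            sequence.append(choice)
            remaining[choice] -= 1
        else:
            return sequence  # loop finished without hitting a dead end
    raise RuntimeError("no valid sequence found")

# Assumed split: 30 probe, 30 target, 4 irrelevant items x 30 = 120 irrelevant trials.
counts = {"probe": 30, "target": 30,
          "irrelevant_1": 30, "irrelevant_2": 30,
          "irrelevant_3": 30, "irrelevant_4": 30}
sequence = make_block_sequence(counts, seed=0)
```

A plain shuffle would almost never satisfy the no-repetition constraint for 180 trials, which is why the sketch builds the order trial by trial and restarts on a dead end.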

2.3.Data acquisition

EEG signals were recorded from the Pz electrode, following the international 10-20 montage system, using an electrode cap with Ag/AgCl electrodes referenced to linked earlobes with a forehead ground. Additional electrodes at supra- and infra-orbital sites around the left eye were used to monitor eye blinks and vertical eye movements (bipolar), and electrodes at the right and left outer canthi monitored horizontal eye movements (bipolar). All electrode impedances were kept below 5 kΩ. The signals were recorded through a Grass Neurodata acquisition system at a gain of 10k (5k and 2k for the horizontal and vertical eye channels), with a bandpass of 0.05-70 Hz and a notch filter at 50 Hz. A PC-based EEG acquisition system (Neuroscan) was used to sample the data continuously at 1000 Hz during the task. Recording epochs of 2000 ms (500 ms pre-stimulus and 1500 ms post-stimulus) were extracted off-line. Ocular artifacts were corrected on a trial-by-trial basis. Epochs contaminated by residual blinks, lateral eye movements, muscle activity or movement-related artifacts were excluded from analysis using a rejection criterion of ±70 µV on any channel. ERP waveforms were obtained by averaging the artifact-free EEG epochs for each stimulus type and each subject.
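The rejection-and-averaging step can be illustrated with a minimal single-channel sketch. The function name `average_erp` and the toy values are hypothetical; real epochs would hold 2000 samples each, and the criterion would be checked on every recorded channel.

```python
def average_erp(epochs, reject_uv=70.0):
    """Average the epochs whose samples all stay within +/- reject_uv microvolts."""
    kept = [ep for ep in epochs if max(abs(v) for v in ep) <= reject_uv]
    if not kept:
        raise ValueError("all epochs were rejected")
    n = len(kept)
    # Sample-by-sample mean across the retained epochs.
    erp = [sum(samples) / n for samples in zip(*kept)]
    return erp, n

# Toy epochs in microvolts (hypothetical values, four samples each).
epochs = [
    [1.0, 4.0, 9.0, 3.0],
    [3.0, 6.0, 11.0, 5.0],
    [2.0, 95.0, 10.0, 4.0],  # exceeds 70 uV, so this epoch is rejected
]
erp, n_kept = average_erp(epochs)
```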

2.4.Feature extraction based on wavelet transform

The wavelet transform (WT) gives a time-frequency representation of a signal that has two main advantages over previous methods: (a) optimal resolution in both the time and frequency domains; (b) no requirement that the signal be stationary. It is defined as the convolution between the signal $x(t)$ and the wavelet functions $\psi_{a,b}(t)$ [7]:

$$W_\psi X(a,b) = \langle x(t), \psi_{a,b}(t) \rangle \qquad (1)$$

where $\psi_{a,b}(t)$ are dilated (contracted) and shifted versions of a unique wavelet function $\psi(t)$:

$$\psi_{a,b}(t) = |a|^{-1/2}\, \psi\!\left(\frac{t-b}{a}\right) \qquad (2)$$

($a$, $b$ are the scale and translation parameters, respectively). The WT gives a decomposition of $x(t)$ at different scales, tending to be maximal at those scales and time locations where the wavelet best resembles $x(t)$. Moreover, Eq. (1) can be inverted, thus giving the reconstruction of $x(t)$.

The WT maps a signal of one independent variable $t$ onto a function of two independent variables $a$, $b$. This procedure is redundant and inefficient for algorithmic implementation. Consequently, it is more practical to define the wavelet transform only at discrete scales $a$ and discrete times $b$ by choosing the set of parameters $\{a_j = 2^{-j};\ b_{j,k} = 2^{-j}k\}$, with integers $j$, $k$.

Contracted versions of the wavelet function match the high-frequency components of the original signal, whereas dilated versions match low-frequency oscillations. By correlating the original signal with wavelet functions of different sizes, we can therefore obtain the details of the signal at different scales. These correlations with different wavelet functions can be arranged in a hierarchical scheme called multiresolution decomposition. The multiresolution decomposition separates the signal into ‘details’ at different scales, the remaining part being a coarser representation of the signal called the ‘approximation’. Moreover, it has been shown that each detail $D_j$ and approximation signal $A_j$ can be obtained from the previous approximation $A_{j-1}$ via convolution with high-pass and low-pass filters, respectively.

In this study, a seven-level wavelet decomposition with a Daubechies wavelet of order 4 was applied to the ERP signal, and only the detail coefficients D7, D6 and D5 and the approximation coefficients A7 were extracted as feature parameters; these correspond to the theta (4-7 Hz), alpha (8-15 Hz), beta (16-31 Hz) and delta (0.5-4 Hz) rhythms, respectively.
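The correspondence between decomposition levels and EEG rhythms follows from the 1000 Hz sampling rate: each level halves the frequency band. The short sketch below (the helper name `dwt_bands` is hypothetical) computes the nominal bands, which can be compared with the rounded ranges quoted above.

```python
def dwt_bands(fs, levels):
    """Nominal frequency band (Hz) of each detail D_j and of the final approximation."""
    nyquist = fs / 2.0
    bands = {}
    for j in range(1, levels + 1):
        # Each level halves the band: D_j spans [nyquist / 2^j, nyquist / 2^(j-1)].
        bands[f"D{j}"] = (nyquist / 2 ** j, nyquist / 2 ** (j - 1))
    # The approximation at the last level keeps everything below the lowest detail.
    bands[f"A{levels}"] = (0.0, nyquist / 2 ** levels)
    return bands

bands = dwt_bands(fs=1000, levels=7)
for name in ("D5", "D6", "D7", "A7"):
    low, high = bands[name]
    print(f"{name}: {low:.2f}-{high:.2f} Hz")
```

These are idealized band edges; real wavelet filters overlap somewhat at the boundaries.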

2.5.Feature reduction

The aim of this stage is to reduce the dimension of the feature vectors and, at the same time, to improve the classification accuracy by using Fisher’s criterion, which is defined as follows [8]:

$$FDR = \frac{(u_1 - u_2)^2}{\sigma_1^2 + \sigma_2^2} \qquad (3)$$

where $u_i$ and $\sigma_i$ represent the mean and standard deviation of each class, respectively. Let us assume that there are $L$ different classes $C_1, C_2, \ldots, C_L$, for which the numbers of data points are $N_1, N_2, \ldots, N_L$, respectively. In order to use Fisher’s criterion, the so-called “within-class scatter matrix” (WCS) and “between-class scatter matrix” (BCS) are defined as follows:

$$\text{WCS:}\quad S_w = \sum_{k=1}^{L} S_k \qquad (4)$$

$$\text{BCS:}\quad S_b = \sum_{k=1}^{L} N_k (u_k - u)(u_k - u)^t \qquad (5)$$

where

$$S_k = \sum_{x \in C_k} (x - u_k)(x - u_k)^t \qquad (6)$$

In (5) and (6), $x$ is the feature vector, $u_k$ is the mean of class $C_k$, and $u$ is the mean of all data points.

Fisher’s criterion can now be defined through $S_w^{-1} S_b$, which is called the separability matrix. The trace of this matrix serves as the measure of separability. By discarding the smallest eigenvalues and the corresponding columns of the eigenvector matrix, the linear transform matrix and the new dimension-reduced feature vectors are obtained. It can be readily shown that the discarded eigenvalues do not play a significant role in the trace of the separability matrix. The remaining eigenvalues and their corresponding eigenvectors serve as a linear transform of the feature matrix.
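As a simplified illustration of Fisher’s criterion, the sketch below evaluates the scalar ratio of Eq. (3) for each feature separately and ranks the features by it. It does not perform the eigen-decomposition of the separability matrix; the function names, the toy data and the use of the population variance are assumptions made for the example.

```python
from statistics import mean, pvariance

def fisher_ratio(class1, class2):
    """Eq. (3): squared mean difference over the sum of the class variances."""
    return (mean(class1) - mean(class2)) ** 2 / (pvariance(class1) + pvariance(class2))

def rank_features(X1, X2):
    """Rank feature indices by decreasing Fisher ratio.

    X1, X2: lists of feature vectors (one per sample) for the two classes.
    """
    n_features = len(X1[0])
    scores = [fisher_ratio([x[f] for x in X1], [x[f] for x in X2])
              for f in range(n_features)]
    order = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return order, scores

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
probe = [[4.0, 1.0], [6.0, 3.0], [5.0, 2.0]]
irrelevant = [[1.0, 2.0], [3.0, 1.0], [2.0, 3.0]]
order, scores = rank_features(probe, irrelevant)
```

Keeping only the top-ranked entries of `order` is one simple way to obtain a reduced feature vector under these assumptions.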

2.6.Support vector machine classifier

The support vector machine (SVM) has been widely used in EEG research because it is a powerful approach to pattern recognition. The goal of the SVM classifier is to find a hyperplane that separates the data of the different classes while maximizing the distance between the two classes. For a linearly separable binary classification problem, the construction of a hyperplane $w^T x + b = 0$ such that the margin between the hyperplane and the nearest point is maximized can be posed as the following quadratic optimization problem [9]:

$$\min_{w}\ \frac{w^T w}{2} \qquad (7)$$

subject to

$$d_i\left((w^T x_i) + b\right) \ge 1, \quad i = 1, \ldots, N \qquad (8)$$

where $d_i \in \{-1, 1\}$ stands for the $i$th desired output and $x_i \in R^p$ stands for the $i$th input sample of the training data set $\{x_i, d_i\}_{i=1}^{N}$. Equation (8) forces a rescaling on $(w, b)$ so that the point closest to the hyperplane has a distance of $1/\|w\|$ [15]. Maximizing the margin corresponds to minimizing the Euclidean norm of the weight vector. Often in practice, a separating hyperplane does not exist. Hence the constraint (8) is relaxed by introducing slack variables $\xi_i \ge 0$, $i = 1, \ldots, N$, and the optimization problem becomes:

$$\min_{w,\xi}\ \frac{w^T w}{2} + C \sum_{i=1}^{N} \xi_i \qquad (9)$$

subject to

$$d_i\left((w^T x_i) + b\right) \ge 1 - \xi_i \qquad (10)$$

$C$ controls the tradeoff between the robustness of the machine and the number of non-separable points. By introducing Lagrange multipliers $\alpha_i$ and using the Karush-Kuhn-Tucker theorem of optimization theory, the decision function for the vector $x$ is obtained as [7]:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{N} \alpha_i d_i \langle x, x_i \rangle + b\right) \qquad (11)$$

By replacing the inner product $\langle x, x_i \rangle = (x^T)(x_i)$ with a kernel function $K(x, x_i)$, the input data are mapped to a higher-dimensional space. It is in this higher-dimensional space that a separating hyperplane is constructed to maximize the margin. In this study, an SVM with a Gaussian kernel was implemented.
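To make the kernelized decision function concrete, the sketch below evaluates Eq. (11) with the inner product replaced by a Gaussian kernel. The support vectors, multipliers, bias and kernel width `gamma` are hand-picked for illustration and are not trained values; the source does not report the kernel width or the value of $C$ that was used.

```python
import math

def gaussian_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed width parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def svm_decide(x, support_vectors, alphas, labels, b, gamma=0.5):
    """Eq. (11) with the inner product replaced by the Gaussian kernel."""
    s = sum(a * d * gaussian_kernel(x, sv, gamma)
            for a, d, sv in zip(alphas, labels, support_vectors))
    # sgn(0) is mapped to +1 here; this tie-breaking convention is a choice.
    return 1 if s + b >= 0 else -1

# Hand-picked (not trained) values, one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
labels = [-1, 1]
b = 0.0

near_positive = svm_decide([1.8, 2.1], support_vectors, alphas, labels, b)
near_negative = svm_decide([0.1, -0.2], support_vectors, alphas, labels, b)
```

In practice the multipliers and bias come from solving the optimization problem in Eqs. (9) and (10); only the samples with non-zero $\alpha_i$ contribute to the sum.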

3.Result and discussion

3.1.Support vector machine classifier

Fig. 1 shows the grand average ERP waveforms of all subjects, together with the wavelet decomposition coefficients and the reconstructed signals in the beta, alpha, theta and delta frequency ranges. The grand average ERPs of the probe and target responses contain a P300 wave of larger amplitude than the grand average of the irrelevant responses, which is in accordance with previous results [1-3]. Additionally, increased amplitudes of the delta and theta responses are clearly observed in the probe response compared with the irrelevant response, which is in accord with previous results [10,11].

Fig. 1. Grand average ERP waveforms of the 16 subjects, with the wavelet coefficients in four frequency ranges and the reconstructed signals, for irrelevant, probe and target stimulus responses


Fig. 1 also shows that the clearest difference between the irrelevant and probe stimuli appears mainly in the 0-1000 ms range after stimulus onset. Therefore, the ERP in the 0-1024 ms post-stimulus time window was selected for wavelet decomposition, and a total of 64 wavelet coefficients in the D7 (theta, 8), D6 (alpha, 16), D5 (beta, 32) and A7 (delta, 8) bands were used as feature parameters to distinguish concealed information from irrelevant information. A repeated-measures analysis of variance (ANOVA) was run on these data to confirm the main effect of stimulus type (probe/irrelevant). The most significant main effect occurs at the 4th and 5th coefficients after stimulus onset in the delta band (p<0.001), which corresponds to about 384-640 ms. This result is consistent with the ERP P300 findings, because previous studies have verified that the delta response contributes to the amplitude at the P300 latency [10].
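The coefficient counts and the 384-640 ms window can be checked with simple arithmetic, assuming a critically sampled decomposition with one coefficient per $2^j$ samples at level $j$ (as with periodic signal extension; other boundary handling yields a few extra coefficients). The helper names below are hypothetical.

```python
def coeff_count(n_samples, level):
    """Number of coefficients at a level, assuming one per 2**level samples."""
    return n_samples // 2 ** level

def coeff_window_ms(index, level, fs=1000):
    """Nominal time span (ms) covered by the index-th coefficient (1-based)."""
    step_ms = 2 ** level * 1000.0 / fs
    return ((index - 1) * step_ms, index * step_ms)

n_samples = 1024  # 0-1024 ms window sampled at 1000 Hz
counts = {"D7": coeff_count(n_samples, 7), "D6": coeff_count(n_samples, 6),
          "D5": coeff_count(n_samples, 5), "A7": coeff_count(n_samples, 7)}
total = sum(counts.values())

# Span covered jointly by the 4th and 5th level-7 (delta) coefficients.
window = (coeff_window_ms(4, 7)[0], coeff_window_ms(5, 7)[1])
```

Each level-7 coefficient spans 128 ms at this sampling rate, so the 4th and 5th coefficients together cover the stated 384-640 ms range.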

3.2.Classification results

The SVM classifier is used to identify the two classes of ERP patterns from the feature parameters. In addition, we used Fisher’s criterion to reduce the dimension of the feature vectors and, at the same time, to improve the classification accuracy. A total of 32 sample datasets were classified. The performance of the classifier was evaluated using leave-one-out cross validation. In each trial, the data from all but one sample (S-1 of the S samples) were used to train the classifier, and the classifier was then tested on the remaining sample. This procedure was repeated S times, each time leaving out a different sample.
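The leave-one-out procedure can be sketched independently of the classifier. In the example below a nearest-class-mean rule stands in for the SVM purely to keep the code self-contained; the function names and the toy data are hypothetical.

```python
def nearest_mean_predict(train_X, train_y, x):
    """Stand-in classifier (not an SVM): assign x to the class with the closest mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [xi for xi, yi in zip(train_X, train_y) if yi == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out(X, y, predict=nearest_mean_predict):
    """Train on S-1 samples, test on the held-out one, and repeat S times."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions

# Toy, well-separated data (hypothetical): three samples per class.
X = [[0.0, 0.2], [0.3, 0.1], [0.1, 0.4], [2.0, 2.1], [2.2, 1.9], [1.8, 2.3]]
y = [-1, -1, -1, 1, 1, 1]
predictions = leave_one_out(X, y)
```

Any data-dependent preprocessing, such as feature selection, is normally refit inside each fold so that the held-out sample does not influence training; the source does not state how this was handled.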

To quantify the results, the accuracy (Ac), sensitivity (Se) and specificity (Sp) are calculated; they are defined as [10]:

$$Ac = 100\% \times \frac{Tp + Tn}{Total} \qquad (12)$$

$$Se = 100\% \times \frac{Tp}{Tp + Fn} \qquad (13)$$

$$Sp = 100\% \times \frac{Tn}{Tn + Fp} \qquad (14)$$

where Tp, Tn, Fp and Fn denote the numbers of true positives, true negatives, false positives and false negatives, respectively. The classification results are shown in Fig. 2. It can be clearly seen that the SVM combined with Fisher’s criterion for feature optimization obtains satisfactory classification results: all three indexes (accuracy, sensitivity and specificity) are much better than those obtained with the full, non-pre-selected set of features extracted by the wavelet transform alone. The maximum accuracy of 90.63% is obtained when the dimension of the feature vector is reduced to 15.
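The three indexes can be computed from the confusion counts using the standard definitions, where Fn is the number of false negatives. The counts in the sketch below are hypothetical; they were chosen only because 29 correct decisions out of 32 samples give 90.625%, which rounds to the reported maximum accuracy. The actual split between true positives and true negatives is not given in the source.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity in percent from confusion counts."""
    total = tp + tn + fp + fn
    return {
        "Ac": 100.0 * (tp + tn) / total,  # share of all samples classified correctly
        "Se": 100.0 * tp / (tp + fn),     # share of positive samples detected
        "Sp": 100.0 * tn / (tn + fp),     # share of negative samples rejected
    }

# Hypothetical counts for 32 samples, assuming 16 per class.
metrics = classification_metrics(tp=14, tn=15, fp=1, fn=2)
```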


4.Conclusion

The aim of this study was to investigate the ability of time-frequency features of the ERP to differentiate concealed information from irrelevant information in a CIT experimental paradigm. To this end, the wavelet transform was adopted to extract wavelet coefficient features in the delta, theta, alpha and beta bands, Fisher’s criterion was used to optimize the feature parameters, and finally an SVM classifier based on the optimized features was used to distinguish concealed information from irrelevant information. The results revealed that the wavelet coefficients of the ERP are strongly correlated with concealed information, and that the SVM classifier combined with features optimized by Fisher’s criterion achieves satisfactory results. These results suggest that the methods proposed in this study could potentially be used as an effective approach to identifying concealed information.

Acknowledgements

The project is supported by National Science Foundation of China under grant No.30870654.

References

[1] Rosenfeld JP, Shue E, Singer E. “Single versus multiple probe blocks of P300-based concealed information tests for self-referring versus incidentally obtained information”, Biol Psychol 2007; 73: 396-404.

[2] Rosenfeld JP, Biroschak JR, Furedy JJ. “P300-based detection of concealed autobiographical versus incidentally acquired information in target and non-target paradigms”, Int J Psychophysiol 2006; 60: 251-259.

[3] Lefebvre CD, Marchand Y, Smith SM, Connolly JF. “Use of event-related brain potentials (ERPs) to assess eyewitness accuracy and deception”, Int J Psychophysiol 2009; 73: 218-225.

[4] Demiralp T, Ademoglu A, Istefanopulos Y, Basar-Eroglu C, Basar E. “Wavelet analysis of oddball P300”, Int J Psychophysiol 2001; 39: 221-227.

[5] Basar E, Schurmann M, Demiralp T, Basar-Eroglu C, Ademoglu A. “Event-related oscillations are 'real brain responses' -- wavelet analysis and new strategies”, Int J Psychophysiol 2001; 39: 91-127.

[6] Ben-shakhar G, Elaad E. “The guilty knowledge test (GKT) as an application of psychophysiology: future prospects and obstacles”, In: Kleiner M (Eds.), Handbook of Polygraph Testing. London: Academic Press; 2002, p. 87-102.

[7] Mallat S. “A theory for multiresolution signal decomposition: the wavelet representation”, IEEE Trans Pattern Anal Machine Intell 1989; 2: 298-302.

[8] Xu Y, Lu GM. “Analysis on fisher discriminant criterion and linear separability of feature space”, International Conference on Computational Intelligence and Security. Piscataway, NJ, USA: IEEE, 2007: 1671-1676.

[9] Haykin S. “Neural network - A comprehensive foundation”, 2nd ed. Prentice Hall, New Jersey: Englewood Cliffs; 1999.

[10] Klimesch W, Doppelmayr M, Schwaiger J, Winkler T, Gruber W. “Theta oscillations and the ERP old/new effect: independent phenomena?”, Clinical Neurophysiology 2000; 111: 81-793.

[11] Karakas S, Erzengin OU, Basar E. “The genesis of human event-related responses explained through the theory of oscillatory neural assemblies”, Neuroscience Letters 2000; 285: 45-48.