A SIMPLIFIED AND EFFICIENT EPILEPSY CLASSIFICATION TECHNIQUE FROM EEG SIGNALS USING PCA


Harikumar Rajaguru

Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, India E-Mail: [email protected]

ABSTRACT

Epilepsy causes rapid and reversible changes in brain function through recurrent seizures. Electroencephalography (EEG) signals are used for epilepsy detection and classification because they reflect the electrical activity of the brain. This paper presents a performance analysis of Approximate Entropy (ApEn) as a feature extraction technique, and of Fuzzy Mutual Information (FMI) and Linear Graph Embedding (LGE) as dimensionality reduction techniques, each followed by Principal Component Analysis (PCA) as a post classifier for the classification of epilepsy risk levels from EEG signals. The benchmark parameters used for the analysis are Performance Index (PI), Quality Value (QV), Specificity, Sensitivity, Time Delay and Accuracy.

Keywords: EEG, epilepsy, ApEn, FMI, LGE.

INTRODUCTION

Epilepsy is a very common neurological condition that affects people of any age and race [1]. Epileptic seizures usually occur due to excessive firing or synchronization of the neurons in the brain [2]. Epileptic seizures always disturb the normal working mechanisms of the cortical regions of the brain [3]. Sudden unexpected death occurs in only a very few people who suffer from epilepsy; those who suffer from frequent, severe seizures are at a very high risk. To reduce the risk of sudden death, it is important to prevent, detect and classify the seizures. EEG is employed for the detection of epilepsy because it reflects the brain's electrical activity [4]. As a well-established clinical procedure, EEG provides valuable data that aid the analysis and diagnosis of brain disorders [5]. The abnormal EEG pattern of an epileptic patient varies continuously, and hence the recognition of epileptic seizures is a technological challenge [6]. Ambulatory recording is the most commonly used technique for recording EEG signals over long durations, up to two weeks or more [7]. The primary drawback of such long recordings is that an expert must spend a comparable amount of effort analysing the entire data to determine whether any traces of epilepsy are present. Such conventional analysis is very time-consuming, and hence automatic epileptic detection systems for EEG have emerged in recent years [8]. The automated classification of EEG signals faces many complications, including environmental artifacts.

The organization of the paper is as follows: In section 2, the materials and methods are discussed, followed by the feature extraction and dimensionality reduction techniques in section 3. In section 4, PCA, which is used as a post classifier for the classification of epilepsy risk levels from EEG signals, is discussed, followed by the results and conclusions in section 5.

MATERIALS AND METHODS

For the performance analysis of the epilepsy risk levels using ApEn as a feature extraction technique and FMI and LGE as dimensionality reduction techniques, followed by PCA as a post classifier, the raw EEG data of 20 epileptic patients who were under treatment in the Neurology Department of Sri Ramakrishna Hospital, Coimbatore, are taken in European Data Format (EDF) for exhaustive analysis. The pre-processing stage of the EEG signals is given particular attention because it is vital to use the best available technique in the literature to extract all the useful information embedded in the non-stationary biomedical signals. The EEG records were continuous for about 45 minutes, and each was divided into epochs of two-second duration. In general, a two-second epoch is short enough to avoid unnecessary redundancy in the signal and long enough to detect any significant changes in activity and the presence of artifacts. For every patient, the total number of channels is 16, taken over three epochs. The frequency is considered to be 50 Hz and the sampling frequency about 200 Hz. Each sample corresponds to an instantaneous amplitude value of the signal, which gives 400 values per epoch. Four types of artifacts are present in the data: chewing artifact, motion artifact, eye blink and electromyography (EMG). Approximately 1% of the data are artifacts. No attempt was made to select particular artifacts of a more specific nature. The main objective of including artifacts is to differentiate the spike categories of waveforms from the non-spike categories. Figure-1 shows the block diagram of the procedure.

Figure-1. Block diagram of the procedure: Raw EEG Signals, Samples, ApEn (FE technique) and FMI, LGE (DR techniques), PCA as Post Classifier, Benchmark Parameters.
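The epoching described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' pre-processing code: the function name `split_into_epochs`, the channels-by-samples array layout and the synthetic input are assumptions, while the 200 Hz sampling frequency, the two-second epochs and the 16 channels come from the text.

```python
import numpy as np

FS_HZ = 200        # sampling frequency stated in the text
EPOCH_SECONDS = 2  # epoch duration stated in the text


def split_into_epochs(recording, fs_hz=FS_HZ, epoch_seconds=EPOCH_SECONDS):
    """Split a (channels, samples) array into (channels, epochs, samples_per_epoch).

    Hypothetical helper: the array layout is an assumption, not taken from the paper.
    """
    recording = np.asarray(recording, dtype=float)
    samples_per_epoch = int(fs_hz * epoch_seconds)  # 200 Hz * 2 s = 400 values
    n_channels, n_samples = recording.shape
    n_epochs = n_samples // samples_per_epoch       # drop an incomplete trailing epoch
    trimmed = recording[:, : n_epochs * samples_per_epoch]
    return trimmed.reshape(n_channels, n_epochs, samples_per_epoch)


# Synthetic stand-in for one patient: 16 channels, three epochs of random data.
rng = np.random.default_rng(0)
demo_recording = rng.standard_normal((16, 3 * 400))
demo_epochs = split_into_epochs(demo_recording)
```

Each slice `demo_epochs[channel, epoch]` then holds the 400 amplitude values that the later feature extraction step operates on.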

FEATURE EXTRACTION AND DIMENSIONALITY REDUCTION TECHNIQUES

The most important task in a pattern recognition system is to find the informative subset of features [9]. The subset can be small, provided it holds good discriminatory power [10]. A low-dimensional feature representation is valuable chiefly because of its enhanced discriminatory power. Feature extraction finds a set of significant measurements so that the events present in a signal can be detected, traced and analyzed.

The dimensions of the EEG data are reduced by a pre-processing step known as Dimensionality Reduction (DR) [11]. The dimensions of the data can be reduced by separating out a set of important features that satisfy certain criteria. The reduced dimensions have a vital effect on the classification process. Each epoch contains 400 values and hence the total volume for a patient is around 25,600 samples. It is therefore necessary to reduce the dimensions of the data for smooth processing of the EEG signals. In a high-dimensional data set, not all of the measured variables are needed to analyze the underlying area of interest.

ApEn as Feature Extraction Technique

The complexity and irregularity of the signal can be easily quantified and measured with the help of ApEn [12]. The algorithm is as follows:

Let the number of data points be $L$, so that the sequence is

$$D = d(1), d(2), d(3), \ldots, d(L)$$

Let $d(i)$ be a subsequence of $D$ such that

$$d(i) = [d(i), d(i+1), \ldots, d(i+m-1)]$$

where $m$ represents the number of samples in the subsequence. The distance $q(d_i, d_j)$ between two corresponding vectors is measured. If the pre-defined threshold is $t$, then the similarity of patterns is computed as follows:

$$C_t^m(i) = \frac{N^m(i)}{N - m + 1} \qquad (1)$$

where $N^m(i)$ is the count of vectors $d_j$ for which $q(d_i, d_j) \le t$, and $N$ is the total number of data points ($L$ above). The natural logarithm of $C_t^m(i)$ is found and averaged over all values of $i$, expressed mathematically as follows:

$$\Phi^m(t) = \frac{1}{N - m + 1} \sum_{i=1}^{N-m+1} \ln C_t^m(i) \qquad (2)$$

Finally the Approximate Entropy is calculated as follows:

$$ApEn = \Phi^m(t) - \Phi^{m+1}(t) \qquad (3)$$
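The ApEn steps of Eqs. (1) to (3) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the text does not specify the distance $q$, the subsequence length $m$ or the threshold $t$, so the maximum absolute difference (Chebyshev distance), `m=2` and a default threshold of 0.2 times the signal standard deviation are assumed here, and self-matches are counted so that the logarithm is always defined.

```python
import numpy as np


def approximate_entropy(signal, m=2, t=None):
    """Approximate Entropy following Eqs. (1)-(3).

    Assumptions (not stated in the paper): Chebyshev distance for q,
    m = 2, and t = 0.2 * standard deviation when t is not given.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if t is None:
        t = 0.2 * x.std()

    def phi(order):
        count = n - order + 1
        # Subsequences d(i) = [d(i), ..., d(i + order - 1)]
        vectors = np.array([x[i:i + order] for i in range(count)])
        # Pairwise Chebyshev distances between all subsequences
        dist = np.max(np.abs(vectors[:, None, :] - vectors[None, :, :]), axis=2)
        # Eq. (1): fraction of subsequences within the threshold (self-match included)
        c = np.sum(dist <= t, axis=1) / count
        # Eq. (2): average of the natural logarithm
        return np.mean(np.log(c))

    # Eq. (3)
    return phi(m) - phi(m + 1)


# A strictly alternating signal is highly regular; white noise is not.
regular_signal = np.tile([0.0, 1.0], 100)
noisy_signal = np.random.default_rng(0).standard_normal(200)
apen_regular = approximate_entropy(regular_signal)
apen_noisy = approximate_entropy(noisy_signal)
```

The regular signal yields a value close to zero and the noisy signal a clearly larger one, which is the sense in which ApEn quantifies irregularity. The pairwise distance matrix grows quadratically with the epoch length, which is acceptable for 400-sample epochs.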

Fuzzy MI as Dimensionality Reduction Technique

Instead of discretizing the continuous data, fuzzifying the data before computing the mutual information is a versatile alternative [13].

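The fuzzy information entropy on which the Fuzzy MI is built is defined as follows:

$$FH(R_L) = -\frac{1}{n} \sum_{i=1}^{n} \log \frac{|[x_i]_{R_L}|}{n} \qquad (4)$$

where

$$|[x_i]_{R_L}| = \sum_{j=1}^{n} r_{ij} \qquad (5)$$

Here $r_{ij}$ is the value of the fuzzy relation $R_L$ between the samples $x_i$ and $x_j$, and $n$ is the number of samples.

If two subsets $L_1$ and $L_2$ of $L$ are given, the fuzzy joint information entropy is given as follows:

$$FH(L_1, L_2) = FH(R_{L_1} \cap R_{L_2}) = -\frac{1}{n} \sum_{i=1}^{n} \log \frac{|[x_i]_{L_1} \cap [x_i]_{L_2}|}{n} \qquad (6)$$

The Fuzzy MI between $L_1$ and $L_2$ is denoted as follows:

$$FMI(L_1; L_2) = FH(L_1) + FH(L_2) - FH(L_1, L_2) \qquad (7)$$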
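The fuzzy entropy and Fuzzy MI expressions above can be illustrated with a short Python sketch. Several choices here are assumptions, since the text does not specify them: the fuzzy relation is taken as $r_{ij} = 1 - |x_i - x_j|$ on min-max normalized values, the intersection of two relations is taken as the element-wise minimum, and the natural logarithm is used. The function names are hypothetical.

```python
import numpy as np


def fuzzy_relation(feature):
    """Fuzzy similarity relation for one feature (assumed form: 1 - |xi - xj|)."""
    x = np.asarray(feature, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # A constant feature cannot distinguish samples: all pairs fully similar.
        return np.ones((x.size, x.size))
    x = (x - x.min()) / span  # min-max normalization to [0, 1]
    return 1.0 - np.abs(x[:, None] - x[None, :])


def fuzzy_entropy(relation):
    """Fuzzy information entropy of a relation matrix, as in Eqs. (4)-(5)."""
    n = relation.shape[0]
    cardinality = relation.sum(axis=1)  # Eq. (5): sum of r_ij over j
    return -np.mean(np.log(cardinality / n))


def fuzzy_mutual_information(feature_1, feature_2):
    """Fuzzy MI of two features, as in Eq. (7), using min as the intersection."""
    r1 = fuzzy_relation(feature_1)
    r2 = fuzzy_relation(feature_2)
    joint = np.minimum(r1, r2)  # assumed fuzzy intersection of the two relations
    return fuzzy_entropy(r1) + fuzzy_entropy(r2) - fuzzy_entropy(joint)


rng = np.random.default_rng(0)
feature_a = rng.standard_normal(50)
constant_feature = np.full(50, 3.0)
fmi_self = fuzzy_mutual_information(feature_a, feature_a)
fmi_constant = fuzzy_mutual_information(feature_a, constant_feature)
```

Under these assumptions a feature shares all of its fuzzy entropy with itself, and shares none with a constant feature, which matches the intended reading of Fuzzy MI as a relevance measure for selecting features.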

LGE as Dimensionality Reduction Technique

Graph embedding involves three important processes, namely linearization, kernelization and tensorization. The main aim of linear graph embedding is to characterize certain geometrical and statistical properties of the data set [14]. The block diagram of the LGE technique is shown in Figure-2.


PCA AS A POST CLASSIFIER

The main objective of PCA is to identify new and meaningful variables so that the classification becomes much simpler [15].

Figure-2. Block diagram of LGE technique.

PCA is widely used to convert correlated variables into linearly uncorrelated variables by means of an orthogonal transformation. The number of original variables is generally greater than or equal to the number of principal components obtained. Under the orthogonal transformation, the first principal component has the highest variance. Each succeeding component has the highest variance possible under the constraint that it is orthogonal to the preceding components.

Assume the data set is represented as a matrix $X$, where the samples are arranged in columns and the variables in rows. This matrix is linearly transformed into a new matrix $Z$ of the same dimension. For a transformation matrix $L$,

$$Z = LX \qquad (8)$$

The above equation represents a change of basis. If the rows of $L$ are considered as row vectors $l_1, l_2, \ldots, l_m$ and the columns of $X$ as column vectors $x_1, x_2, \ldots, x_n$, then $X$ is projected onto the rows of $L$. Therefore, for the representation of the columns of $X$, a new basis is formed from the rows of $L$ as $(l_1, l_2, \ldots, l_m)$. The rows of $L$ now become the principal component directions.
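The change of basis $Z = LX$ can be sketched in Python as follows. This sketch covers only the PCA transformation, not the way the risk levels are subsequently assigned, which the text does not detail. It assumes the standard construction in which the rows of $L$ are the eigenvectors of the sample covariance matrix of the mean-centered data; the centering step and the function name `pca_transform` are assumptions added for illustration.

```python
import numpy as np


def pca_transform(X):
    """Return (L, Z, variances) for data X with variables in rows, samples in columns.

    Assumption: X is mean-centered per variable before the change of basis Z = L X.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=1, keepdims=True)
    covariance = X_centered @ X_centered.T / (X.shape[1] - 1)
    # eigh is suited to symmetric matrices and returns ascending eigenvalues
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]  # sort by decreasing variance
    L = eigenvectors[:, order].T           # rows of L are the principal directions
    Z = L @ X_centered                     # Eq. (8)
    return L, Z, eigenvalues[order]


# Synthetic data: three variables (rows), two of them strongly correlated.
rng = np.random.default_rng(1)
base = rng.standard_normal((1, 200))
X_demo = np.vstack([
    base,
    0.5 * base + 0.1 * rng.standard_normal((1, 200)),
    rng.standard_normal((1, 200)),
])
L_demo, Z_demo, var_demo = pca_transform(X_demo)
```

The rows of `Z_demo` are uncorrelated and ordered by decreasing variance, and `Z_demo` has the same dimension as `X_demo`, as described for Eq. (8).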

RESULTS AND CONCLUSIONS

With ApEn as the feature extraction technique, FMI and LGE as the dimensionality reduction techniques and PCA as the post classifier, the results based on the Quality Value, Time Delay and Accuracy are computed for a single epoch and reported in Table-1. The formulae for the Performance Index (PI), Sensitivity, Specificity and Accuracy are given as follows:

$$PI = \frac{PC - MC - FA}{PC} \times 100 \qquad (9)$$

where PC is Perfect Classification, MC is Missed Classification and FA is False Alarm.

The Sensitivity, Specificity and Accuracy measures are stated by the following:

$$Sensitivity = \frac{PC}{PC + FA} \times 100 \qquad (10)$$

$$Specificity = \frac{PC}{PC + MC} \times 100 \qquad (11)$$

$$Accuracy = \frac{Sensitivity + Specificity}{2} \qquad (12)$$

The Quality Value $Q_V$ is mathematically defined as follows:

$$Q_V = \frac{C}{(R_{fa} + 0.2) \times (T_{dly} \times P_{dct} + 6 \times P_{msd})} \qquad (13)$$

where $C$ is the scaling constant,
$R_{fa}$ is the number of false alarms per set,
$T_{dly}$ is the average delay of the onset classification in seconds,
$P_{dct}$ is the percentage of perfect classification, and
$P_{msd}$ is the percentage of perfect risk level missed.

The time delay is mathematically expressed as follows:

$$Time\ Delay = 2 \times \frac{PC}{100} + 6 \times \frac{MC}{100} \qquad (14)$$
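The benchmark parameters of Eqs. (9) to (14) can be collected into a few Python functions, as sketched below. The input values are hypothetical and chosen only for illustration; they are not taken from Table-1. The scaling constant `c=10` and the use of fractions (rather than percentages) for `r_fa`, `p_dct` and `p_msd` are assumptions, since the text does not state them.

```python
def performance_index(pc, mc, fa):
    """Eq. (9): PC, MC and FA are percentages."""
    return (pc - mc - fa) / pc * 100


def sensitivity(pc, fa):
    """Eq. (10)."""
    return pc / (pc + fa) * 100


def specificity(pc, mc):
    """Eq. (11)."""
    return pc / (pc + mc) * 100


def accuracy(sens, spec):
    """Eq. (12): mean of sensitivity and specificity."""
    return (sens + spec) / 2


def quality_value(r_fa, t_dly, p_dct, p_msd, c=10):
    """Eq. (13). The default c = 10 is an assumed scaling constant."""
    return c / ((r_fa + 0.2) * (t_dly * p_dct + 6 * p_msd))


def time_delay(pc, mc):
    """Eq. (14): result in seconds, PC and MC are percentages."""
    return 2 * pc / 100 + 6 * mc / 100


# Hypothetical classification outcome (percentages), for illustration only.
pc_demo, mc_demo, fa_demo = 85.0, 10.0, 5.0
pi_demo = performance_index(pc_demo, mc_demo, fa_demo)
sens_demo = sensitivity(pc_demo, fa_demo)
spec_demo = specificity(pc_demo, mc_demo)
acc_demo = accuracy(sens_demo, spec_demo)
delay_demo = time_delay(pc_demo, mc_demo)
qv_demo = quality_value(r_fa=0.05, t_dly=delay_demo, p_dct=0.85, p_msd=0.10)
```

A perfect classifier (PC = 100, MC = FA = 0) gives a PI, sensitivity and specificity of 100% and a time delay of 2 seconds, which is the two-second epoch length.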

The average Specificity and Sensitivity analysis for a single epoch is shown in Figure-3. The average Time Delay and Quality Value measures for a single epoch are shown in Figure-4. The average Performance Index and Accuracy analysis for a single epoch is shown in Figure-5.


Figure-3. Specificity and sensitivity measures.

From Figure-3, it is inferred that the specificity measures are not constant throughout the series. A specificity of about 91.25% is achieved for ApEn with PCA, about 93.33% for FMI with PCA and about 86.25% for LGE with PCA.

Figure-4. Time delay and quality value measures.

From Figure-4, the time delay and quality value measures are easily analyzed. It is inferred that FMI with PCA gives the lowest time delay, 2.15 seconds, compared to the other two methods.

Figure-5. Performance index and accuracy measures.

From Figure-5, the performance index and accuracy measures are easily analyzed. The performance index measures are almost constant for all the techniques, and the highest performance index is found for FMI with PCA.


Table-1. Performance comparison table.

Parameters          ApEn with PCA    FMI with PCA    LGE with PCA
PI (%)              78.51            84.93           79.22
Sensitivity (%)     91.66            94.37           98.12
Specificity (%)     91.25            93.33           86.25
Time Delay (sec)    2.18             2.15            2.512
Quality Value       17.94            19.39           19.16
PC (%)              82.91            87.7            84.37
MC (%)              8.75             6.66            13.75
FA (%)              8.33             5.62            1.87
Accuracy (%)        91.45            93.85           92.18

It is concluded from Table-1 that FMI as a dimensionality reduction technique followed by PCA as a post classifier yields better results than the other two techniques. A low time delay of 2.15 seconds, a high performance index of 84.93%, a high quality value of about 19.39 and a good accuracy of 93.85% make FMI with PCA the best of the three techniques. Future work may incorporate different dimensionality reduction techniques followed by various neural networks as post classifiers for the classification of epilepsy risk levels from EEG signals.

REFERENCES

[1] S. K. Prabhakar, H. Rajaguru. 2016. Entropy Based PAPR Reduction for STTC System Utilized for Classification of Epilepsy from EEG Signals Using PSD and SVM. IFMBE Proceedings (Springer), 3rd International Conference on Movement, Health and Exercise (MoHE), September 28-30, Malaysia.

[2] R. Harikumar, P. S. Kumar. 2015. Fuzzy Techniques and Aggregation Operators in Classification of Epilepsy Risk Levels for Diabetic Patients Using EEG Signals and Cerebral Blood Flow. Journal of Biomaterials and Tissue Engineering. 5(4): 316-322.

[3] H. Rajaguru, S. K. Prabhakar. Non Linear ICA and Logistic Regression for Classification of Epilepsy from EEG signals. IEEE Proceedings of the International Conference on Electronics, Communication and Aerospace Technology (ICECA 2017), Coimbatore, India. pp. 577-580.

[4] H. Rajaguru, S. K. Prabhakar. 2016. A Unique Approach to Epilepsy Classification from EEG Signals Using Dimensionality Reduction and Neural Networks. Circuits and Systems. 7: 1455-1464.

[5] H. Rajaguru, S. K. Prabhakar. 2016. A Framework for Epilepsy Classification Using Modified Sparse Representation Classifiers and Native Bayesian Classifier from EEG Signals. Journal of Medical Imaging and Health Informatics. 6(8): 1829-1837.

[6] R. Harikumar, P. S. Kumar. 2015. Fuzzy Techniques and Aggregation Operators in Classification of Epilepsy Risk Levels for Diabetic Patients Using EEG Signals and Cerebral Blood Flow. Journal of Biomaterials and Tissue Engineering. 5(4): 316-322.

[7] H. Rajaguru, S. K. Prabhakar. Epilepsy Classification through Multi-Label Dimensionality Reduction through Dependence Maximization and Elite Genetic Algorithm. IEEE Proceedings of the International Conference on Electronics, Communication and Aerospace Technology (ICECA 2017), Coimbatore, India. pp. 594-597.

[8] H. Rajaguru, S. K. Prabhakar. 2017. Modified Expectation Maximization Based Sparse Representation Classifier for Classification of Epilepsy from EEG Signals. IEEE Proceedings of the International Conference on Electronics, Communication and Aerospace Technology (ICECA 2017), Coimbatore, India. pp. 607-610.

[9] H. Rajaguru, S. K. Prabhakar. 2016. A Weighted KNN Measures for Epilepsy Classification from EEG signals utilized in Telemedicine Applications with a PSO Based Reduced PAPR and BER Analysis. J. Pharm. Sci. & Res. 8(4).

[10] H. Rajaguru, S. K. Prabhakar. 2016. LDA, GA and SVM's for Classification of Epilepsy from EEG Signals. Research Journal of Pharmaceutical,

[11] R. Harikumar, P. S. Kumar. 2015. Dimensionality Reduction Techniques for Processing Epileptic Encephalographic Signals. Biomedical and Pharmacology Journal. 8(1): 103-106.

[12] H. Ocak. 2009. Automatic detection of epileptic seizures in EEG using discrete wavelet transform and approximate entropy. Expert Syst. Appl. 36(2): 2027-2036.

[13] L. Sanchez. 2005. A fuzzy definition of Mutual Information with application to the design of Genetic Fuzzy Classifiers. International Conference on Machine Intelligence, Tozeur, Tunisia, November 5-7.

[14] S. K. Prabhakar, H. Rajaguru. 2015. Application of Linear Graph Embedding as a Dimensionality Reduction Technique and Sparse Representation Classifier as a Post Classifier for the Classification of Epilepsy Risk Levels from EEG Signals. Proceedings of the International Conference on Graphic and Image Processing (ICGIP), October 23-25, Singapore.

[15] M. Kheli et al. 2005. Classification of Defects by the SVM Method and the Principal Component Analysis (PCA). World Academy of Science, Engineering and Technology. 9: 226-231.