
International Journal of Innovative Computing, Information and Control, ICIC International ©2011, ISSN 1349-4198

Volume 7, Number 8, August 2011, pp. 4669-4678

EFFECT OF DISCRETIZATION METHOD ON THE DIAGNOSIS OF PARKINSON’S DISEASE

Ersin Kaya, Oğuz Fındık, İsmail Babaoğlu and Ahmet Arslan

Department of Computer Engineering, Faculty of Engineering and Architecture

Selçuk University, Selçuklu, Konya 42075, Turkey

{ersinkaya; oguzf; ibabaoglu; ahmetarslan}@selcuk.edu.tr

Received April 2010; revised January 2011

Abstract. Implementing different classification methods, this study analyzes the effect of discretization on the diagnosis of Parkinson's disease. An entropy-based discretization method is used as the discretization method, and support vector machines, C4.5, k-nearest neighbors and Naïve Bayes are used as the classification methods. The diagnosis of Parkinson's disease is first implemented without using any preprocessing method. Afterwards, the Parkinson's disease dataset is classified after implementing entropy-based discretization on the dataset. Both results are compared, and it is observed that using the discretization method increases the success of classification on the diagnosis of Parkinson's disease by 4.1% to 12.8%.

Keywords: Parkinson’s disease, Entropy-based discretization method, Classification methods

1. Introduction. Parkinson's disease is a nervous system disorder which generally arises in men in their 50s. The disease was first described by James Parkinson, and so it was named Parkinson's disease [1]. Symptoms such as poverty of movement, slowness of movement, rigidity and rest tremor are commonly observed in patients with Parkinson's disease [2]. Nowadays, no treatment for Parkinson's disease is available. However, if the disease is diagnosed at an early stage, drug treatments mitigating the effects of the symptoms can be implemented in clinical environments [3].

Research into this disease shows that sound distortion occurs in 90% of Parkinson's disease patients [4,5]. Much research has been performed using voice disorders for the diagnosis of Parkinson's disease [6]. Little et al. used linear discriminant analysis (LDA) to identify the characteristics of sound data to be used in the diagnosis of the disease. For the diagnosis of Parkinson's disease, they composed a model using the selected properties with a support vector machine (SVM) classifier [7].

Subjecting data to preprocessing in the classification process increases the performance of classification [8,9]. Discretization is an important type of preprocessing in data mining: continuous-valued features in a dataset are transformed into discrete values with a discretization method. Research shows that discretization of continuous-valued features increases the performance of classification. Polat et al., studying the diagnosis of optic nerve disease, showed that discretization increases the performance of classification when used with traditional methods like artificial neural networks (ANN), least squares support vector machines, and C4.5 [9]. Abraham et al., studying 28 publicly available medical datasets, pointed out the effect of discretization on the success of Naïve Bayes classification [10]. Demsar et al. created a predictive model on data consisting of 69 examples and 174 properties belonging to trauma patients. They used decision tree and Naïve Bayes


classification methods in this model. The positive effect of discretization methods on classification success was shown in this study [11]. Acid et al. introduced a model which evaluates the performance of the emergency service of a Spanish hospital by using Bayesian networks. In their study, some continuously valued features were transformed into interval-valued features [12].

The data obtained from the University of California Irvine (UCI) machine learning repository, which houses the Parkinson's disease dataset, are used in this study. The continuously valued features in the data are transformed into interval-valued features by a discretization method based on entropy. The original data and the discretized data are classified using the Naïve Bayes, C4.5, k-nearest neighbor (k-NN) and SVM classifier methods. The results are compared with each other, and the effect of discretization on the classification accuracy is shown.

2. Materials and Methods.

2.1. The Parkinson dataset. In this study, the dataset obtained from the UCI machine learning repository is used. This dataset is composed of 32 people of both sexes, 23 of them being Parkinson patients. Seven biomedical voice measurements are obtained from subjects S21, S27 and S35, and six biomedical voice measurements from the others. The dataset is composed of 195 measurements and 22 features. A detailed analysis of the dataset is shown in Table 1.

2.2. Discretization. Discretization is an important pre-processing method in data analysis. With discretization methods, continuous-valued features are transformed into interval-valued features. Because the data are transformed into a more meaningful shape, classification performance becomes more effective. There are many discretization methods in the literature, such as entropy-based, equal frequency and equal width discretization [13,14]. The common steps of discretization methods are shown in Figure 1 and can be summarized as follows.

Firstly, the values of the continuous-valued feature in the dataset are sorted. Then, the candidate cut points are determined for this continuous-valued feature. The fitness values of the obtained candidate cut points are computed, and the values of the continuous-valued feature are split according to the candidate cut point which has the best fitness value. These steps are applied recursively until the stopping criterion is met. A discretization method is identified by its determination of the candidate cut points, its computation of the fitness values of the candidate cut points, and its stopping criterion.

2.3. Entropy-based discretization. Entropy-based discretization is a commonly used discretization method proposed by Fayyad and Irani [15]. In this method, candidate cut points are determined for the continuous-valued feature, and the cut point is selected according to the entropy of the candidate cut points. The entropies of the candidate cut points are defined by the following expressions:

E(A, T; S) = (|S1|/|S|) Ent(S1) + (|S2|/|S|) Ent(S2)    (1)

Ent(S) = −∑_{i=1}^{Z} p(Ci, S) log2 p(Ci, S)    (2)

where A is the feature which is going to be discretized, T is the candidate cut point, S is the set of samples, S1 and S2 are the subsets of the split samples for the left and right parts of S, respectively, Z is the number of classes in the dataset, and Ci is the ith decision class.
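The cut-point scoring of Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the choice of midpoints between consecutive sorted values as candidate cut points is our assumption, and all names are ours.

```python
from collections import Counter
from math import log2

def ent(labels):
    # Ent(S): class entropy of a list of class labels, Equation (2)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_entropy(values, labels, t):
    # E(A, T; S): weighted entropy of the two subsets induced by cut point t, Equation (1)
    left = [y for x, y in zip(values, labels) if x <= t]
    right = [y for x, y in zip(values, labels) if x > t]
    n = len(labels)
    return len(left) / n * ent(left) + len(right) / n * ent(right)

def best_cut_point(values, labels):
    # Candidate cut points: midpoints between consecutive distinct sorted values
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_entropy(values, labels, t))
```

For a feature whose low values all belong to one class and high values to the other, the midpoint between the two groups minimizes E(A, T; S) and is selected.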


Table 1. Detailed analysis of the dataset

Feature               Max        Min        Median     Mean       SD
MDVP:Fo (Hz)          260.105    88.333     148.79     154.2286   41.39006
MDVP:Fhi (Hz)         592.030    102.145    175.829    197.1049   91.49155
MDVP:Flo (Hz)         239.170    65.476     104.315    116.3246   43.52141
MDVP:Jitter (%)       0.03316    0.00168    0.00494    0.00622    0.004848
MDVP:Jitter (Abs)     0.00260    0.000007   0.00003    0.000043   0.000034
MDVP:RAP              0.02144    0.00068    0.0025     0.003306   0.002968
MDVP:PPQ              0.01958    0.00092    0.00269    0.003446   0.002759
Jitter:DDP            0.06433    0.00204    0.00749    0.00992    0.008903
MDVP:Shimmer          0.11908    0.00954    0.02297    0.029709   0.018857
MDVP:Shimmer (dB)     1.302      0.085      0.221      0.282251   0.194877
Shimmer:APQ3          0.05647    0.00455    0.01279    0.015664   0.010153
Shimmer:APQ5          0.07940    0.00570    0.01347    0.017878   0.012024
MDVP:APQ              0.13778    0.00719    0.01826    0.024081   0.016947
Shimmer:DDA           0.16942    0.01364    0.03836    0.046993   0.030459
NHR                   0.31482    0.00065    0.01166    0.024847   0.040418
HNR                   33.047     8.441      22.085     21.88597   4.425764
RPDE                  0.685151   0.256570   0.495954   0.498536   0.103942
DFA                   0.825288   0.574282   0.722254   0.718099   0.055336
spread1               -2.434031  -7.964984  -5.72087   -5.6844    1.090208
spread2               0.450493   0.006274   0.218885   0.22651    0.083406
D2                    3.671155   1.423287   2.361532   2.381826   0.382799
PPE                   0.527367   0.044539   0.194052   0.206552   0.090119

Feature, names of the features obtained from biomedical voice measurements; Max, maximum value of the features; Min, minimum value of the features; Median, median value of the features; Mean, mean value of the features; SD, standard deviation of the features.

After selection of the cut point which has the minimum entropy, the values of the continuous-valued feature are split into two parts. Then, this procedure is repeated until the stopping criterion is reached for each part. In the entropy-based discretization method, the stopping criterion is defined by the following expressions:

Gain(A, T; S) > log2(N − 1)/N + ∆(A, T; S)/N    (3)

Gain(A, T; S) = Ent(S) − E(A, T; S)    (4)

∆(A, T; S) = log2(3^Z − 2) − [Z·Ent(S) − Z1·Ent(S1) − Z2·Ent(S2)]    (5)

where A is the feature which is going to be discretized, T is the candidate cut point, S is the set of samples, S1 and S2 are the subsets of the split samples for the left and right parts of S, respectively, N is the number of samples in S, Z is the number of classes in the dataset, and Z1 and Z2 are the numbers of classes present in S1 and S2, respectively.
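The MDL stopping test of Equations (3)-(5) can be sketched as follows; a minimal illustration under our own naming, assuming the left/right label lists of a candidate split are already available.

```python
from collections import Counter
from math import log2

def ent(labels):
    # Ent(S): class entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def mdlp_accept(left, right):
    # Fayyad-Irani stopping criterion for a cut splitting S into left/right:
    # accept the cut only when Gain exceeds the MDL cost, Equation (3).
    s = left + right
    n = len(s)
    z, z1, z2 = len(set(s)), len(set(left)), len(set(right))
    e = len(left) / n * ent(left) + len(right) / n * ent(right)  # E(A, T; S)
    gain = ent(s) - e                                            # Equation (4)
    delta = log2(3 ** z - 2) - (z * ent(s) - z1 * ent(left) - z2 * ent(right))  # Equation (5)
    return gain > (log2(n - 1) + delta) / n                      # Equation (3)
```

A perfectly class-separating cut is accepted, while a cut that leaves both sides as mixed as the whole set is rejected; recursion stops on rejected parts.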

2.4. Na¨ıve Bayes classifier. Na¨ıve Bayes is a probabilistic classification method [16].

For a new sample, v_NB is calculated for each of the different classes in the training data, and the new sample is assigned to the class with the maximum v_NB.


Figure 1. General steps of discretization method

v_NB is defined by the following expression [17]:

v_NB = argmax_{vj} p(vj) ∏_i p(ai | vj)    (6)

where j indexes the classes in the dataset, i indexes the condition features in the dataset, ai is the value of the ith feature, and vj is the class value of the jth class.
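Equation (6) amounts to multiplying a class prior by per-feature conditional frequencies. A minimal frequency-count sketch (our own names, no smoothing, assuming discrete feature values such as those produced by discretization):

```python
from collections import Counter

def naive_bayes_predict(X, y, sample):
    # v_NB = argmax_j p(v_j) * prod_i p(a_i | v_j), Equation (6),
    # with probabilities estimated by simple frequency counts.
    n = len(y)
    best, best_score = None, -1.0
    for v, count in Counter(y).items():
        score = count / n  # prior p(v_j)
        rows = [x for x, label in zip(X, y) if label == v]
        for i, a in enumerate(sample):
            score *= sum(1 for r in rows if r[i] == a) / len(rows)  # p(a_i | v_j)
        if score > best_score:
            best, best_score = v, score
    return best
```

In practice a smoothing term (e.g. Laplace) avoids zero probabilities for unseen feature values; it is omitted here for brevity.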

2.5. C4.5 decision tree classifier. The decision tree classifier is a simple classification method. Decision trees are composed of nodes, branches and leaves, which correspond to the features, the values of the features and the values of the decision features, respectively. Each path beginning from the root node and reaching a leaf denotes a rule like "if condition1 and condition2 and ... then decision". Nodes and branches correspond to the condition terms, and leaves correspond to the decision term of the rule.

In this study, the C4.5 method is used to create the decision tree. In this method, the feature which has the maximum gain is determined as the root node. The gains belonging to


the subsets of branches of the root node are then recalculated, and the nodes having the maximum gain within each subset are determined as sub-nodes [18,19]. The creation of the tree continues until each branch denotes a class. Gain is defined by the following expressions:

Gain(S, A) = Entropy(S) − ∑_{v ∈ Values(A)} (|Sv|/|S|) Entropy(Sv)    (7)

Ent(S) = −∑_{i=1}^{Z} p(Ci, S) log2 p(Ci, S)    (8)

where S is the set of samples, A is the feature for which the gain is calculated, Sv is the subset of samples in which feature A takes the value v, Z is the number of classes in the dataset, and p(Ci, S) is the proportion of samples lying in class Ci.
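The information gain of Equation (7) for a single discrete feature can be sketched as follows (an illustrative helper under our own naming, not the authors' implementation):

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Ent(S), Equation (8)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(feature_values, labels):
    # Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v), Equation (7)
    n = len(labels)
    g = entropy(labels)
    for v in set(feature_values):
        sv = [y for x, y in zip(feature_values, labels) if x == v]
        g -= len(sv) / n * entropy(sv)
    return g
```

A feature whose values perfectly separate the classes attains the maximum gain (the full entropy of S), while an uninformative feature attains gain 0; C4.5 picks the highest-gain feature at each node.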

2.6. k-nearest neighbor classifier. k-NN is a supervised learning algorithm. The k-neighborhood parameter is determined in the initialization stage of k-NN. The k samples which are closest to the new sample are found among the training data, and the class of the new sample is determined from these closest k samples by majority voting [20]. Distance measurements like Euclidean, Hamming and Manhattan distance are used to calculate the distances between samples.
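The k-NN procedure described above can be sketched in a few lines (our own names; Euclidean distance via the standard library):

```python
from collections import Counter
from math import dist  # Euclidean distance between coordinate tuples (Python 3.8+)

def knn_predict(X, y, sample, k=5):
    # Find the k training samples closest to `sample`, then take a majority vote
    # over their class labels.
    neighbors = sorted(zip(X, y), key=lambda p: dist(p[0], sample))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```

With two well-separated clusters, a query near either cluster is assigned that cluster's label; an odd k avoids most voting ties for two classes.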

2.7. Support vector machine classifier. SVM, which is based on statistical learning theory, is one of the most commonly used classification techniques. The technique was first proposed by Vapnik [21]. In the basic concept of linear SVM, the method separates two classes from each other optimally: it aims to find the optimal separating hyperplane that maximizes the margin between the classes. As a learning method, SVM is often used to train and design radial basis function (RBF) networks, and it is generally more successful than comparable artificial neural networks. The formulations and detailed concepts of this commonly used classifier can be found in the studies given in [22-28].

3. Experimental Results. Implementing different classification methods, the researchers analyzed the effect of discretization on the diagnosis of Parkinson's disease. The dataset used in the study is available online in the UCI database containing the Parkinson dataset. Entropy-based discretization is used as the discretization method; it was selected because it is a supervised discretization method that takes the class labels into account when determining the cut points. The dataset and its discretized form are classified with the Naïve Bayes, C4.5, k-NN and SVM classification methods, and the two sets of classification results are compared.
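The evaluation protocol of classifying both dataset forms under cross-validation rests on partitioning the samples into folds. A minimal index-based fold generator (illustrative only, not the authors' experimental harness):

```python
def k_fold_indices(n, k=5):
    # Split sample indices 0..n-1 into k folds for cross-validation;
    # each fold serves once as the test set and the remaining folds as training data.
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each classifier is then trained on the training indices and evaluated on the test indices of every fold, and the per-fold results are averaged.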

To make the results more consistent, k-fold cross-validation is used; each classification in this study is implemented with 5-fold cross-validation. The dataset is classified in both its discretized and non-discretized forms using RBF, linear and polynomial kernels with the SVM classifier. The SVM kernel parameter ranges for c and σ are [0.1, 30000] and [0.001, 10], respectively. The RBF kernel is determined to be the optimum kernel for SVM, with optimum parameters of 0.125 and 2 for G and c, respectively. The k parameter is taken as 5 in k-NN, and the Euclidean distance, given as follows, is used as the distance measurement between samples:

D(x, y) = √( ∑_{i=1}^{n} (xi − yi)^2 )    (9)


where n is the number of the features in the dataset, x and y are the samples in the dataset.
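Equation (9) translates directly into code; a small helper under our own naming:

```python
from math import sqrt

def euclidean(x, y):
    # D(x, y) = sqrt(sum_i (x_i - y_i)^2), Equation (9)
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For example, the distance between (0, 0) and (3, 4) is 5.0.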

Classification accuracy, sensitivity, specificity and area under the ROC curve (AUC) measurements are utilized to compare the results. The measurements are as follows:

CA = (TP + TN) / (TP + TN + FP + FN)    (10)

SEN = TP / (TP + FN)    (11)

SPE = TN / (TN + FP)    (12)

AUC = area under the ROC curve    (13)

where CA, SEN and SPE denote classification accuracy, sensitivity and specificity, respectively. TP is the number of healthy samples predicted as healthy, TN is the number of patient samples predicted as patients, FP is the number of healthy samples predicted as patients, and FN is the number of patient samples predicted as healthy.
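Given the four confusion-matrix counts, Equations (10)-(12) can be computed as follows (an illustrative helper, names ours):

```python
def metrics(tp, tn, fp, fn):
    # Classification accuracy, sensitivity and specificity, Equations (10)-(12)
    ca = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return ca, sen, spe
```

AUC, by contrast, is computed from the full ROC curve of classifier scores rather than from a single confusion matrix.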

The twenty-two continuous-valued features in the Parkinson's disease dataset are discretized using the entropy-based discretization method. The numbers and values of the cut points of the features are given in Table 2.

Table 2. Cut-points of features

No  Feature              NoC  CP
1   MDVP:Fo (Hz)         3    129.336, 193.030, 223.361
2   MDVP:Fhi (Hz)        2    233.481, 262.707
3   MDVP:Flo (Hz)        1    189.621
4   MDVP:Jitter (%)      1    0.00356
5   MDVP:Jitter (Abs)    2    0.000010, 0.000040
6   MDVP:RAP             1    0.00176
7   MDVP:PPQ             1    0.00203
8   Jitter:DDP           1    0.00528
9   MDVP:Shimmer         1    0.02751
10  MDVP:Shimmer (dB)    1    0.263
11  Shimmer:APQ3         1    0.01454
12  Shimmer:APQ5         1    0.01321
13  MDVP:APQ             1    0.01949
14  Shimmer:DDA          1    0.04363
15  NHR                  1    0.00484
16  HNR                  1    23.949
17  RPDE                 1    0.469928
18  DFA                  1    0.683761
19  spread1              2    -6.650471, -5.592584
20  spread2              1    0.178540
21  D2                   1    2.330716
22  PPE                  3    0.103561, 0.133867, 0.215724

Feature, names of the features obtained from biomedical voice measurements; NoC, the number of cut points obtained after discretization; CP, the values of the cut points obtained after discretization.

The classification accuracy, sensitivity, specificity and AUC obtained from both the discretized and non-discretized forms of the classification processes using the Naïve Bayes, C4.5, k-NN and SVM classifiers are given in Table 3. By using the entropy-based discretization method, the classification accuracies of the Naïve Bayes, C4.5, k-NN and SVM classifiers increased by 8.2%, 4.1%, 9.2% and 12.8%, and their AUC values by 0.94%, 7.24%, 8.42% and 8.82%, respectively.


Table 3. Classification results

Classifier    Form              CA (%)  Sen     Spe     AUC
Naïve Bayes   non-discretized   77.44   0.6458  0.9116  0.8965
              discretized       85.64   0.8750  0.8503  0.9059
C4.5          non-discretized   84.62   0.8542  0.7483  0.7869
              discretized       88.72   0.7500  0.9320  0.8593
k-NN          non-discretized   84.62   0.3333  0.9796  0.8735
              discretized       93.85   0.8542  0.9660  0.9577
SVM           non-discretized   82.05   0.7083  0.8912  0.8732
              discretized       94.87   0.8333  0.9864  0.9614

CA, Sen, Spe and AUC denote classification accuracy, sensitivity, specificity and area under the ROC curve, respectively.

ROC curves belonging to healthy and unhealthy samples obtained using Naive Bayes, C4.5, k-NN and SVM are as shown in Figures 2-5.

As shown by the ROC curves, an increase in classification accuracy is observed after discretization of the dataset. Moreover, the results show that the discretization method gives very promising results in the diagnosis of Parkinson's disease; the best model for diagnosing Parkinson's disease was SVM with the discretized dataset. As a result, discretization can be used as a pre-processing step for medical datasets, and thanks to discretization, diagnosis of diseases can be performed more accurately.

4. Conclusion. In this study, the Parkinson's disease dataset obtained from the UCI machine learning repository is used. The Naïve Bayes, C4.5, k-NN and SVM classifier methods are used to classify the dataset. The dataset is classified using both the discretized and non-discretized features in order to show the effectiveness of discretization on the diagnosis of Parkinson's disease. The results show that discretization increases the classification accuracy of the diagnosis of Parkinson's disease.

Figure 2. (a) ROC curve belonging to the healthy class obtained by classification of both the discretized and non-discretized datasets using Naïve Bayes and (b) ROC curve belonging to the unhealthy class obtained by classification of both the discretized and non-discretized datasets using Naïve Bayes


Figure 3. (a) ROC curve belonging to the healthy class obtained by classification of both the discretized and non-discretized datasets using C4.5 and (b) ROC curve belonging to the unhealthy class obtained by classification of both the discretized and non-discretized datasets using C4.5

Figure 4. (a) ROC curve belonging to the healthy class obtained by classification of both the discretized and non-discretized datasets using k-NN and (b) ROC curve belonging to the unhealthy class obtained by classification of both the discretized and non-discretized datasets using k-NN

Figure 5. (a) ROC curve belonging to the healthy class obtained by classification of both the discretized and non-discretized datasets using SVM and (b) ROC curve belonging to the unhealthy class obtained by classification of both the discretized and non-discretized datasets using SVM


REFERENCES

[1] A. E. Lang and A. M. Lozano, Parkinson’s disease – First of two parts, The New England Journal of Medicine, vol.339, pp.1044-1053, 1998.

[2] N. Singh, V. Pillay and Y. E. Choonara, Advances in the treatment of Parkinson’s disease, Progr. Neurobiol, vol.81, pp.29-44, 2007.

[3] National Collaborating Centre for Chronic Conditions, Parkinson's disease: National clinical guideline for diagnosis and management in primary and secondary care, Royal College of Physicians, 2006.

[4] A. K. Ho, R. Iansek, C. Marigliani, J. L. Bradshaw and S. Gates, Speech impairment in a large sample of patients with Parkinson's disease, Behavioural Neurology, vol.11, pp.131-137, 1998.

[5] J. A. Logemann, H. B. Fisher, B. Boshes and E. R. Blonsky, Frequency and co-occurrence of vocal-tract dysfunctions in the speech of a large sample of Parkinson patients, Journal of Speech and Hearing Disorders, vol.43, pp.47-57, 1978.

[6] M. A. Little, P. E. McSharry, S. J. Roberts, D. A. Costello and I. M. Moroz, Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection, Biomedical Engineering Online, vol.6, pp.23-58, 2007.

[7] M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman and L. O. Ramig, Suitability of dysphonia measurements for telemonitoring of Parkinson's disease, IEEE Transactions on Biomedical Engineering, vol.56, pp.1015-1022, 2009.

[8] A. Kumar and D. Zhang, Hand-geometry recognition using entropy-based discretization, IEEE Transactions on Information Forensics and Security, vol.2, pp.181-187, 2007.

[9] K. Polat, S. Kara, A. Güven and S. Güneş, Utilization of discretization method on the diagnosis of optic nerve disease, Computer Methods and Programs in Biomedicine, vol.91, pp.255-264, 2008.

[10] R. Abraham, J. Simha and S. Iyengar, A comparative analysis of discretization methods for medical data mining with Naïve Bayesian classifier, The 9th International Conference on Information Technology, pp.235-236, 2006.

[11] J. Demsar, B. Zupan, N. Aoki, M. J. Wall, T. H. Granchi and J. R. Beck, Feature mining and predictive model construction from severe trauma patient's data, International Journal of Medical Informatics, vol.63, pp.41-50, 2001.

[12] S. Acid, L. M. Campos, J. M. Fernandez-Luna, S. Rodriguez, J. M. Rodriguez and J. L. Salcedo, A comparison of learning algorithms for Bayesian networks: A case study based on data from an emergency medical service, Artificial Intelligence in Medicine, vol.30, pp.215-232, 2004.

[13] M. K. Ismail and V. Ciesielski, An empirical investigation of the impact of discretization on common data distributions, Design and Application of Hybrid Intelligent Systems, pp.692-701, 2003.

[14] H. Kodaz, S. Özşen, A. Arslan and S. Güneş, Medical application of information gain based artificial immune recognition system (AIRS): Diagnosis of thyroid disease, Expert Systems with Applications, vol.36, no.2, pp.3086-3092, 2009.

[15] U. M. Fayyad and K. B. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, The 13th International Joint Conference on Artificial Intelligence, pp.1022-1027, 1993.

[16] H. Kim and S. Chen, Associative Naïve Bayes classifier: Automated linking of gene ontology to medline documents, Pattern Recognition, vol.42, pp.1777-1785, 2009.

[17] C. Hsu, H. Huang and T. Wong, On why discretization works for Naïve Bayesian, Lecture Notes in Computer Science, pp.440-452, 2000.

[18] T. M. Mitchell, Machine Learning, McGraw-Hill, Singapore, 1997.

[19] J. R. Quinlan, Induction of decision trees, Machine Learning, vol.1, pp.81-106, 1986.

[20] G. Shakhnarovish, T. Darrell and P. Indyk, Nearest-Neighbor Methods in Learning and Vision, MIT Press, 2005.

[21] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.

[22] K. Y. Chen and C. H. Wang, A hybrid SARIMA and support vector machines in forecasting the production values of the machinery industry in Taiwan, Expert Systems with Applications, vol.32, pp.254-264, 2007.

[23] E. Çomak, A. Arslan and İ. Türkoğlu, A decision support system based on support vector machines for diagnosis of the heart valve diseases, Computers in Biology and Medicine, vol.37, pp.21-27, 2007.

[24] K. Takeuchi and N. Collier, Bio-medical entity extraction using support vector machines, Artificial Intelligence in Medicine, 2005.

[25] J. Chen and F. Pan, A new online support vector machine algorithm, ICIC Express Letters, vol.4, no.1, pp.149-154, 2010.

[26] Z. Chen, W. Hong and C. Wang, RNA secondary structure prediction with plane pseudoknots based on support vector machine, ICIC Express Letters, vol.3, no.4(B), pp.1411-1416, 2009.

[27] B. R. Chang and H. F. Tsai, Training support vector regression by quantum-neuron-based hopfield neural net with nested local adiabatic evolution, International Journal of Innovative Computing, Information and Control, vol.5, no.4, pp.1013-1026, 2009.

[28] N. Begum, M. A. Fattah and F. Ren, Automatic text summarization using support vector machine,

International Journal of Innovative Computing, Information and Control, vol.5, no.7, pp.1987-1996, 2009.
