Classification of Data for Heart Disease Prediction System Using MLP
Dimple¹
Abstract
A heart disease prediction system helps predict heart disease, mainly cardiovascular disease, including myocardial infarction. The importance of such a system is evident from the fact that heart disease is one of the diseases with the highest mortality rate. The present system helps diagnose heart disease effectively and reduces the overall cost. Various techniques, such as Artificial Neural Networks and Naïve Bayes, can be used to implement it. The analysis shows that, of the three classification models compared, neural networks predict heart disease with the highest accuracy. The main objective of this project is to develop an Intelligent Myocardial Infarction Prediction System using a data mining modeling technique, namely a multilayer perceptron network, modified to reduce the time complexity of the existing algorithm. By supporting effective treatment, the system also helps deliver services at affordable cost.
Keywords
Prediction system, Artificial Neural Network, multilayer perceptron network, k-means clustering
¹ CSE Dept., U.I.E.T., Maharshi Dayanand University, Rohtak
SHIV SHAKTI
1. Introduction
Effective extraction of information from huge collections of data is essential for decision making and prediction [1]. This is an interactive and iterative process, consisting of numerous subtasks and decisions, called Knowledge Discovery from Data. The process of knowledge discovery converts raw data into an organized form so that it can be used for decision making in many applications. Data mining can be defined as "a variety of techniques to identify nuggets of information or decision making knowledge in the database and extracting these in a way that they can be put to use in areas such as predictions of an event, forecasting." Data mining is used in many areas, such as marketing, customer relationship management, engineering, the healthcare industry, expert prediction, data mining and mobile computing [2]. In medicine, data mining can also help diagnose various diseases such as thyroid disease, cancer and heart disease. Healthcare data, which consists of the many tests needed to diagnose a particular disease, is heterogeneous and voluminous in nature. Medical data mining can be used to explore the hidden patterns in such raw data sets. Working with heart disease patient databases is one type of real-life application [2].
Fig 1: Block diagram of prediction system (blocks: historical data, recent data, prediction system)
1.1 Application of Data Mining in the Health Sector
Medical history data, which comprises a number of tests essential to diagnose a particular disease, is heterogeneous and voluminous in nature. Medical data mining can be used to explore the hidden patterns in this raw data set. Some diagnostic and laboratory procedures are very costly and painful for patients.
1.2 Heart disease
The term heart disease covers the diverse diseases that affect the heart. Heart disease is the major cause of death in both developed and developing countries. Heart disease kills one person every 34 seconds in the United States [3]. Coronary heart disease, cardiomyopathy and cardiovascular disease are some categories of heart disease.
3. Data mining technique used for classification of data
3.1 Neural Networks
An artificial neural network (ANN), often just called a "neural network" (NN), is a mathematical and computational model inspired by biological neural networks [9]. A neural network consists of an interconnected group of artificial neurons. Neural networks are used to model complex relationships between inputs and outputs or to find patterns in data.
Fig 2: Framework of a neural network containing three layers [9]
A neural network maps a set of input data onto a set of appropriate output data [4]. It consists of three layers: an input layer, a hidden layer and an output layer. Each layer is connected to the next, and a weight is assigned to each connection. The primary function of the input-layer neurons is to distribute the inputs $x_i$ to the neurons of the hidden layer. Each hidden-layer neuron $j$ sums the input signals $x_i$, each multiplied by the weight $w_{ji}$ of the respective connection from the input layer. The output $Y_j$ is a function of this weighted sum:

$$Y_j = f\left(\sum_i w_{ji}\, x_i\right)$$
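As a minimal sketch of the hidden-layer computation above, the snippet below evaluates $Y_j = f(\sum_i w_{ji} x_i)$ for every hidden neuron $j$. The logistic sigmoid is assumed for $f$, the function names are hypothetical, and the weights and inputs are arbitrary illustrative values, not values from this study. Bias terms are omitted to match the formula.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic activation, assumed here as the function f."""
    return 1.0 / (1.0 + math.exp(-v))


def layer_output(weights: list[list[float]], x: list[float]) -> list[float]:
    """Compute Y_j = f(sum_i w_ji * x_i) for every neuron j.

    weights[j][i] holds w_ji, the weight on the connection from
    input i to hidden neuron j.
    """
    return [sigmoid(sum(w_ji * x_i for w_ji, x_i in zip(row, x))) for row in weights]


# Hypothetical example: 3 inputs feeding 2 hidden neurons.
w = [[0.5, -1.0, 0.25],
     [0.0, 0.0, 0.0]]
x = [1.0, 0.5, 2.0]
print(layer_output(w, x))
```

The second neuron has all-zero weights, so its weighted sum is 0 and its output is exactly sigmoid(0) = 0.5; the first neuron's weighted sum is 0.5.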
3.1.1 Multilayer perceptron network
A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron with a nonlinear activation function. The MLP uses a supervised learning technique called backpropagation to train the network. It is a modified form of the standard linear perceptron and can distinguish data that are not linearly separable.
Fig 3: Framework of multilayer perceptron network with hidden layer containing three layers
If every neuron of a multilayer perceptron (MLP) uses a linear activation function, then it is easily proved with linear algebra that any number of layers can be reduced to the standard two-layer input-output model. The main activation functions used in current applications are sigmoids, such as the hyperbolic tangent and the logistic function, applied to the weighted input $v_i$ of node $i$:

$$y(v_i) = \tanh(v_i) \qquad \text{and} \qquad y(v_i) = \left(1 + e^{-v_i}\right)^{-1}$$
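To illustrate the linear-algebra argument, the sketch below composes two layers that use a linear (identity) activation and shows that they equal a single layer whose weight matrix is the product of the two. The matrices and helper names are hypothetical illustrative choices, and plain Python lists are used so the example has no dependencies. Putting one of the sigmoids into the hidden layer breaks this equivalence, which is why the hidden layers need a nonlinear activation.

```python
import math


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrices a and b, both given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def logistic(v):
    """Logistic sigmoid y(v) = 1 / (1 + e^(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output, no biases.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
W2 = [[1.0, 1.0, -1.0]]
x = [2.0, 1.0]

# Linear (identity) activation everywhere: two layers ...
two_linear_layers = matvec(W2, matvec(W1, x))
# ... equal one layer whose weight matrix is the product W2 * W1.
one_linear_layer = matvec(matmul(W2, W1), x)
print(two_linear_layers, one_linear_layer)  # [-4.0] [-4.0]

# A sigmoid in the hidden layer breaks the equivalence.
print(matvec(W2, [math.tanh(v) for v in matvec(W1, x)]))
print(matvec(W2, [logistic(v) for v in matvec(W1, x)]))
```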
3.1.2 Layers in MLP
The multilayer perceptron consists of three or more layers of nonlinearly activating nodes (an input layer and an output layer with one or more hidden layers). Each node in one layer connects with a certain weight $w_{ij}$ to every node in the following layer. Here $y_i$ is the output of the $i$th node (neuron) and $v_i$ is the weighted sum of its input synapses.
Learning occurs by changing the connection weights after each piece of data is processed, based on the amount of error in the output compared to the expected result. This is an example of supervised learning and is carried out through backpropagation.
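To make the backpropagation step concrete, the following minimal sketch trains a one-hidden-layer MLP on the XOR problem, a standard example of data that are not linearly separable. It assumes logistic activations, the squared error $E = \tfrac{1}{2}(t - y)^2$, online updates after each example, and a bias weight stored as the last entry of each neuron's weight row; the class name TinyMLP and all hyperparameters are hypothetical choices for illustration.

```python
import math
import random


def sigmoid(v):
    v = max(-60.0, min(60.0, v))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-v))


class TinyMLP:
    """Hypothetical one-hidden-layer perceptron trained by online backpropagation."""

    def __init__(self, n_in, n_hidden, eta=0.5, seed=0):
        rng = random.Random(seed)
        self.eta = eta  # learning rate
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]  # append the bias input
        hb = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in self.w_hidden] + [1.0]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w_out, hb)))
        return xb, hb, y

    def train_step(self, x, t):
        """Backpropagate the error of one example and update all weights."""
        xb, hb, y = self.forward(x)
        delta_out = (y - t) * y * (1.0 - y)  # dE/dv at the output node
        # Hidden deltas use the output weights *before* they are updated.
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                        for j in range(len(self.w_hidden))]
        # Gradient descent: w <- w - eta * dE/dw
        for j in range(len(self.w_out)):
            self.w_out[j] -= self.eta * delta_out * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] -= self.eta * delta_hidden[j] * xb[i]

    def error(self, data):
        return sum(0.5 * (t - self.forward(x)[2]) ** 2 for x, t in data)


# XOR is not linearly separable, so a single-layer perceptron cannot fit it.
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
net = TinyMLP(n_in=2, n_hidden=4, eta=0.5, seed=1)
initial_error = net.error(XOR)
for epoch in range(5000):
    for x, t in XOR:
        net.train_step(x, t)
final_error = net.error(XOR)
print(initial_error, final_error)
print([round(net.forward(x)[2], 2) for x, _ in XOR])
```

Because the weights are updated after every example rather than after the whole training set, this is the online form of backpropagation; printing the error before and after training shows the effect of learning directly.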
4. K-Means Clustering
K-means is an unsupervised clustering algorithm. Here "K" stands for the number of clusters, which is usually supplied by the user. The algorithm is iterative in nature. It converges, but only to a local minimum. It works only for numerical data and is easy to implement.
The main idea is to define k centroids, one for each cluster. Next, each point of the data set is associated with the nearest centroid. When no point is pending, the first step is complete and an early grouping has been formed. The k centroids are then recalculated as the means of the points assigned to each cluster. With these k new centroids, a new binding is made between the data set points and the nearest new centroid. This loop is repeated, and the k centroids change their location step by step until no more changes occur.
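The loop described above is the standard (Lloyd's) k-means iteration. The following minimal sketch implements it on plain Python tuples, assuming Euclidean distance and initial centroids drawn at random from the data (the text does not say how the centroids are initialized); the function name and the sample points are hypothetical.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points (tuples of numbers) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # define k initial centroids
    labels = None
    for _ in range(max_iter):
        # Bind each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```

The first three points end up in one cluster and the last three in the other; which cluster is numbered 0 depends on the random initialization.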
Proposed Algorithm
MLP training algorithm with clustering
1. Initialize the dataset.
2. Divide the dataset into clusters using K-means clustering (k = 2).
3. Apply steps 4 and 5 to each cluster.
4. Initialize the weight vector with random values, and set the learning rate η and the epoch counter k.
5. Let $W_k$ be the network's weight vector at the kth epoch.
   Start of epoch k. Store the current weight vector: $W_{old} = W_k$.
   For n = 1, 2, …, N:
      Select the nth training example and apply error backpropagation to compute the partial derivatives $\partial E / \partial W_i$.
      Update the weights:
      $$W_i(k+1) = W_i(k) - \eta \frac{\partial E}{\partial W_i}$$
   End of epoch k. Set k = k + 1.
   Termination check: if the termination condition is true, terminate; else go to step 5.
6. End.
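The steps above can be sketched end to end in Python. This is a minimal sketch under several assumptions the text does not fix: a single hidden layer of logistic units with bias weights, squared error E, online (per-example) updates, and a termination check that stops when the weights move less than a tolerance between epochs or when a maximum epoch count is reached. The routing rule in the hypothetical predict helper (send a new record to the model of its nearest centroid) is also an assumption, and all names and the toy data are illustrative, not the study's WEKA configuration or records.

```python
import math
import random


def sigmoid(v):
    v = max(-60.0, min(60.0, v))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-v))


def kmeans(points, k, rng, max_iter=100):
    """Step 2: Lloyd's k-means on tuples of numbers; returns (centroids, labels)."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


class MLP:
    """One hidden layer, logistic units, squared error, online backpropagation."""

    def __init__(self, n_in, n_hidden, eta, rng):
        # Step 4: random initial weights; the last weight of each row is a bias.
        self.eta = eta
        self.wh = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.wo = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.wh] + [1.0]
        return xb, hb, sigmoid(sum(w * v for w, v in zip(self.wo, hb)))

    def weights(self):
        return [w for row in self.wh for w in row] + list(self.wo)

    def fit(self, data, max_epochs=1000, tol=1e-6):
        for epoch in range(max_epochs):           # epoch counter k
            w_old = self.weights()                # W_old = W_k
            for x, t in data:                     # for n = 1, ..., N
                xb, hb, y = self.forward(x)
                d_out = (y - t) * y * (1 - y)
                d_hid = [d_out * self.wo[j] * hb[j] * (1 - hb[j]) for j in range(len(self.wh))]
                for j in range(len(self.wo)):     # W <- W - eta * dE/dW
                    self.wo[j] -= self.eta * d_out * hb[j]
                for j, row in enumerate(self.wh):
                    for i in range(len(row)):
                        row[i] -= self.eta * d_hid[j] * xb[i]
            # Termination check (assumed): stop once the weights barely move.
            if math.dist(w_old, self.weights()) < tol:
                break
        return self

    def error(self, data):
        return sum(0.5 * (t - self.forward(x)[2]) ** 2 for x, t in data)


def train_clustered_mlp(data, k=2, n_hidden=3, eta=0.5, seed=0):
    """Steps 1-5: cluster the inputs, then train one MLP per cluster."""
    rng = random.Random(seed)
    centroids, labels = kmeans([x for x, _ in data], k, rng)
    models = []
    for c in range(k):
        subset = [d for d, lab in zip(data, labels) if lab == c]
        models.append(MLP(len(data[0][0]), n_hidden, eta, rng).fit(subset))
    return centroids, labels, models


def predict(centroids, models, x):
    """Assumed routing rule: use the model of the nearest centroid."""
    c = min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))
    return models[c].forward(x)[2]


# Hypothetical toy data (not the study's records): two groups of 2-D inputs.
toy_rng = random.Random(42)
toy = []
for cx in (0.0, 5.0):
    for _ in range(20):
        x = (cx + toy_rng.uniform(-1, 1), toy_rng.uniform(-1, 1))
        toy.append((x, 1 if x[1] > 0 else 0))
centroids, labels, models = train_clustered_mlp(toy)
print(predict(centroids, models, (5.2, 0.8)))
```

Each network sees only the records of its own cluster, so every training epoch covers a smaller subset than a single global MLP would.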
5. Data Source
Key attribute
1. PatientID – Patient’s identification number
Input attributes
1. Sex (value 1: male; value 0: female)
2. Chest Pain Type (value 1: typical angina; value 2: atypical angina; value 3: non-anginal pain; value 4: asymptomatic)
4. Restecg – resting electrocardiographic results (value 0: normal; value 1: having ST-T wave abnormality; value 2: showing probable or definite left ventricular hypertrophy)
5. Exang – exercise induced angina (value 1: yes; value 0: no)
6. Slope – the slope of the peak exercise ST segment (value 1: upsloping; value 2: flat; value 3: downsloping)
7. CA – number of major vessels colored by fluoroscopy (value 0 to 3)
8. Thal (value 3: normal; value 6: fixed defect; value 7: reversible defect)
9. Trestbps – resting blood pressure (mm Hg on admission to the hospital)
10. Serum Cholesterol (mg/dl)
11. Thalach – maximum heart rate achieved
12. Oldpeak – ST depression induced by exercise relative to rest
13. Age in years
14. Smoke (value 1: smoker; value 0: non-smoker)
5.1 Analyzing the Data Set
A data set (or dataset) is a collection of data that can be presented in various forms but is usually presented in tabular form. In the relational model, each column represents a particular variable and each row represents a given member of the data set. A total of 20 records with 15 medical attributes (factors) were obtained from the Heart Disease database (UCI Repository) and from real data from Pt. B.D. Sharma Medical College, Rohtak; the attributes are listed above. The attribute "PatientID" was used as the key; the rest are input attributes.
6. Results and conclusion
The MLP algorithm takes the data globally and then classifies it. We have added clustering to the existing algorithm: using the K-means clustering algorithm (k = 2), we formed 2 clusters and then applied the MLP algorithm to each cluster. This reduces the time needed to build the model.
6.1 Comparison of MLP and Modified MLP
Fig 4: Classification of data with MLP
6.2 Result of Comparison of Existing Algorithm (MLP) and Modified Algorithm (MLP with Clustering) Using WEKA

Metric                         MLP            MLP with clustering
Time taken to build model      0.16 seconds   0.02 seconds
Correlation coefficient        0.6068         0.1484
Mean absolute error            1.1918         1.378
Root mean squared error        1.3161         1.5298
Relative absolute error        …              …

7. References
[1] Ahmad Esmaili Torshabi, Marco Riboldi, Andrea Pella, Ali Negarestani, Mohammad Rahnema, "A Clinical Application of Fuzzy Logic", European Community's Seventh Framework Programme ([FP7/2007-2013] under grant agreement n° 215840-2).
[2] Hsinchun Chen, Sherrilynne S. Fuller, Carol Friedman, William Hersh, "Knowledge Management, Data Mining, and Text Mining in Medical Informatics", Chapter 1 in Medical Informatics: Knowledge Management and Data Mining in Biomedicine, New York, Springer, pp. 3-34, 2005.
[3] Shadab Adam Pattekari, Asma Parveen, "Prediction System for Heart Disease Using Naive Bayes", International Journal of Advanced Computer and Mathematical Sciences, ISSN 2230-962, Vol. 3, Issue 3, 2012, pp. 290-294.
[4] Chaitrali S. Dangare, Sulabha S. Apte, "Improved Study of Heart Disease Prediction System using Data Mining Classification Techniques", International Journal of Computer Applications (0975-888), Vol. 47, No. 10, June 2012.
[5] Sellappan Palaniappan, Rafiah Awang, "Intelligent Heart Disease Prediction System Using Data Mining Techniques", IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 8, August 2008.
[6] G. Subbalakshmi, "Decision Support in Heart Disease Prediction System using Naive Bayes", ISSN 0976-5166, Vol. 2, No. 2, Apr-May 2011.
[7] E. P. Ephzibah, V. Sundarapandian, "Framing Fuzzy Rules using Support Sets for Effective Heart Disease Diagnosis", International Journal of Fuzzy Logic Systems (IJFLS), Vol. 2, No. 1, February 2012, DOI: 10.5121/ijfls.2012.2102.
[8] Shantakumar B. Patil, Y. S. Kumaraswamy, "Intelligent and Effective Heart Attack Prediction System Using Data Mining and Artificial Neural Network", European Journal of Scientific Research, ISSN 1450-216X, Vol. 31, No. 4 (2009), pp. 642-656, © EuroJournals Publishing, Inc. 2009.
[9] Nidhi Bhatla, Kiran Jyoti, "An Analysis of Heart Disease Prediction using Different Data Mining Techniques", International Journal of Engineering Research & Technology (IJERT), ISSN 2278-0181, Vol. 1, Issue 8, October 2012.
[10] Chaitrali S. Dangare, Sulabha S. Apte, "Improved Study of Heart Disease Prediction System using Data Mining Classification Techniques", International Journal of Computer Applications (0975-888), Vol. 47, No. 10, June 2012.
[11] Niti Guru, Anil Dahiya, Navin Rajpal, "Decision Support System for Heart Disease Diagnosis Using Neural Network", Delhi Business Review, Vol. 8, No. 1 (January-June 2007).
[12] Carlos Ordonez, "Improving Heart Disease Prediction Using Constrained Association Rules", Seminar Presentation at University of Tokyo, 2004.
[13] J. Han, M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
[14] en.wikipedia.org/wiki/Decision_tree
[15] en.wikipedia.org/wiki/K-nearest_neighbor_algorithm