International Journal of Advanced Engineering Science and Technological Research (IJAESTR) ISSN: 2321-1202, www.aestjournal.org @2015 All rights reserved
Application of Data Mining Methods and Techniques in Clinical Diabetics
Disha Singh
School of Computer Science Engineering, Galgotias University, India [email protected]
Abstract - Data mining has been used intensively and extensively by many organizations. In clinical diabetes research, data mining plays a vital role for diabetic patients: it helps characterize patients by blood glucose, sex, age, and diabetes type according to their condition. The data mining techniques applied in the selected articles were useful for extracting valuable knowledge, generating new hypotheses for further scientific research and experimentation, and improving health care for diabetes patients. The results could be used in scientific research and in real-life practice to improve the quality of health care for diabetes patients.
The MEDLINE database was searched through PubMed. The search initially identified 31 articles, of which 17 were selected for this study on the basis of the data mining method used. The main goal was to identify, for each article, the type of diabetes, the data sets, the data mining methods, the data mining software and technologies, and the conclusions.
Such research ultimately improves the quality of clinical health care for diabetic patients.
Keywords - Data mining; Data mining application; Clinical diabetics; Healthcare; KDD.
I. INTRODUCTION
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information, that is, information that can be used to increase revenue or reduce costs. It offers a range of tools and applications for extracting meaningful data.
Knowledge Discovery in Databases (KDD) is the process of finding complex patterns and defining relationships in data. The tools and techniques of data mining have produced impressive results in other industries, and healthcare needs to take advantage of advances in this exciting field.
Diabetes is a disease of modern society, driven by lack of exercise, irregular routines, poor diet, and similar factors. Its prevalence is increasing in the adult population, and it directly affects the health of those who suffer from it. Diabetes is a very serious disease that, if not diagnosed and treated properly and in time, can lead to very serious complications, including death.
Organizations that take advantage of KDD techniques will find that they offer valuable assistance in the quest to lower healthcare costs while improving healthcare quality in all departments of a clinic.
Diabetes is a disease in which the body cannot regulate the amount of sugar in the blood [1]. It is a group of metabolic diseases in which a person has high blood sugar, either because the pancreas does not produce enough insulin or because cells do not respond to the insulin that is produced.
The huge amounts of data generated by healthcare transactions are too complex and voluminous to be processed and analyzed by traditional methods. Data mining provides the technology and methodology to transform these mounds of data into useful information for decision making. The Mine Safety and Health Administration (MSHA) is empowered by statute to collect detailed information on accidents, injuries and illnesses that occur in the mining industry. Between 1990 and 1999, mining operators and contractors reported 260,510 accidents, injuries and illnesses from all causes, including 959 fatalities. Healthcare professionals are overwhelmed with a huge amount of information generated from different sources. In this context, preventable medical errors are estimated to be the cause of thousands of deaths and the loss of billions of dollars per year.
II. RELATED WORKS
Earlier research predicted how likely people in different age groups are to be affected by diabetes, based on their lifestyle activities, and identified the factors responsible for an individual becoming diabetic. Many research papers address diabetes with different tools and technologies in order to improve the quality of diabetes healthcare. One example is “Feature Selection and Classification Model Construction on Type 2 Diabetic Patients' Data”, based on patient data from 2000-2004, which used classification and knowledge discovery together with FSSMC (feature selection via supervised model construction) [8]. Different classification techniques, namely Naive Bayes, IB1, and C4.5, were used to predict the diabetic condition, and the most important factor for the diabetic patient was found to be “Diet Treatment”. The study considered several factors and treatment methods: the factors include age, sex, diagnosis, duration, and family history, and the treatment methods are insulin, diet, and tablets.
Another example is “Data Mining Diabetic Databases: Are Rough Sets a Useful Addition?” [5]. It reports accuracies of 76% for neural networks with the ADAP algorithm, 71.1% for C4.5, and 71.33% for Multi-Stream Dependency Detection (MSDD). The authors used the ROSETTA GUI, handling missing values and dividing attribute values into a limited number of groups for classification. ROSETTA reached an accuracy of 82.6%, against a reported range of 66-81% for machine learning methods, and it can be used easily and quickly in any environment.
Statistics from the Centers for Disease Control state that 26.9% of people older than 65 are affected by diabetes, that 11.8% of all men aged 20 years or older are affected by diabetes, and that 10.8% of all women aged 20 years or older are affected by diabetes.
After analysis and modelling, they treated age_new as a nominal variable divided into three groups: young, middle-aged, and old. They found that 34% of the population below 20 years of age was not affected by diabetes, 33.9% of the population above 20 and below 45 years of age was not affected by diabetes, and 26.8% of the population above 45 years of age was not diabetic.
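The grouping of age into a nominal variable can be sketched as follows. This is only an illustrative sketch: the cut points 20 and 45 come from the text above, but the function name `age_group`, the group labels, and the decision to place ages of exactly 20 and 45 in the middle group are assumptions, since the text does not say how the boundaries are handled.

```python
def age_group(age):
    """Map an age in years to one of three nominal groups.

    Cut points 20 and 45 follow the text; assigning exactly 20 and
    exactly 45 to the middle group is an assumption of this sketch.
    """
    if age < 20:
        return "young"
    if age <= 45:
        return "middle"
    return "old"


# Hypothetical ages, used only to show the mapping.
ages = [12, 20, 33, 45, 46, 70]
groups = [age_group(a) for a in ages]
print(groups)
```

A nominal variable of this kind can then be used directly by classifiers that expect categorical attributes.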
Diabetes depends entirely on the level of blood glucose, and the biological reason is that the pancreas is not working properly. Care and treatment are therefore essential for controlling diabetes. We encourage people with diabetes and their families to learn as much as possible about the latest medical therapies and approaches, as well as about a healthy lifestyle.
III. PROPOSED SYSTEM
We applied data mining methods and techniques to improve the quality of healthcare in clinical diabetes, using clinical data to predict and evaluate whether or not a patient is affected by diabetes. The training dataset used for data mining classification was the Pima Indians Diabetes Database of the National Institute of Diabetes and Digestive and Kidney Diseases, from the UCI Machine Learning Repository [6]. The dataset contains 768 record samples, each having 8 attributes. We applied different classification techniques to obtain good results and avoid different types of error.
IV. PROPOSED SYSTEM DESIGN
The training dataset is used with data mining classification and clustering methods to calculate and evaluate the quality of disease care in healthcare, here for diabetes. The diabetic data warehouse comes from a large integrated health care system in the NCR, India, with diabetic patients. The data set contains 1000 record samples, each having attributes such as age, sex, emergency department visits, comorbidity index, dyslipidemia, hypertension, and retinopathy. We used this dataset for our classification because the data is complete. The diagrammatic representation of the proposed system design is given in Fig. 1.
Fig. 1. Proposed system design.
Feature selection [8] is the technique applied to model construction on type 2 diabetic patients' data. The authors used supervised model construction and classification techniques such as Naive Bayes, IB1, and C4.5, all of which predict whether the condition is controlled. FSSMC used age, diagnosis duration, and insulin treatment with random blood glucose, and highlighted “Diet Treatment”. The factors and treatment methods are given in TABLE I.
TABLE I.
Many features are used by the different techniques. Because the data set consists of different nominal data, a filtering technique is effective for obtaining accurate results. The filtered data, obtained after discussion with a doctor, is shown in TABLE II.
TABLE II.
IV. EVALUATION
Every algorithm predicts a different output depending on its data set, attributes, and features, and accuracy and sensitivity can then be evaluated for each applied method. Every study used different methods and techniques to obtain good and efficient results. A classification algorithm predicts the class label, and the final output indicates whether or not the person is affected by diabetes.
The accuracy [9] of a classifier on a given test set is the percentage of test set tuples that are correctly classified by the classifier. The confusion matrix is a useful tool for analysing classifier accuracy; it is given below in TABLE III.
             Predicted C1      Predicted C2
Actual C1    True Positive     False Negative
Actual C2    False Positive    True Negative
TABLE III.
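The four cells of the confusion matrix can be counted from paired actual and predicted class labels, as in the following sketch. The function name `confusion_counts` and the sample labels are hypothetical and serve only as an illustration; C1 is taken as the positive class, matching the layout of the table.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FN, FP and TN for a two-class problem.

    `positive` names the class treated as C1 (the positive class);
    every other label is treated as the negative class C2.
    """
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # actual C1, predicted C1
            else:
                fn += 1  # actual C1, predicted C2
        else:
            if p == positive:
                fp += 1  # actual C2, predicted C1
            else:
                tn += 1  # actual C2, predicted C2
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical labels for six test tuples.
actual = ["C1", "C1", "C2", "C2", "C1", "C2"]
predicted = ["C1", "C2", "C2", "C1", "C1", "C2"]
counts = confusion_counts(actual, predicted, positive="C1")
print(counts)
```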
Stages of the proposed system design (Fig. 1): Training data -> Feature relevance analysis -> Comparison of classification algorithms -> Select classifier -> Improvised classification algorithm testing -> Data evaluation
TABLE II
S.No  Attribute                                                                   Type
1.    Number of times                                                             Continuous
2.    Plasma glucose concentration at 2 hours in an oral glucose tolerance test   Continuous
3.    Diastolic blood pressure (mm Hg)                                            Continuous
4.    2-hour serum insulin (mu U/ml)                                              Continuous
5.    Body mass index (kg/m^2)                                                    Continuous
6.    Diabetes pedigree function                                                  Continuous
7.    Age                                                                         Continuous
TABLE I
Factors           Treatment Methods
Age               Insulin
Diagnosis         Diet
Family History    Tablets
1. True positive (TP): equivalent to a hit
2. True negative (TN): equivalent to a correct rejection
3. False positive (FP): equivalent to a false alarm, Type I error
4. False negative (FN): equivalent to a miss, Type II error
5. Sensitivity or true positive rate (TPR), equivalent to hit rate, recall:
   TPR = TP / P = TP / (TP + FN)
6. Specificity (SPC) or true negative rate:
   SPC = TN / N = TN / (FP + TN)
7. Precision or positive predictive value (PPV):
   PPV = TP / (TP + FP)
8. Negative predictive value (NPV):
   NPV = TN / (TN + FN)
9. Fall-out or false positive rate (FPR):
   FPR = FP / N = FP / (FP + TN)
10. False discovery rate (FDR):
   FDR = FP / (FP + TP) = 1 - PPV
11. Miss rate or false negative rate (FNR):
   FNR = FN / P = FN / (FN + TP)
12. Accuracy (ACC):
   ACC = (TP + TN) / (P + N)
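The measures listed above follow directly from the four confusion matrix counts. The sketch below computes them in one place; the function name `classifier_metrics` and the example counts are hypothetical, and returning NaN for a zero denominator is an assumption of this sketch rather than something the paper specifies.

```python
def ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero (assumption)."""
    return numerator / denominator if denominator else float("nan")


def classifier_metrics(tp, tn, fp, fn):
    """Compute the rates defined in the list above from TP, TN, FP, FN."""
    p = tp + fn  # all actual positives
    n = fp + tn  # all actual negatives
    return {
        "TPR": ratio(tp, p),            # sensitivity, recall
        "SPC": ratio(tn, n),            # specificity
        "PPV": ratio(tp, tp + fp),      # precision
        "NPV": ratio(tn, tn + fn),      # negative predictive value
        "FPR": ratio(fp, n),            # fall-out
        "FDR": ratio(fp, fp + tp),      # false discovery rate
        "FNR": ratio(fn, p),            # miss rate
        "ACC": ratio(tp + tn, p + n),   # accuracy
    }


# Hypothetical counts for 100 test tuples.
m = classifier_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name} = {value:.3f}")
```

With these example counts, sensitivity is 40/50 and accuracy is 85/100, and the identities FDR = 1 - PPV and FNR = 1 - TPR can be checked numerically.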
V. CONCLUSION AND FUTURE WORK
We have applied various data mining methods and techniques to evaluate results for improving the quality of healthcare in clinical diabetes. On the diabetes data set, the confusion matrix, feature selection, the C4.5 classification algorithm, Bayesian smoothing, artificial intelligence, prediction and classification, and support vector machines for feature selection all gave different results depending on their attributes and data set. For diabetes, blood glucose and age are the most important factors for determining whether or not a person is diabetic, and using these factors helps to obtain accurate results. Naive Bayes gives the prior probability of each class, with all probabilities computed from the training tuples. The “Laplace estimator” is used for probability estimation to avoid zero probabilities.
Blood glucose or age gives the maximum probability for the class label, so using this algorithm provides greater accuracy in classification. A classification rate of 91% was obtained for the C4.5 algorithm. As a future enhancement, blood glucose could be used for diabetic patients to help control kidney failure and heart attacks.
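The role of the Laplace estimator in avoiding zero probabilities can be illustrated with a small sketch. The function name `laplace_probability`, the glucose categories, and the sample tuples are hypothetical; the sketch shows standard add-one smoothing of a conditional probability, not the exact procedure used in the studies surveyed here.

```python
from collections import Counter


def laplace_probability(values, value, num_distinct, k=1):
    """Estimate P(value | class) with add-k (Laplace) smoothing.

    `values` holds the attribute values of the training tuples of one
    class; `num_distinct` is the number of possible attribute values.
    """
    counts = Counter(values)
    return (counts[value] + k) / (len(values) + k * num_distinct)


# Hypothetical glucose categories of four training tuples in one class.
glucose_in_class = ["high", "high", "high", "normal"]
categories = ["low", "normal", "high"]

for c in categories:
    # "low" never occurs, yet its estimate is 1/7 instead of 0.
    print(c, laplace_probability(glucose_in_class, c, len(categories)))
```

Without smoothing, the unseen value "low" would receive probability zero and cancel the whole product of conditional probabilities in Naive Bayes; with smoothing, the three estimates still sum to one.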
VI. REFERENCES
[1] http://www.emedicinehealth.com/diabetes.
[2] http://en.wikipedia.org/wiki/Diabetes_mellitus.
[3] http://diabetes.co.in.
[4] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann.
[5] Joseph L. Breault, “Data Mining Diabetic Databases: Are Rough Sets a Useful Addition?”.
[6] UCI Machine Learning Repository, Centre for Machine Learning and Intelligent Systems, http://archive.ics.uci.edu.
[7] Knowledge Discovery in Databases, http://www2.cs.uregina.ca.
[8] Y. Huang, P. McCullagh, N. Black, R. Harper, “Feature selection and classification model construction on type 2 diabetic patients' data”, Artificial intelligence in …, 2007, Elsevier.
[9] K. Rajesh, V. Sangeetha, “Application of Data Mining Methods and Techniques for Diabetes Diagnosis”, September 2012, p. 224.
[10] C4.5 Algorithm Description, http://en.wikipedia.org/wiki/C4.5_algorithm.
[11] Mohammed Abdul Khaleel, “A Survey of Data Mining Techniques on Medical Data for Finding Locally Frequent Diseases”.
[12] “Data mining technology for blood glucose and diabetics management”.
[13] “Prediction of blood glucose level of type one diabetics using response surface methodology”.
[14] http://en.wikipedia.org/wiki/Confusion_matrix
[15] Pardha Repalli, “Prediction on Diabetes Using Data mining Approach”.
[16] G. Parthiban, A. Rajesh, S. K. Srivatsa, “Diagnosis of Heart Disease for Diabetic Patients using Naive Bayesian Method”, International Journal of Computer Applications (0975 – 8887), Volume 24, No. 3, June 2011.