Analysis of Data Mining Algorithm in Health Care Domain
Paul Davidson G
Assistant Professor, Department of Computer Applications, Bishop Heber College, Trichy
Abstract: The health care industry produces huge amounts of data that hold complex information about patients and their medical conditions. Data mining is gaining popularity in various research areas because of its wide range of applications and its methods for mining data in an appropriate way.
Data mining methods can find hidden patterns or relationships among the items in medical data. In the last decade, there has been an increase in the use of data mining techniques on medical data to determine useful trends or patterns that are used in analysis and decision making. Data mining has vast potential in health care to detect diverse kinds of disease efficiently and reliably. This paper highlights different data mining methods, such as classification and association, and related work on analysing and predicting human disease.
Keywords: Data Mining, Health Care, Classification, Application, KDD.
I. INTRODUCTION
Data mining is an assortment of algorithmic techniques for extracting informative patterns from raw data. The healthcare industry today produces huge amounts of varied data about hospitals, resources, disease diagnosis, electronic patient records, etc. This large amount of data must be processed and scrutinized for knowledge extraction that supports understanding of the prevailing circumstances in the healthcare industry. The data mining process includes framing a hypothesis, gathering data, performing pre-processing, estimating the model, and interpreting the model to draw conclusions.
Before studying how data mining algorithms are applied to medical data, let us understand what types of algorithms exist in data mining and how they function.
1.1. KDD Process:
Fig. 1. Different Phases of KDD Process
The motivation for handling data and performing computation is the discovery of information. Data about a certain process is stored and later analysed in order to use it in a meaningful manner. The KDD (Knowledge Discovery in Databases) process employs data mining methods to find patterns according to some measure of interestingness.
KDD is the process of turning low-level data into higher-level knowledge.
Selection: This step collects heterogeneous data from varied sources for processing. Real-life medical data may be incomplete, complex, noisy, inconsistent, and/or irrelevant, which requires a selection process that gathers the important data from which knowledge is to be extracted.
Pre-processing: This step performs the basic operations of eliminating noisy data, finding missing data or developing a strategy for handling it, detecting or removing outliers, and resolving inconsistencies among the data.
Transformation: This step transforms the data into forms suitable for mining by performing tasks such as aggregation, smoothing, normalization, generalization, and discretization. Data reduction shrinks the data and represents it in less volume while producing similar analytical outcomes.
Data Mining: This step includes choosing the data mining algorithm(s) and using them to generate previously unknown and potentially useful information from the data stored in the database. This comprises deciding which models/algorithms and parameters may be suitable and matching a specific data mining method with the overall criteria of the data mining process.
Interpretation and Evaluation: This step includes presenting the mined patterns in an understandable form. Different types of information need different types of representation; in this step the mined patterns are interpreted. The outcomes are evaluated with statistical justification and significance testing.
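The pre-processing and transformation steps can be illustrated with a minimal sketch. It assumes a single numeric attribute, mean imputation for missing values, min-max normalization, and fixed cut points for discretization; the readings, cut points and labels are hypothetical and are not taken from any clinical standard.

```python
from statistics import mean

def impute_missing(values):
    """Pre-processing: replace missing readings (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, cut_points, labels):
    """Transformation: give each value the label of the first bin whose upper cut point it does not exceed."""
    result = []
    for v in values:
        for cut, label in zip(cut_points, labels):
            if v <= cut:
                result.append(label)
                break
        else:
            # Larger than every cut point: use the last label.
            result.append(labels[-1])
    return result

# Hypothetical blood pressure readings; None marks a missing value.
readings = [120, None, 150, 90, 180]
clean = impute_missing(readings)          # [120, 135, 150, 90, 180]
scaled = min_max_normalize(clean)
bins = discretize(clean, cut_points=[119, 139], labels=["normal", "elevated", "high"])
```

Mean imputation is only one possible strategy for missing data; the choice depends on the attribute and on how the values came to be missing.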
1.2. Data Mining Applications in Healthcare Sector:
The healthcare sector nowadays creates a large amount of complex data about patients, hospital resources, disease diagnosis, electronic patient records and various types of medical devices. Such large amounts of data are a key resource to be processed and analysed for knowledge extraction that supports cost savings and decision making. Data mining applications in the healthcare sector can be grouped into broad categories.
1.3. Uses of Data Mining in Health Care:
In health care industries, dependence on data is increasing day by day. In medical science, diagnosis of disease and treatment of patients is the most important task. Handwritten notes have been converted to electronic records with the aim of reducing the cost incurred during treatment and improving the effectiveness of treatment. Data mining applications in health care can be further divided into the following categories:
a. Diagnosis and Prediction of Diseases: In the health care industry, diagnosis and prediction of diseases is one of the most important purposes of using data mining. The use of data mining in health care has helped doctors to improve the health services they provide. One cannot afford to waste time and money by choosing an incorrect treatment for a patient, which can also harm the patient's health.
b. Ranking of Various Hospitals: Data mining techniques are used to analyse the details of different hospitals in order to rank them. Organizations rank hospitals on the basis of their capacity to handle patients with serious illness, i.e., hospitals with a higher rank are more suitable for handling high-risk patients, as this is their highest priority, whereas this is not the case in lower-ranked hospitals because they give little consideration to the risk factor.
c. Better Treatment Techniques: With the help of data mining techniques, both the doctor and the patient can pick the best treatment option by comparing all the treatment techniques. They can choose the best treatment in terms of both effectiveness and cost. Through data mining they can also discover the side effects of different treatments and thereby reduce the risk to patients.
d. Effective Treatments: By comparing factors such as causes, symptoms, side effects, and cost of treatments, data mining is used to analyse the effectiveness of treatments. For instance, one can compare treatments to discover which is effective with respect to the patient's health and cost.
e. Infection Control in Hospitals: Hospital infections affect a very large number of patients every year, and the number of drug-resistant infections is very high. Data mining is used to analyse infection-control data and identify irregular patterns in it.
f. Identifying High Risk Patients: American Healthways provides hospitals with diabetes disease management services to improve the quality and reduce the cost of care for diabetic patients. To distinguish between high-risk and low-risk patients, American Healthways used predictive modelling. Using predictive modelling, high-risk patients who required more attention to their health were identified by the healthcare providers.
g. Proper Hospital Resources Management: Management of hospital resources is an important task in the health care industry. Data mining builds a model for managing hospital resources. Group Health Cooperative uses data mining and provides services to hospitals at a lower cost. Blue Cross manages diseases efficiently by reducing cost and improving outcomes with the help of data mining.
1.4. Application of Data Mining Techniques in Health Care:
The different classification algorithms shown in Fig. 2 are used to predict or analyse various diseases.
Fig. 2. Different Techniques in Health Domain
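a. Neural Network: It is a set of algorithms that attempts to recognize underlying relationships in data through a process that mimics the way the human brain operates. The concept of neural networks is rapidly increasing in popularity in the area of developing trading systems.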
b. Decision Tree: It is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label.
The topmost node in the tree is the root node.
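The structure described for the decision tree can be sketched as a nested dictionary, where each internal node names the attribute it tests, each branch is keyed by a test outcome, and each leaf is a class label. The tree below is hand-built for illustration rather than learned from data, and its attributes and labels are hypothetical, not clinical guidance.

```python
# Internal nodes test one attribute; branches are test outcomes; leaves hold class labels.
# Attribute names and labels are hypothetical.
tree = {
    "attribute": "chest_pain",  # root node
    "branches": {
        "yes": {
            "attribute": "age_over_50",
            "branches": {"yes": "high risk", "no": "medium risk"},
        },
        "no": "low risk",
    },
}

def classify(node, record):
    """Follow branches from the root until a leaf (a plain string label) is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node

label = classify(tree, {"chest_pain": "yes", "age_over_50": "no"})  # "medium risk"
```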
c. Bayesian Network: It is a graphical model that encodes probabilistic relationships among variables of interest.
When used in conjunction with statistical techniques, the graphical model has several advantages for data modelling.
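A small Bayesian network can make this concrete. The sketch below assumes a three-variable chain (smoker, disease, symptom) with made-up probabilities, and answers a diagnostic query by enumerating the joint distribution; none of the numbers come from the paper or from medical data.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain smoker -> disease -> symptom.
p_smoker = 0.3
p_disease_given_smoker = {True: 0.10, False: 0.02}
p_symptom_given_disease = {True: 0.90, False: 0.05}

def bernoulli(p_true, value):
    """Probability of a binary value given the probability that it is True."""
    return p_true if value else 1.0 - p_true

def joint(smoker, disease, symptom):
    """Chain rule along the graph: P(smoker) * P(disease | smoker) * P(symptom | disease)."""
    return (bernoulli(p_smoker, smoker)
            * bernoulli(p_disease_given_smoker[smoker], disease)
            * bernoulli(p_symptom_given_disease[disease], symptom))

def posterior_disease(symptom=True):
    """P(disease = True | symptom), summing the joint over the unobserved smoker variable."""
    numerator = sum(joint(s, True, symptom) for s in (True, False))
    evidence = sum(joint(s, d, symptom) for s, d in product((True, False), repeat=2))
    return numerator / evidence

p = posterior_disease(symptom=True)  # roughly 0.45 with the numbers above
```

Enumeration is exponential in the number of variables, so it is suitable only for very small networks such as this one.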
d. Genetic Algorithm: It is a heuristic search method used in artificial intelligence and computing. It is used for finding optimized solutions to search problems based on the theory of natural selection and evolutionary biology. Genetic algorithms are excellent for searching through large and complex data sets.
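The selection, crossover and mutation cycle of a genetic algorithm can be sketched on a toy problem. The example below assumes bit-string individuals and a stand-in fitness function (the number of 1-bits); in a medical setting the fitness would instead score, for example, a candidate feature subset. The operator choices (truncation selection, single-point crossover, elitism) and all parameter values are assumptions made for illustration.

```python
import random

def fitness(bits):
    """Toy objective: the number of 1-bits in the individual."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)
    for _ in range(generations):
        # Selection: keep the fitter half as parents (truncation selection).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Elitism: carry the best individual over unchanged.
        children = [parents[0][:]]
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)  # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return initial_best, best

initial_best, best = evolve()
```

Because the best individual is always copied into the next generation, the best fitness never decreases from one generation to the next.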
e. Naive Bayesian classifiers: They are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem. Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by the expensive iterative approximation used for many other types of classifiers.
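The closed-form, linear-time training mentioned above amounts to counting. The sketch below assumes categorical features and adds Laplace (add-one) smoothing, which goes slightly beyond plain maximum likelihood; the records, feature meanings (smoker, blood pressure) and class labels are hypothetical.

```python
from collections import Counter, defaultdict
from math import log

def train(records, labels):
    """Training is a single counting pass over the data."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values_seen = defaultdict(set)         # feature index -> distinct values
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            feature_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, feature_counts, values_seen

def predict(model, record):
    class_counts, feature_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = log(count / total)  # log prior
        for i, value in enumerate(record):
            # Add-one smoothing avoids zero probabilities for unseen value/class pairs.
            numerator = feature_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score += log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical records: (smoker, blood_pressure)
records = [("yes", "high"), ("yes", "normal"), ("no", "normal"),
           ("no", "normal"), ("yes", "high"), ("no", "high")]
labels = ["sick", "sick", "healthy", "healthy", "sick", "healthy"]
model = train(records, labels)
prediction = predict(model, ("yes", "high"))  # "sick"
```

The number of stored counts grows linearly with the number of features, which is the scalability property the text refers to.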
f. Classification by K-means: K-means is not designed for classification, but it can be adapted for supervised classification. If we use k-means to classify data, there are two schemes. One method is to separate the data according to class labels and apply k-means to every class separately. If we have two classes, we would perform k-means twice, once for each group of data. At the end, we acquire a set of prototypes for each class.
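The per-class scheme can be sketched as follows. The example assumes 2-D points, a plain Lloyd's k-means seeded with the first k points of each class, and a nearest-prototype rule for labelling new points; the text does not state the decision rule, so that part is an assumption, and the data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (a simplification)."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if the cluster is empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids

def fit_prototypes(points, labels, k):
    """Run k-means separately on the points of each class and tag the centroids with that class."""
    prototypes = []
    for label in sorted(set(labels)):
        members = [p for p, l in zip(points, labels) if l == label]
        prototypes += [(c, label) for c in kmeans(members, k)]
    return prototypes

def classify(prototypes, point):
    """Assumed decision rule: return the class of the nearest prototype."""
    return min(prototypes, key=lambda pl: sq_dist(point, pl[0]))[1]

# Hypothetical two-class data.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
labels = ["healthy"] * 4 + ["sick"] * 4
prototypes = fit_prototypes(points, labels, k=2)
predicted = classify(prototypes, (8.5, 9.0))  # "sick"
```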
II. LITERATURE REVIEW
Table 1. Summary of Medical Data Mining Techniques
S.No. | Author | Health Issue | Technique Applied | Outcome
1 | Peter Ghavami, Kailash Kapur | Deep Vein Thrombosis / Pulmonary Embolism (DVT/PE) | Neural Networks [12] | Managing Patient Health Complications
2 | Mrs. J. Cathrin Princy, Mrs. K. Sivaranjani | Asthma | Support Vector Machine (SVM) [8] | Predict Asthma
3 | Harsh Shrivastava, Vijay Huddar, Sakyajit Bhattacharya | Respiratory Failure | Similarity Based Classifier [4] | Predicting Critical Complication in Intensive Care Units
4 | Zahra Shiezadeh, Hedieh Sajedi and Elham Aflakie | Rheumatoid Arthritis | Adaboost Classifier Algorithm [20] | Early Diagnosis and Treatment of the Disease
5 | Khaled A. S. Abu Daqqa, Ashraf Y. A. Maghari, Wael F. M. Al Sarraj | Leukemia | Decision-Tree Algorithm [9] | Leukemia Existence by Determining the Relationships of Blood Properties
6 | Kavita Choudhary, Pinki Bajaj | Root Canal Treatment (RCT) | Cross Validation and Decision Tree [11] | Detect RCT
7 | Jyoti Soni, Ujma Ansari, Dipesh Sharma and Sunita Soni | Heart Diseases | Decision Tree and Bayesian Classification [7] | Heart Disease Prediction
9 | Roshan S and Dr. Rohini V. | Lung Cancer | Naive Bayes [15] | Predicting the Survival of Lung Cancer Patients After Thoracic Surgery
10 | Dr. S. Vijayarani, Mr. S. Dhayanand | Kidney Diseases | Naive Bayes and Support Vector Machine [2] | Disease Prediction
III. CONCLUSION
With the current enormous growth of biomedical data collected in electronic form from critical care units and computerized equipment, researchers are eager to explore these data. In this analysis, we observe that data mining techniques such as classification are used in various medical fields for the analysis of health issues.
Since a large volume of medical data is available, data mining techniques are very helpful in the classification and prediction of healthcare issues, identifying diseases and their nature. This analysis showcases the various data mining techniques that help in analysing diseases from medical data.
IV. REFERENCES
[1] Dhanya P Varghese and Tintu P B, “A Survey on Health Data Using Data Mining Techniques”, International Research Journal of Engineering and Technology (IRJET), Volume: 02, Issue: 07, Oct-2015.
[2] Dr. S. Vijayarani and Mr. S. Dhayanand, “Data Mining Classification Algorithms For Kidney Disease Prediction”, International Journal on Cybernetics & Informatics (IJCI), Vol. 4, No. 4, August 2015.
[3] Gustavo Santos-Garcia, Gonzalo Varela, Nuria Novoa and Marcelo F. Jimenez, “Prediction Of Postoperative Morbidity After Lung Resection Using An Artificial Neural Network Ensemble”, Artificial Intelligence in Medicine 30:61–69, 2004.
[4] Harsh Shrivastava, Vijay Huddar, Sakyajit Bhattacharya and Vaibhav Rajan, “Classification with imbalance: A similarity-based method for predicting respiratory failure”, Bioinformatics and Biomedicine (BIBM), IEEE International Conference, Nov 2015.
[5] I. Curiac, G. Vasile, O. Banias, C. Volosencu and A. Albu, “Bayesian Network Model For Diagnosis Of Psychiatric Diseases”, Proceedings of the ITI 2009 31st Int. Conf. on Information Technology Interfaces, Cavtat, Croatia, 22-25 June 2009.
[6] Illhoi Yoo & Patricia Alafaireet & Miroslav
Healthcare And Biomedicine: A Survey Of The Literature”, Springer, May-2011.
[7] Jyoti Soni, Ujma Ansari and Dipesh Sharma, “Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction”, International Journal of Computer Applications, March 2011.
[8] J. Cathrin Princy and K. Sivaranjani, “Survey on Asthma Prediction Using Classification Technique”, International Journal of Computer Science and Mobile Computing, ISSN 2320–088X, July 2016.
[9] Khaled A. S. Abu Daqqa, Ashraf Y. A. Maghari and Wael F. M. Al Sarraj, “Prediction and diagnosis of leukemia using classification algorithms”, ICIT, May 2017.
[10] K. Sharmila and Dr. S. A. Vethamanickam, “Survey On Data Mining Algorithm And Its Application In Healthcare Sector Using Hadoop Platform”, International Journal of Emerging Technology and Advanced Engineering, ISSN 2250-2459, Volume: 05, Issue: 01, January 2015.
[11] Kavita Choudhary and Pinki Bajaj, “Automated Prediction of RCT (Root Canal Treatment) Using Data Mining Techniques: ICT in Health Care”, April 2015.
[12] Peter Ghavami and Kailash Kapur, “Prognostics & Artificial Neural Network Applications In Patient Healthcare”, IEEE Conference Publication, February 2018.
[13] Pradeep Nayak and Sayeesh, “A Survey on Medical Data by Using Data Mining Techniques”, ISSN: 2454-132X, Impact Factor: 4.295, Vol. 3, Issue 6.
[14] Prakash Mahindrakar and Dr. M. Hanumanthappa, “Data Mining in Health Care: A Survey of Techniques and Algorithms with its Limitations and Challenges”, International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013.
[15] Roshan S and Dr. V. Rohini, “Prediction Of Post-Surgical Survival Of Lung Cancer Patients After Thoracic Surgery Using Data Mining Techniques”, IJAR, April 2017.
International Journal of Information Sciences and Techniques (IJIST) Vol.6, No.1/2, March 2016.
[17] Syed Zahid Hassan and Brijesh Verma, “A Hybrid Data Mining Approach For Knowledge Extraction And Classification In Medical Databases”, IEEE, P1-6, 2007.
[18] Shadma Qureshi, Sonal Raj and Shiv Kumar, “Mining Social Media Data For Understanding Drugs Usage”, International Research Journal of Engineering and Technology (IRJET), Vol. 4, Issue: 07, July 2017.
[19] Yanwei Xing, Jie Wang, Zhihong Zhao and Yonghong Gao, “Combination Data Mining Methods With New Medical Data To Predicting Outcome Of Coronary Heart Disease”, International Conference on Convergence Information Technology, 2007.
[20] Zahra Shiezadeh, Hedieh Sajedi and Elham Aflakie, “Diagnosis of Rheumatoid Arthritis Using An Ensemble Learning Approach”, ICAITA 2015.