Prediction of Chronic Kidney Disease Using Machine Learning Techniques


Mrs Prasuna Kotturu¹, Mr VVS Sasank², G Supriya³, Ch Sai Manoj⁴, M V Maheshwarredy⁵

¹Assistant Professor, Dept of CSE, KLEF, Guntur, Andhra Pradesh, India.

²Assistant Professor, Dept of CSE, KLEF, Guntur, Andhra Pradesh, India.

³,⁴,⁵B.Tech Students, Dept of CSE, KLEF, Guntur, Andhra Pradesh, India.


Abstract

Chronic kidney disease (CKD) is a dangerous disease affecting many people worldwide. Individuals with CKD are often unaware that the medical tests they undergo for other purposes may provide useful information about CKD, and this information may not be used effectively for diagnosis. The major problem with this disease is that it is hard to recognize until it reaches an advanced stage. In this paper we predict CKD using machine learning algorithms: decision tree, naïve Bayes, logistic regression (LR), support vector machine (SVM) and random forest. The random forest method was the best suited and gave the most accurate result, 99.3%.

Keywords: Chronic Kidney Disease, Random forest, Logistic Regression, Support Vector Machine, Naive Bayes, Decision Tree.

1. Introduction

Chronic Kidney Disease (CKD) is a dangerous disease affecting many people worldwide. The key problem for those affected is that the disease is hard to identify until it reaches an advanced stage. Classification is one of the most common decision-making problems and plays a key role in a wide range of machine learning tasks.

The aim of a classification problem is to assign an object to one of a set of predefined classes according to the attributes linked to that object. For CKD, mild cases involve less damage and can be treated with ease, whereas severe cases can lead to death if not treated.

Machine learning techniques have shown promise in medical science for predicting and diagnosing various critical diseases. They are mainly used in medical diagnosis to support critical decisions, as the data in the medical field is huge and the accuracy of a diagnosis depends on considering the large volume of patient data. ML algorithms have been a driving force in recognizing deviations from the norm in various physiological data, and are used with notable success in various classification tasks. ML improves the accuracy of disease diagnosis. To detect this disease, we use ML to predict the outcome from the symptoms and other relevant factors.

2. Literature Survey

Year  Author                           Best classifier                Accuracy (%)
2009  Y. Chen et al. [1]               Logistic Regression (LR)       96.4
2010  R. G. Brereton et al. [2]        Support Vector Machine (SVM)   98.2
2015  J. R. Quinlan et al. [3]         Decision tree classification   94.3
2015  Paul E. Stevens et al. [4]       Decision tree classification   94.6
2018  W. Shan Lee et al. [5]           Random Forest classifier       99.3
2018  T. Sajana et al. [6]             Random Forest classifier       99.4
2018  T. Sajana et al. [7]             Naïve Bayes                    95.1
2019  M. R. Narasingarao et al. [8]    Naïve Bayes                    94.8

• Y. Chen et al. [1]: Logistic Regression (LR) is a linear model for classification. LR estimates the distribution P(Y | X) of the Boolean label Y given the instance X.

• R. G. Brereton et al. [2]: The support vector machine (SVM) is one of the most common data mining methods used for prediction and classification. The main idea of SVM is to find the optimal hyperplane in the training data that separates the data of two classes.

• J. R. Quinlan et al. [3]: The decision tree is one of the most popular classification methods used in data mining. A decision tree consists of a root node, branches and leaf nodes. It divides the data into classes according to the value of the tested attribute.

• Paul E. Stevens et al. [4], in Nature Reviews Nephrology, discussed the advantages of early detection of CKD and its impact on prognosis.

• W. Shan Lee et al. [5], "Compression and Aggregation for Logistic Regression Analysis in Data Cubes", used classifiers including logistic regression (LR) and decision trees to predict chronic kidney disease.

• T. Sajana et al. [6] presented a survey on the prediction of malaria based on the life stage of parasites and the segmentation of erythrocytes.

• T. Sajana et al. [7] proposed a comparative study for classifying imbalanced malaria data using the Naive Bayesian classifier in various R language environments.

• M. R. Narasinga Rao et al. [8] balanced the class distribution of imbalanced malaria data using the SMOTE algorithm and then conducted a comparative study of the Naive Bayesian classifier on various platforms for better prediction.

3. Analysis of CKD

The kidneys act as powerful and precise filters that rid the body of waste and harmful substances and return nutrients, amino acids, insulin, hormones and other vital substances to the bloodstream. Occasionally, however, things can go wrong. "Chronic kidney disease" (CKD) is the term used throughout the world for any form of kidney disease that continues for more than a few months. "Chronic" does not necessarily mean "serious", and "disease" includes any deviation from normal kidney structure or function, regardless of whether it is likely to make a person feel unwell or cause complications. It is a common condition that can affect anyone at any age, but the older you are, the more likely you are to have it. It is estimated that almost 3 million people in the UK are at risk of developing CKD. There are several stages of chronic kidney disease, from mild loss of kidney function to kidney failure, and not all cases of CKD progress to the most severe stage. Most people fall into the mild to moderate categories. CKD is caused by a combination of conditions that put a strain on the kidneys, for example poorly controlled blood pressure and diabetes. You are also at greater risk of developing CKD if you have cardiovascular disease (heart or blood vessel conditions) or a family history of kidney disease.

4. Machine Learning Techniques

4.1 Support Vector Machine

The Support Vector Machine (SVM) can be used for both regression and classification tasks. Many people favour it because it provides high accuracy with less computational power. It is, however, mostly used for classification. In ML, support vector machines are supervised learning models with associated learning algorithms that analyse data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model represents the examples as points in space, mapped so that the examples of the different classes are separated by a clear gap that is as wide as possible. New examples are mapped into the same space and predicted to belong to a class based on the side of the gap on which they fall.

In addition to linear classification, SVMs can use what is known as the kernel trick to perform non-linear classification, implicitly mapping their inputs into high-dimensional feature spaces.
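The kernel trick mentioned above can be illustrated with a minimal sketch. It assumes 2-D inputs and a degree-2 polynomial kernel, both chosen only for illustration (the paper does not state which kernel was used), and checks that the kernel value equals a dot product in an explicit higher-dimensional feature space. The function names are hypothetical.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

def poly2_features(x):
    """Explicit 3-D feature map for 2-D inputs matching poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def linear_decision(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w . x + b = 0."""
    return 1 if dot(w, x) + b >= 0 else -1

x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(x, z)                            # computed in 2-D
explicit_value = dot(poly2_features(x), poly2_features(z))   # computed in 3-D
```

Both values agree up to floating-point rounding, which is why an SVM can work in the larger space without ever constructing it.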

4.2 Random Forest

A random forest is a combination of tree predictors in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees. The classification procedure of the random forest algorithm can be described as follows. Draw n_tree bootstrap samples from the original data. For each bootstrap sample, grow an unpruned classification tree with the following modification: at each node, rather than choosing the best split among all predictors, randomly sample a subset of the predictors and choose the best split among those variables. Predict new data by aggregating the predictions of the n_tree trees, using majority vote for classification.
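The procedure above can be sketched in plain Python. This is a simplified illustration, not the implementation used in this paper: to stay short it grows one-level trees (stumps) instead of full unpruned trees, and the two features and their values are made-up toy data.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_ids):
    """Best single-threshold split among the candidate features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            ll, rl = majority(left), majority(right)
            errors = sum(y != ll for y in left) + sum(y != rl for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, ll, rl)
    if best is None:  # no valid split: fall back to a constant prediction
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, t, ll, rl = stump
    return ll if row[f] <= t else rl

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)  # random predictors
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Aggregate the per-tree predictions by majority vote."""
    return majority([predict_stump(s, row) for s in forest])

# Toy data (hypothetical values): (hemoglobin, blood urea) -> class
rows = [(8.0, 90.0), (9.0, 80.0), (9.5, 110.0), (10.0, 70.0),
        (14.0, 30.0), (15.0, 25.0), (13.5, 35.0), (16.0, 20.0)]
labels = ["ckd"] * 4 + ["notckd"] * 4
forest = fit_forest(rows, labels)
pred_low_hb = predict_forest(forest, (7.5, 120.0))
pred_high_hb = predict_forest(forest, (15.5, 18.0))
```

Each tree sees a different bootstrap sample and a different random predictor, so individual trees disagree on borderline cases; the majority vote smooths this out.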

4.3 Logistic Regression

Logistic Regression is a classification algorithm. It is used to predict a binary outcome (1/0, Yes/No, True/False) from a set of independent variables. To represent binary or categorical values, we use dummy variables. Logistic regression can also be considered a special case of linear regression for a categorical outcome variable, where the log of the odds is used as the dependent variable.
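The relation between the log of the odds and the predicted probability can be sketched as follows. The weights, bias and feature values are hypothetical and serve only to show the computation; they are not fitted coefficients from this study.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Inverse of sigmoid: the log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def predict_proba(weights, bias, features):
    """Probability of the positive class under a linear model on the log-odds."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients for two standardized features (illustrative only).
weights, bias = [1.2, -0.8], 0.1
p = predict_proba(weights, bias, [0.5, -1.0])  # linear score z = 1.5
label = 1 if p >= 0.5 else 0
```

The linear part of the model lives on the log-odds scale, and the sigmoid converts it into a probability that is then thresholded to obtain the binary answer.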

4.4 Decision Tree

A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal.

4.5 Naïve Bayes

Naïve Bayes is a classification algorithm based on Bayes' theorem. Bayes' theorem gives the probability of an event occurring given the probability of another event that has already occurred. In this method, each feature is assumed to make an independent and equal contribution to the result.
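As a small worked example of Bayes' theorem, the sketch below computes P(A | B) from P(A), P(B | A) and P(B | not A). The numbers are invented for illustration and are not estimates from the CKD dataset.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(A | B) from P(A), P(B | A) and P(B | not A) via Bayes' theorem."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

# Illustrative numbers: P(A) = 0.1, P(B | A) = 0.9, P(B | not A) = 0.2
p_a_given_b = posterior(prior=0.1, likelihood=0.9, likelihood_given_not=0.2)
```

Here the posterior is 0.09 / (0.09 + 0.18) = 1/3: observing B raises the probability of A from 0.1 to about 0.33.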

5. Data Sets and Attributes

In this paper, we use the Chronic Kidney Disease dataset downloaded from the UCI Machine Learning Repository (2015). The dataset was collected over roughly 2 months at the Apollo hospital and has 25 attributes, 11 numeric and 14 Boolean. The attributes include Age, Blood Pressure, Sugar, Bacteria, Blood Glucose, Blood Urea, Hemoglobin, Red Blood Cell Count, Hypertension, Diabetes Mellitus, Coronary Artery Disease, Appetite and Anemia.

6. METHODOLOGY

6.1 Random Forest

We took our dataset, Chronic Kidney Disease, from the UCI Machine Learning Repository (2015). It contains 400 instances described by 25 attributes, of which 11 are numeric and 14 are Boolean. We used 244 instances for training and 156 instances for testing. Pre-processing is performed on the dataset to handle noisy and missing data. With this method we obtained an accuracy of 99.3%.
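The pre-processing and the 244/156 split can be sketched as below. The paper does not state how missing values were filled or how the split was drawn, so mean/mode imputation, the "?" missing-value marker and the seeded random shuffle are all assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import mean

MISSING = "?"  # assumption: missing values are marked with "?"

def impute_column(values, numeric):
    """Replace missing entries with the column mean (numeric) or mode (categorical)."""
    present = [v for v in values if v != MISSING]
    if numeric:
        fill = mean(float(v) for v in present)
        return [fill if v == MISSING else float(v) for v in values]
    fill = Counter(present).most_common(1)[0][0]
    return [fill if v == MISSING else v for v in values]

def train_test_split(n_rows, n_train, seed=0):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]

# 400 instances split into 244 for training and 156 for testing.
train_idx, test_idx = train_test_split(n_rows=400, n_train=244)

hemoglobin = impute_column(["10", MISSING, "14"], numeric=True)
appetite = impute_column(["good", MISSING, "good", "poor"], numeric=False)
```

Fixing the seed makes the split reproducible, which matters when accuracies of several classifiers are compared on the same test instances.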

6.2 Support Vector Machine

After pre-processing the dataset, we used SVM to classify the data into 250 chronic kidney disease instances and 150 non-chronic kidney disease instances. The accuracy of this method is 98.2%.

6.3 Logistic Regression

Using logistic regression, we estimated the probability of a case falling under chronic kidney disease. We obtained an accuracy of 96.5% using logistic regression.

6.4 Naive Bayes

Naive Bayes predicts, for each class in the dataset, the probability that a given record belongs to that class, and the class with the highest probability is chosen. Using this method we obtained an accuracy of 94.8%.
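Choosing the class with the highest probability can be sketched with a small categorical naive Bayes classifier. The two features and the six records are made-up toy data, and Laplace smoothing is an added assumption to avoid zero probabilities; neither is taken from this study.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(y, i)][v] += 1
    return class_counts, value_counts

def predict(model, row, n_values=2):
    """Return the class with the highest log-posterior score."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior of class c
        for i, v in enumerate(row):
            # Laplace smoothing; n_values is the assumed number of
            # categories per feature (2 for yes/no style attributes).
            score += math.log((value_counts[(c, i)][v] + 1) / (n_c + n_values))
        scores[c] = score
    return max(scores, key=scores.get)

# Toy records (hypothetical): (hypertension, appetite) -> class
rows = [("yes", "poor"), ("yes", "good"), ("yes", "poor"),
        ("no", "good"), ("no", "good"), ("no", "poor")]
labels = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]
model = fit_naive_bayes(rows, labels)
pred_a = predict(model, ("yes", "poor"))
pred_b = predict(model, ("no", "good"))
```

Summing logarithms instead of multiplying probabilities gives the same ranking of classes while avoiding numerical underflow when there are many features.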

6.5 Decision Tree

Using the decision tree algorithm, we classified the data into two classes, namely chronic kidney disease instances and non-chronic kidney disease instances. The accuracy of the decision tree classifier is 94.6%.
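How a decision tree assigns one of the two classes can be sketched with a hand-written tree. The attributes and thresholds below are illustrative placeholders, not the splits learned in this study.

```python
# Internal nodes test one attribute; leaves carry a class label.
# The attributes and thresholds are placeholders, not learned values.
TREE = {
    "attribute": "hemoglobin", "threshold": 12.0,
    "low": {"label": "ckd"},
    "high": {
        "attribute": "blood_urea", "threshold": 50.0,
        "low": {"label": "notckd"},
        "high": {"label": "ckd"},
    },
}

def classify(node, record):
    """Follow test outcomes from the root until a leaf is reached."""
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["low"] if value <= node["threshold"] else node["high"]
    return node["label"]

result = classify(TREE, {"hemoglobin": 15.0, "blood_urea": 30.0})
```

Each record follows exactly one path from the root to a leaf, so the prediction can be read off as a short chain of attribute tests.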

7. RESULTS

7.1 Comparative study

In total, 400 data instances are used for training and evaluating the prediction algorithms, of which 250 are labelled chronic kidney disease (CKD) and 150 are labelled non-chronic kidney disease (NCKD). After implementing these classifiers we obtained an accuracy of 94.8% with decision tree, 95% with Naïve Bayes, 96.5% with logistic regression, 98.3% with support vector machines and 99.3% with random forest. These values are shown graphically in the comparative graph in Figure 1. Among these classifiers, random forest has the highest accuracy, so we chose random forest as our classification technique.

Table 2. Comparative study of ML classifiers

Machine learning method    Accuracy (%)
Random Forest              99.3
Support Vector Machines    98.3
Logistic Regression        96.5
Naïve Bayes                95
Decision Tree              94.8
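For reference, accuracy is the fraction of test instances whose predicted class matches the true class. The sketch below defines it and ranks the classifiers by the accuracies reported in Table 2; the helper name and the four-label example are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Accuracies (%) as reported in Table 2.
reported = {
    "Random Forest": 99.3,
    "Support Vector Machines": 98.3,
    "Logistic Regression": 96.5,
    "Naïve Bayes": 95.0,
    "Decision Tree": 94.8,
}
ranking = sorted(reported, key=reported.get, reverse=True)
best = ranking[0]

# Made-up labels showing the computation: 3 of 4 predictions are correct.
example = accuracy(["ckd", "ckd", "notckd", "notckd"],
                   ["ckd", "notckd", "notckd", "notckd"])
```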

7.2 Comparative Graph of Machine Learning Classifiers

The accuracies of the machine learning classifiers are represented graphically in Figure 1. The accuracies obtained are 94.8% using decision tree, 95% using Naïve Bayes, 96.5% using logistic regression, 98.3% using support vector machines and 99.3% using random forest.

Figure 1. Comparative Graph of Machine Learning Classifiers

8. Conclusion

In this paper we detect chronic kidney disease (CKD) using machine learning algorithms. The supervised machine learning algorithms used in this paper are random forest, support vector machine (SVM), logistic regression (LR), decision tree and naïve Bayes classifier. After comparing the results obtained with these methods, we found that random forest is the best suited technique for detection. In our paper we obtained an accuracy of 99.3% using random forest for the detection of chronic kidney disease (CKD).

9. References

1. L. J. Rubini and J. R. Quinlan, "Generating comparative analysis of early stage prediction of chronic kidney disease," Int. J. Mod. Eng. Res., vol. 50, pp. 49–55, Jul. 2015.

2. C. Ho, T. Pai, Y. Peng, C. Lee, Y. Chen, Y. Chen, "Ultrasonography Image Analysis for Detection and Classification of Chronic Kidney Disease," IEEE Complex, Intelligent and Software Intensive Systems, pp. 624–629, July 2012.

3. Y. Yang, Paul E. Stevens, and Y. Chen, "Blood disorders typically associated with renal transplantation," Frontiers in Cell and Developmental Biology, vol. 3, 2015.

4. Brereton, R. G., & Lloyd, G. R. (2010). Support vector machines for classification and regression. Analyst, 135(2), 230-267.

5. Jun-Wei Hsieh, C.-Hung Lee, Y.-Chih Chen, W.-Shan Lee, H.-Fen Chiang, "Stage Classification in Chronic Kidney Disease by Ultrasound Image," International Conference on Image and Vision Computing New Zealand, ACM, pp. 271-276, 2014.

6. P. Sinha and P. Sinha, "Comparative study of chronic kidney disease prediction using knn and svm," International Journal of Engineering Research and Technology, vol. 4, no. 12, 2015.

7. Sajana, T., & Narasingarao, M. R. (2018). Classification of Imbalanced Malaria Disease Using Naïve Bayesian Algorithm. International Journal of Engineering & Technology, 7(2.7), 786-790.

8. Prasuna, K., Rama Rao, K.V.S.N., Saibaba, C.H.M.H. (2019). Application of machine learning techniques in predicting breast cancer – A survey. International Journal of Innovative Technology and Exploring Engineering (IJITEE), Volume 8, Issue 8, June 2019, Pages 826-832.

9. Uddaraju, S., & Narasingarao, M. R. (2019). Predicting the Ductal Carcinoma Using Machine Learning Techniques: A Comparison. Journal of Computational and Theoretical Nanoscience, 16(5-6), 1902-1907.

10. L. Zhang et al., "Trends in chronic kidney disease in China," New England J. Med., vol. 375, no. 9, pp. 905–906, 2016.

11. S. Pradeep, Dr Yogesh Kumar Sharma, "Deep Learning based Real Object Recognition for Security in Air Defence," IEEE proceedings, DOI: 978-93-80544-32-8, pp. 64-67, 2019.

12. V. A. Moyer, "Screening for chronic kidney disease: US Preventive Services Task Force recommendation statement," Annals of Internal Medicine, vol. 157, no. 8, pp. 567–570, 2012.

13. L. C. Plantinga, L. E. Boulware, J. Coresh, L. A. Stevens, E. R. Miller, R. Saran, K. L. Messer, A. S. Levey, and N. R. Powe, "Patient awareness of chronic kidney disease: trends and predictors," Archives of Internal Medicine, vol. 168, no. 20, pp. 2268–2275, 2008.

14. Dubey, "A classification of ckd cases using multivariate k-means clustering," International Journal of Scientific and Research Publications (IJSRP), vol. 5, August 2015.

15. "Walk-in-lab," http://www.walkinlab.com/kidney-tests/albumin.html, accessed: 2016-3-11.

16. "Health testing centres: Anaemia," http://goo.gl/Xxj43u, accessed: 2016-3-11.

Authors

Mrs Prasuna Kotturu is working as an Assistant Professor, Department of Computer Science and Engineering, Koneru Lakshmaiah Educational Foundation, Guntur, A.P, India. She has 9+ years of experience in academics and industry. Her research interests include Machine learning, Wireless sensor networks, Network security and Data Science.

VVS Sasank is an Assistant Professor, Department of Computer Science and Engineering, Koneru Lakshmaiah Educational Foundation, Guntur, A.P, India. He has 3+ years of experience in academics. His research interests include Machine learning and Network security.