ISSN(Online): 2320-9801
ISSN (Print) : 2320-9798
International Journal of Innovative Research in Computer
and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol. 4, Issue 3, March 2016
An Automatic Clinical Diagnosis System
Based on Naïve Bayesian Classification
N. Senthamilarasi1, S. Preethy2, S. Revathy3, D. Preethi4
Assistant Professor, Dept. of IT, Panimalar Institute of Technology, Chennai, India1
Students, Dept. of IT, Panimalar Institute of Technology, Chennai, India2,3,4
ABSTRACT: A clinical diagnosis support system uses data mining techniques to help clinicians make decisions. Its advantage is that it reduces diagnosis time and gives accurate reports. Naïve Bayesian classification can be used to extract valuable information from patient data. In this paper, we propose a new automatic clinical diagnosis system based on Naïve Bayesian classification. The system helps the clinician diagnose the risk of a patient's disease in a privacy-preserving way. In this system, past patients' details are stored in the cloud, and a Naïve Bayesian classifier is used to create a trained set. We use the AES cryptosystem to protect the privacy of the data stored in the cloud.
KEYWORDS: Data Mining, Naïve Bayesian, AES algorithm.
I. INTRODUCTION
The health care industry, which operates globally to provide health care services to patients, has never before faced such growth on the technological side. Greater storage capacity is required to hold health care data. Data mining has great potential to enable health systems to automatically analyze, and provide security for, the historical data stored in the cloud. Over the past few years, massive improvements in data mining techniques have had a major impact on people's lifestyles by predicting behaviors and future trends. To reduce diagnosis time and improve accuracy, a new and faster diagnosis system should be developed.
The Naïve Bayesian classifier is one of the popular machine learning tools [5] and has been widely used in the healthcare industry to predict various diseases. It is often more appropriate for medical diagnosis than more complex techniques. The Privacy-Preserving Patient-Centric Clinical Decision Support System (PPCD) is based on Naïve Bayesian classification; it helps doctors predict patients' disease risks and provides privacy using the Paillier encryption technique [1]. PPCD with a Naïve Bayesian classifier offers many advantages and opens a new way to predict patients' diseases. However, PPCD does not use an advanced encryption method, and because Paillier encryption does not support multiplication of plaintexts, it relies on a Secure Multiplication (SM) protocol [1].
In this paper, to address the privacy issues in the Privacy-Preserving Patient-Centric Clinical Decision Support System, we propose an Automatic Clinical Diagnosis System based on Naïve Bayesian classification.
Fig1. Login page
The automatic clinical diagnosis system allows the service provider to diagnose patients' diseases while keeping patients' medical data private. A Naïve Bayesian classifier is used to create a trained set. Using this trained set, the service provider can diagnose patients' diseases from their symptoms in a secure way. Finally, patients can retrieve the diagnosis report privately, according to their own preferences, without affecting the service provider's privacy.
II. RELATED WORKS
Privacy protection for patients' historical medical data is called for. The previously developed system uses the Paillier encryption technique, which is additively homomorphic and does not support multiplication of plaintexts. It therefore needs a Secure Multiplication (SM) protocol, which serves as the basis of PPCD, but this is not an efficient or advanced methodology.
Algorithm 1: PRIVACY-PRESERVING MAXIMUM OUT OF n PROTOCOL (PMAXn)
Input: CP has n_d tuples T_1, ..., T_nd; PA holds private key SK_c.
Output: the maximum tuple T_U among T_1, ..., T_nd.
1. Initialize set S_b such that S_b = {T_1, ..., T_nd}.
2. for i = 1 to ⌈log2 n_d⌉ do
3.   initialize S_a such that S_a = ∅.
4.   for j = 1 to ⌊|S_b|/2⌋ do
5.     calculate T'_j = PMAX(T_2j-1, T_2j).
6.     add T'_j to set S_a.
7.   set S_b = S_a. After the last iteration, S_b contains only one element, T_U.
8. Return T_U.
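The tournament structure of Algorithm 1 can be sketched in plaintext as follows. This is only an illustration under stated assumptions: the `pmax` function here compares unencrypted probabilities, whereas the protocol in [1] runs PMAX between CP and PA over encrypted tuples. The `(name, probability)` tuple layout and the rule that carries an unpaired tuple into the next round are assumptions added for the sketch.

```python
def pmax(a, b):
    """Plaintext stand-in for the two-party PMAX step:
    keep the (name, probability) tuple with the larger probability."""
    return a if a[1] >= b[1] else b


def pmax_n(tuples):
    """Pairwise tournament that returns the tuple with maximum probability."""
    s_b = list(tuples)
    if not s_b:
        raise ValueError("at least one tuple is required")
    # ceil(log2(n_d)) computed exactly with integer arithmetic
    rounds = (len(s_b) - 1).bit_length()
    for _ in range(rounds):
        s_a = []
        # j = 1 .. floor(|S_b| / 2): compare T_(2j-1) with T_(2j)
        for j in range(1, len(s_b) // 2 + 1):
            s_a.append(pmax(s_b[2 * j - 2], s_b[2 * j - 1]))
        # Assumption: an unpaired last tuple advances to the next round.
        if len(s_b) % 2 == 1:
            s_a.append(s_b[-1])
        s_b = s_a
    return s_b[0]


# Hypothetical disease probabilities, for illustration only.
candidates = [("flu", 0.20), ("cold", 0.50), ("asthma", 0.10),
              ("migraine", 0.15), ("allergy", 0.05)]
top = pmax_n(candidates)
```

With n_d candidates the tournament needs ⌈log2 n_d⌉ rounds, which is why the outer loop in Algorithm 1 is bounded by that value.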
Algorithm 2: PRIVACY-PRESERVING TOP-k DISEASE NAMES RETRIEVAL PROTOCOL (TOP-K)
Input: CP has n_d ciphertext tuples T_1, ..., T_nd (k < n_d); PA holds private key SK_c.
Output: CP obtains the top-k disease names.
1. Initialize set S'_a as S'_a = {T_1, ..., T_nd} and calculate PID_j = E_PKc(0).
2. for i = 1 to k do
3.   Run T_MAX = PMAXn(T_1, ..., T_nd) to get the tuple T_MAX with maximum probability, where T_1, ..., T_nd ∈ S'_a.
4.   for j = 1 to n_d do
5.     Randomly choose R_j ∈ Z_N and calculate V_j = (E_PKc(H_MAX) · E_PKc(H_j)^(N-1))^(R_j).
6.   Permute the n_d encrypted values using π_i, denoted V_πi(j), and send V_πi(j) to PA.
7.   (@PA): Decrypt V_πi(j) using SK_c, denoted β_j = D_SKc(V_πi(j)).
8.   if β_j = 0 then
9.     denote A_πi(j) = E_PKc(0) and B_πi(j) = E_PKc(1).
10.  else
11.    denote A_πi(j) = E_PKc(1) and B_πi(j) = E_PKc(0).
12.  Send A_πi(j), B_πi(j) back to CP.
13.  (@CP): Get A_j, B_j by using the inverse permutation π_i^(-1).
14.  Refresh PID_j and E_PKc(H_j) ∈ S'_a by using the SM protocol:
     PID_j = PID_j · SM(E_PKc(ID_j), B_j)
     E_PKc(H_j) = SM(E_PKc(H_j), A_j)
15. Return PID_j, j = 1, ..., n_d, to PA.
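Step 5 of Algorithm 2 relies on the additive homomorphism of Paillier encryption: multiplying ciphertexts adds plaintexts, and raising a ciphertext to a power multiplies its plaintext. The following is a minimal toy sketch of that blinded comparison, not the implementation from [1]. The primes, the function names, and the integer scores are assumptions chosen for illustration, and the key size is far too small for real use.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def rand_unit(n):
    """Random element of Z_n that is invertible modulo n."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            return r


def encrypt(n, m):
    """E(m) = (1 + n)^m * r^n mod n^2, using generator g = n + 1."""
    n2 = n * n
    return (1 + m * n) % n2 * pow(rand_unit(n), n, n2) % n2


def decrypt(n, sk, c):
    """D(c) = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) / n."""
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def blinded_difference(n, c_max, c_j):
    """V_j = (E(H_MAX) * E(H_j)^(N-1))^(R_j) = E(R_j * (H_MAX - H_j) mod N)."""
    n2 = n * n
    r_j = rand_unit(n)
    return pow(c_max * pow(c_j, n - 1, n2) % n2, r_j, n2)


n, sk = keygen()
scores = [17, 42, 42, 5]            # hypothetical integer-scaled probabilities
ciphertexts = [encrypt(n, h) for h in scores]
c_max = encrypt(n, max(scores))     # stands in for the output of PMAXn
betas = [decrypt(n, sk, blinded_difference(n, c_max, c)) for c in ciphertexts]
is_max = [beta == 0 for beta in betas]
```

The decrypted value β_j is zero exactly when H_j equals H_MAX; otherwise it is a randomly blinded nonzero value, so PA learns which positions hold the maximum without learning the differences themselves.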
Our proposed system therefore uses a more advanced technique, the Advanced Encryption Standard (AES). AES runs on all types of processors and is considered more secure than the encryption techniques it replaces. AES works well in both hardware and software. The input to the encryption and decryption algorithms is a single 128-bit block. Each round uses four different stages: one of permutation and three of substitution.
III. SYSTEM DESCRIPTION
1. Trusted Authority –
The Trusted Authority (TA) is the central entity that all other entities in the system trust.
2. Patient –
The patient has evidence information (symptoms) that is collected during doctor visits or provided directly by the patient. Reports are generated based on these symptoms.
3. Hospital –
The hospital holds past patients' medical details, which include the symptoms of each patient's disease. This information is classified by the Naïve Bayesian classifier, and the classification produces a trained set.
4. Cloud server –
The cloud server offers large, scalable storage in which all the data in the system are stored and managed. Parties with limited storage space can use the cloud server to store their data.
IV. PROPOSED METHODOLOGY
The algorithm is developed to design an automatic clinical decision system that helps doctors predict patients' disease risks and provides privacy for the historical data.
We use two algorithms in our system: Naïve Bayesian classification and the Advanced Encryption Standard. Naïve Bayesian classification belongs to a group of statistical techniques called 'supervised classification', as opposed to 'unsupervised classification'. In supervised classification, the algorithm is told about two or more classes to which training records have previously been assigned.
The Advanced Encryption Standard is also known as Rijndael. It is based on the Rijndael cipher, a family of ciphers with different key and block sizes.
A. Advanced Encryption Standard (AES)
The AES algorithm is used to provide privacy. AES is based on a design principle known as a substitution-permutation network and was designed to be efficient in both hardware and software. AES operates on a 4×4 column-major order matrix of bytes, called the state, and its calculations are done in a special finite field. The AES cipher is specified as a number of repetitions of transformation rounds that convert the plaintext into ciphertext. Each round consists of several processing steps.
Unlike its predecessor, DES, AES does not use a Feistel network. AES has a fixed block size of 128 bits and a key size of 128, 192, or 256 bits; the underlying Rijndael cipher allows block sizes up to a maximum of 256 bits.
The algorithm used by AES is a symmetric-key algorithm, meaning the same key is used for both encryption and decryption of the data.
AddRoundKey(state, w[0, Nb-1])
for round = 1 step 1 to Nr-1
    SubBytes(state)
    ShiftRows(state)
    MixColumns(state)
    AddRoundKey(state, w[round*Nb, (round+1)*Nb-1])
end for
SubBytes(state)
ShiftRows(state)
AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])
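The indexing of the key schedule words w[...] in the pseudocode can be made concrete with a short sketch. It assumes the standard AES parameters (Nb = 4 words per block and Nr = Nk + 6 rounds for a key of Nk 32-bit words); the function names are hypothetical and chosen only for this illustration.

```python
NB = 4  # number of 32-bit words in an AES block (128 bits)


def rounds_for_key(key_bits):
    """Return Nr, the number of rounds, for a 128-, 192- or 256-bit key."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits long")
    nk = key_bits // 32  # key length in 32-bit words
    return nk + 6        # Nr = 10, 12 or 14


def round_key_ranges(key_bits):
    """Word ranges [round*Nb, (round+1)*Nb - 1] used by each AddRoundKey call,
    from the initial round (0) through the final round (Nr)."""
    nr = rounds_for_key(key_bits)
    return [(r * NB, (r + 1) * NB - 1) for r in range(nr + 1)]


ranges_128 = round_key_ranges(128)
```

For a 128-bit key this yields 11 ranges, from words 0 to 3 for the initial AddRoundKey up to words 40 to 43 for the final round, so the expanded key holds Nb·(Nr+1) = 44 words.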
A set of reverse rounds is applied to transform the ciphertext back into the original plaintext using the same encryption key.
An overall description of the AES algorithm is given below:
1. Key Expansion –
Round keys are derived from the cipher key using Rijndael's key schedule.
2. Initial Round –
AddRoundKey: each byte of the state is combined with a byte of the round key using bitwise XOR.
3. Rounds –
1. SubBytes: a non-linear substitution step where each byte is replaced with another according to a lookup table.
2. ShiftRows: a transposition step where each row of the state is shifted cyclically a certain number of steps.
3. MixColumns: a mixing operation which operates on the columns of the state, combining the four bytes in each column.
4. AddRoundKey.
4. Final Round (no MixColumns) –
1. SubBytes
2. ShiftRows
3. AddRoundKey.
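Three of the round steps listed above can be sketched directly on a 4×4 state. This is a minimal illustration, not a complete or hardened AES implementation: SubBytes is omitted because it needs the 256-entry S-box table, and the representation of the state as a list of four rows (`state[row][col]`) is an assumption made for readability.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def shift_rows(state):
    """Rotate row r of the state cyclically to the left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def mix_column(col):
    """Multiply one column by the fixed AES matrix (entries 2, 3, 1, 1)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_columns(state):
    """Apply mix_column to each of the four columns of the state."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


def add_round_key(state, round_key):
    """XOR every state byte with the matching round-key byte."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]


demo_state = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
shifted = shift_rows(demo_state)
mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

Because AddRoundKey is a plain XOR, applying it twice with the same round key restores the original state, which is one reason the same key can drive both encryption and decryption.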
B. Naïve Bayesian
A Naïve Bayesian classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. It is a simple technique for constructing classifiers. A more expressive term for the underlying probability model would be "independent feature model".
Naïve Bayesian classification belongs to a group of statistical methods called supervised classification, in which the algorithm is told about two or more classes to which training records have previously been assigned.
The Naïve Bayesian classifier is a simple form of Bayesian classifier that assumes all the features are independent of each other given the class.
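The independence assumption can be written compactly. As a sketch, with C denoting the disease class and x_1, ..., x_n the observed symptoms (symbols introduced here for illustration, not taken from the original text), Bayes' theorem with conditionally independent features gives:

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{c} \;=\; \arg\max_{c} \; P(C = c) \prod_{i=1}^{n} P(x_i \mid C = c)
```

The predicted class is the one that maximizes the product of the class prior and the per-feature likelihoods.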
In simple terms, a Naïve Bayesian classifier assumes that the presence of a particular feature of a class is unrelated to the presence of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or upon the existence of the other features, a Naïve Bayesian classifier considers all of these properties to contribute independently to the probability that this fruit is an apple. Parameter estimation for Naïve Bayesian models commonly uses the method of maximum likelihood.
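A small sketch may help show how such a classifier could be trained on symptom records. The data, feature order, and function names below are hypothetical. The estimates are relative frequencies (the maximum-likelihood estimates) with add-one (Laplace) smoothing, which is an assumption added here so that a symptom value never seen with a class does not produce a zero probability.

```python
import math
from collections import Counter, defaultdict


def train_nb(records, labels, alpha=1.0):
    """Count class and per-feature value frequencies from labelled records."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    seen_values = defaultdict(set)       # feature index -> distinct values
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(label, i)][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values, alpha


def predict_nb(model, record):
    """Return the class maximising log P(C) + sum_i log P(x_i | C)."""
    class_counts, value_counts, seen_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, value in enumerate(record):
            # Smoothed relative frequency of this value within the class
            numerator = value_counts[(label, i)][value] + alpha
            denominator = n_c + alpha * len(seen_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: (fever, cough, rash) -> diagnosis
records = [("yes", "yes", "no"), ("yes", "yes", "no"), ("yes", "no", "no"),
           ("no", "no", "yes"), ("no", "yes", "yes"), ("no", "no", "yes")]
labels = ["flu", "flu", "flu", "allergy", "allergy", "allergy"]
model = train_nb(records, labels)
diagnosis = predict_nb(model, ("yes", "yes", "no"))
```

Working in log-probabilities avoids numerical underflow when many symptoms are multiplied together.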
A Naïve Bayesian classifier requires only a small amount of training data to estimate the parameters necessary for classification. Because the variables are assumed to be independent, only the means and variances of the variables for each class need to be determined, not the entire covariance matrix.
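For continuous measurements, this point can be illustrated with a Gaussian variant of the classifier, which stores one mean and one variance per feature and class and no covariances. The sketch below is an assumption-laden illustration: the measurements, labels, function names, and the small variance floor are hypothetical and not part of the proposed system.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Estimate the prior, per-feature means and per-feature variances per class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        m = len(members)
        columns = list(zip(*members))
        means = [sum(col) / m for col in columns]
        # Maximum-likelihood variance; the floor avoids division by zero.
        variances = [sum((v - mu) ** 2 for v in col) / m + var_floor
                     for col, mu in zip(columns, means)]
        model[label] = (m / len(rows), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Pick the class with the highest log-posterior under independent Gaussians."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, mu, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical measurements: (body temperature in °C, resting heart rate)
rows = [(36.6, 70), (36.8, 72), (36.7, 68), (38.5, 95), (39.0, 100), (38.7, 92)]
labels = ["healthy", "healthy", "healthy", "fever", "fever", "fever"]
gnb = fit_gaussian_nb(rows, labels)
result = predict_gaussian_nb(gnb, (38.8, 96))
```

With two features and two classes the model holds only four means and four variances, whereas a full covariance model would also need an off-diagonal term per class.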
There are four modules involved. They are:
1. Health data collection:
The company stores its encrypted monitoring data or program in the cloud. Individual clients collect their medical data and store them on their mobile devices, which then transform the data into attribute vectors. The attribute vectors are delivered as inputs to the monitoring program in the cloud through a mobile (or smart) phone. The TA is responsible for distributing private keys to clients and collecting service fees from clients according to a business model such as the “pay-per-use” model.
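The transformation of collected data into an attribute vector can be sketched as follows. The symptom list, its order, and the function name are hypothetical; the paper does not specify the encoding, so a simple binary presence vector is assumed here.

```python
# Hypothetical fixed attribute order shared by client and monitoring program.
SYMPTOMS = ("fever", "cough", "headache", "rash")


def to_attribute_vector(reported, symptoms=SYMPTOMS):
    """Encode a set of reported symptoms as a 0/1 vector in a fixed order."""
    unknown = set(reported) - set(symptoms)
    if unknown:
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    return [1 if s in reported else 0 for s in symptoms]


vector = to_attribute_vector({"cough", "rash"})
```

A fixed attribute order matters because the classifier in the cloud interprets each position of the vector as one specific symptom.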
2. AES implementation:
To protect the client's privacy, we apply anonymous AES in the medical diagnostic branching programs. To reduce the decryption complexity due to the use of AES, we apply a recently proposed decryption outsourcing technique with privacy protection, which shifts the client's pairing computation to the cloud server.
3. Token generation:
To generate the private key for the attribute vector, a client first computes the identity representation set of each element in the attribute vector and delivers all the identity representation sets to the TA. The TA then runs the private-key generation algorithm on each identity in the identity set and delivers all the respective private keys to the client.
4. Cipher text retrieval:
The cloud is required to generate the ciphertexts for clients by running the Re Encryption algorithm. Each run of the Re Encryption algorithm costs the cloud exactly two pairing computations, and the cloud needs to perform those computations for each client. The resulting public-key ciphertexts, along with the original symmetric-key ciphertexts, constitute the ciphertext sets for the client.
V. CONCLUSION
In this paper, we have proposed an automatic clinical diagnosis system using a Naïve Bayesian classifier. By taking advantage of emerging data mining techniques, the processing unit can use a large medical dataset stored on a cloud platform to train the Naïve Bayesian classifier, and then apply the classifier for disease diagnosis without compromising the privacy of the data provider. In addition, we protect the privacy of the data stored in the cloud by using the AES cryptosystem.
REFERENCES
[1] X. Liu, R. Lu, J. Ma, L. Chen, and B. Qin, “Privacy-preserving patient-centric clinical decision support system on naïve Bayesian classification.”
[2] R. S. Ledley and L. B. Lusted, “Reasoning foundations of medical diagnosis,” Science, vol. 130, no. 3366, pp. 9–21, 1959.
[3] H. R. Warner, A. F. Toronto, L. G. Veasey, and R. Stephenson, “A mathematical approach to medical diagnosis: application to congenital heart disease,” JAMA, vol. 177, no. 3, pp. 177–183, 1961.
[4] C. Schurink, P. Lucas, I. Hoepelman, and M. Bonten, “Computer-assisted decision support for the diagnosis and treatment of infectious diseases in intensive care units,” The Lancet Infectious Diseases, vol. 5, no. 5, pp. 305–312, 2005.
[5] M. Kantarcıoglu, J. Vaidya, and C. Clifton, “Privacy preserving naive Bayes classifier for horizontally partitioned data,” in IEEE ICDM Workshop on Privacy Preserving Data Mining, 2003, pp. 3–9.
[6] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu, “Tools for privacy preserving distributed data mining,” ACM SIGKDD Explorations Newsletter, vol. 4, no. 2, pp. 28–34, 2002.
[7] X. Yi and Y. Zhang, “Privacy-preserving naive Bayes classification on distributed data via semi-trusted mixers,” Information Systems, vol. 34, no. 3, pp. 371–380, 2009.
[8] A. Amirbekyan and V. Estivill-Castro, “A new efficient privacy-preserving scalar product protocol,” in Proceedings of the Sixth Australasian Conference on Data Mining and Analytics, Volume 70. Australian Computer Society, Inc., 2007, pp. 209–214.
[10] J. Vaidya, M. Kantarcioglu, and C. Clifton, “Privacy-preserving naïve Bayes classification,” VLDB J., vol. 17, no. 4, pp. 879–898, 2008.
[11] A. Abbas and S. U. Khan, “A review on the state-of-the-art privacy-preserving approaches in the e-health clouds,” IEEE J. Biomedical and Health Informatics, vol. 18, no. 4, pp. 1431–1441, 2014.
[12] Y. Tong, J. Sun, S. S. M. Chow, and P. Li, “Cloud-assisted mobile-access of health data with privacy and auditability,” IEEE J. Biomedical and Health Informatics, vol. 18, no. 2, pp. 419–429, 2014.