International Journal of Emerging Technology and Innovative Engineering Volume 5, Issue 5, May 2019 (ISSN: 2394 – 6598)
COMPARISON OF CLASSIFICATION
TECHNIQUES ON DATA MINING
S.Nageswari
Department of Computer Science, Bharath Niketan Engineering College Theni, India; Email: [email protected]
Dr Pallavi M Goel Associate Professor
School of Computing Science and Engineering, Galgotias University Greater Noida, Uttar Pradesh, India
[email protected]
P.Divya
Department of Computer Science Engineering, CMS college of Engineering Namakkal, India; Email : [email protected]
Abstract – Educational Data Mining applies data mining techniques to learning-related data. Predicting student performance is difficult because of the large volume of records in the education domain, and existing surveys do not give a clear view of the available prediction approaches.
Two factors are involved in this process: the attributes used for prediction and the prediction methods. The aim of this paper is to review how data mining methods are used to predict students' performance. We compare the prediction accuracy reported for different data mining methods: Decision Tree, Neural Network, Naive Bayes, K-Nearest Neighbor, and Support Vector Machine. Among these techniques, Decision Tree and Neural Network provide the best accuracy.
Keywords— Classification Technique, Educational Data Mining, Decision Tree, Neural Network.
I. INTRODUCTION
Predicting students' performance is an important part of the education field because it helps students achieve a strong academic record. Usamah et al. (2013) stated that students' performance can be improved by measuring learning assessments and co-curricular activities [1-9]. Such measurement is necessary to determine a student's learning level. Final grades are used to evaluate students' performance; they are based on the course structure, assessment marks, final exam marks and other activities.
Evaluating students' performance and the effectiveness of the learning process is very important, and data mining techniques are an important tool for this analysis. They have been widely applied in the education domain in recent times [10-24], where the approach is known as educational data mining: a process for extracting information and patterns from large educational records. Predicting performance is important for improving the quality of students' learning.
The next section describes the attributes and methods considered in this comparison. Section 3 describes the factors used in mining. Section 4 discusses the existing prediction methods and their prediction accuracy. Finally, Section 5 concludes the paper.
II. ATTRIBUTES AND METHODS
For this comparison we searched the following databases: IEEE Xplore, SpringerLink, ScienceDirect and the ACM Digital Library. Table I lists the attributes used in previous work on predicting students' performance, together with the methods applied to them.
TABLE I. COMMON ATTRIBUTES AND METHODS USED TO PREDICT STUDENTS' PERFORMANCE
S. No. | Attributes | Methods
1 | Internal assessments | Decision Tree, Neural Network, K-Nearest Neighbor
2 | Internal assessments, CGPA | Support Vector Machine
3 | Internal assessments, CGPA, Extra-curricular activities | Decision Tree, Naive Bayes, K-Nearest Neighbor, Support Vector Machine
4 | Internal assessments, CGPA, Student Demographic | Decision Tree, Naive Bayes, K-Nearest Neighbor
5 | Internal assessments, External assessment | Neural Network
6 | External assessment, Demographic of Student, Background of High School | Neural Network, Decision Tree
7 | Psychometric factors | Decision Tree, K-Nearest Neighbor, Support Vector Machine
8 | External assessments | Decision Tree, Naive Bayes
9 | CGPA | Decision Tree, Neural Network
10 | CGPA, Demographic of Student, Background of High School, Scholarship, Interaction with Social Network | Decision Tree, Neural Network, Naive Bayes
11 | CGPA, Demographic of Student, Background of High School, Scholarship, Interaction with Social Network, Internal Assessment, Extra-curricular activities | Decision Tree
12 | Demographic of Student, Background of High School | Neural Network
13 | External assessment, CGPA, Demographic of Student, Extra-curricular activities | Decision Tree
14 | Psychometric factors, Extra-curricular activities, Soft skills | Decision Tree
15 | Demographic of Student, Background of High School, Internal assessment, Extra-curricular activities | Decision Tree
16 | Internal assessments, External assessment, Demographic, Extra-curricular activities | Decision Tree, Neural Network
III. PREDICTION OF STUDENTS' PERFORMANCE
The factors involved in predicting students' performance are the attributes and the methods. Table I gives a detailed list of the attributes and methods used in predicting students' performance.
This section first focuses on the important attributes and then on the prediction methods used to predict students' performance.
Figure 1 shows the attributes used for prediction.
3.1 Attributes used in predicting students' performance
Figure 1. Classification of attributes
Eight attribute groups are formed by grouping common classifications.
1. CGPA (Cumulative Grade Point Average) is the most frequently used attribute for predicting students' performance.
2. Internal assessments
- Assignment mark, quiz mark, lab mark, class test and attendance
3. Demographic
- Gender, age, background of family, and ill health.
4. External assessments
- Mark obtained in final exam for a particular subject
5. Extra activities
6. Background of High school.
7. Interaction of social network
8. Psychometric factor
- Student interest, study behaviour, engagement time, and family support.
3.2 Prediction methods used for students' performance
In educational data mining, many classification algorithms have been applied to predict students' performance. The most commonly used are Decision Tree, Neural Network, Naive Bayes, K-Nearest Neighbor and Support Vector Machine.
3.2.1 Decision Tree
Decision Tree is the most popular prediction method. Most researchers have used it because of its simplicity and its clarity in uncovering structure in both small and very large data sets and in predicting values [25-33].
The model is easy to understand because it mirrors human reasoning and can be converted directly into IF-THEN rules [1]. In Table I, thirteen (13) papers use Decision Tree to predict students' performance. Earlier studies used this method to predict student drop-out from academic performance data, to predict the right career for a student based on their developmental profile, and to predict the semester performance of postgraduate (PG) students. The datasets in these studies include final grades [37-39], the Cumulative Grade Point Average (CGPA) and end-semester marks. All these datasets were examined to determine the main attributes and factors that may affect performance, after which a suitable mining procedure was selected to predict performance.
[Figure 1: Attributes used for prediction: CGPA; internal assessments (marks of assignment, quizzes, lab work, class test and attendance); psychometric factor (student interest, study behaviour, engage time and family support); demographic (gender, age, family background and disability); external assessments (mark obtained in final exam for a particular subject); extra-curricular activities; high school background; social interaction network]
Classification methods relate students' study data to their predicted performance. Gray et al. (2014) examined the accuracy of classification models in predicting learners' progression in tertiary education. That work concentrated on predicting students' academic performance accurately and on comparing the accuracy of mining algorithms.
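The conversion of a decision tree into IF-THEN rules, mentioned in this section, can be illustrated with a minimal sketch. The tree below is hand-built and hypothetical: the attribute names (`cgpa`, `internal_mark`), the thresholds and the grade labels are assumptions chosen for illustration and are not taken from any of the surveyed papers.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # predicted class at the end of a branch


@dataclass
class Node:
    attribute: str                 # attribute tested at this node
    threshold: float               # follow `low` if value <= threshold
    low: Union["Node", Leaf]
    high: Union["Node", Leaf]


def predict(tree, student):
    """Walk the tree from the root until a leaf is reached."""
    while isinstance(tree, Node):
        value = student[tree.attribute]
        tree = tree.low if value <= tree.threshold else tree.high
    return tree.label


def to_rules(tree, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN rule."""
    if isinstance(tree, Leaf):
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN grade = {tree.label}"]
    low_cond = f"{tree.attribute} <= {tree.threshold}"
    high_cond = f"{tree.attribute} > {tree.threshold}"
    return (to_rules(tree.low, conditions + (low_cond,))
            + to_rules(tree.high, conditions + (high_cond,)))


# Hypothetical tree: thresholds and labels are illustrative assumptions.
EXAMPLE_TREE = Node(
    "cgpa", 2.0,
    Leaf("fail"),
    Node("internal_mark", 50.0, Leaf("pass"), Leaf("distinction")),
)

if __name__ == "__main__":
    for rule in to_rules(EXAMPLE_TREE):
        print(rule)
```

Each root-to-leaf path becomes exactly one rule, which is why a fitted tree can be read directly by educators without knowledge of the training algorithm.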
3.2.2 Neural Networks
Neural Network is the next most widely used technique in educational data mining. Its main advantage is the ability to detect all possible interactions between predictor variables. It can also detect complex nonlinear relationships between dependent and independent variables without prior assumptions, so it is regarded as one of the best prediction methods. Several published papers use this method, including an Artificial Neural Network model built specifically to predict students' performance. The attributes examined with neural networks include student admission data, students' attitude towards self-regulated learning, and academic performance [14-24]. One study shows how data can be preprocessed to improve the accuracy of a model that predicts students' final grade in a particular course [25-38]. The remaining papers use Decision Tree in addition to Neural Network. The reported prediction accuracies are summarized in Table II.
3.2.3 Naive Bayes
Naive Bayes is another option for prediction. Six (6) papers have used this algorithm to estimate students' performance. The aim of these papers is to find the best prediction technique for students' performance by comparing it with other techniques [4, 5, 6, 7, 19, 20]. The results are shown in Table II.
3.2.4 K-Nearest Neighbor
The four papers studied show that K-Nearest Neighbor gives good accuracy. This method took less time to identify the different levels of students' performance across all levels of learners [39-47]. It also gave the best accuracy in estimating the detailed pattern of learners' progression in tertiary education.
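As a rough illustration of how K-Nearest Neighbor assigns a performance level, the following sketch classifies a student by majority vote among the k closest training records. The training records, the two features (an internal assessment mark and a CGPA) and the labels are hypothetical and are not drawn from the surveyed papers; the features are also left unscaled, which a real study would normally correct.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to `query`.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical records: (internal assessment mark, CGPA) -> outcome.
TRAIN = [
    ((35.0, 2.1), "fail"),
    ((40.0, 2.4), "fail"),
    ((45.0, 2.2), "fail"),
    ((70.0, 3.2), "pass"),
    ((78.0, 3.6), "pass"),
    ((85.0, 3.8), "pass"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, (80.0, 3.5)))
    print(knn_predict(TRAIN, (38.0, 2.0)))
```

Because the method stores the training records and defers all work to prediction time, there is no separate model-fitting step.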
3.2.5 Support Vector Machine
Support Vector Machine is a supervised learning method used for classification. Some existing papers use this method to predict students' performance. Hamalainen et al. (2006) chose this technique because of its suitability for the datasets involved. Sembiring et al. (2011) stated that it has good generalization ability and is faster than other methods. This method achieved the highest prediction accuracy in identifying students at risk of failure.
TABLE II. PREDICTION ACCURACY OF CLASSIFICATION TECHNIQUES
Method | Attributes | Prediction Accuracy | Paper Reference Number
Decision Tree | Internal assessment mark | 76% | [1]
Decision Tree | Psychometric factors | 65% | [2]
Decision Tree | External assessment mark | 85% | [3]
Decision Tree | CGPA | 91% | [4]
Decision Tree | CGPA, Demographic of student, Background of high school, Scholarship, Interaction of social network | 73% | [5]
Decision Tree | Internal assessment mark, CGPA, Extra-curricular activities | 66% | [6]
Decision Tree | Demographic of student, Background of high school | 65% | [7]
Decision Tree | Internal assessment mark, Demographic of student, Extra-curricular activities | 90% | [8]
Decision Tree | External assessment mark, CGPA, Demographic of student, Extra-curricular activities | 90% | [9]
Decision Tree | Psychometric factors, Extra-curricular activities, Soft skills | 88% | [10]
Decision Tree | CGPA | 98% | [20]
Decision Tree | Internal assessments mark, External assessment mark, Demographic of student, Extra-curricular activities | 73% | [18]
Decision Tree | Internal assessments mark, CGPA, Demographic of student | 69% | [19]
Neural Network | Internal assessments mark | 81% | [11]
Neural Network | Psychometric factors | 69% | [2]
Neural Network | External assessment mark | 97% | [12]
Neural Network | CGPA | 75% | [4]
Neural Network | CGPA, Demographic of student, Background of high school, Scholarship, Interaction of social network | 71% | [5]
Neural Network | Demographic of student, Background of high school | 72% | [7]
Neural Network | External assessment mark, Demographic of student, Background of high school | 74% | [13]
Neural Network | Internal assessments mark, External assessment mark | 98% | [14]
Neural Network | Internal assessments mark, External assessment mark, Demographic of student, Extra-curricular activities | 74% | [18]
Naive Bayes | CGPA, Demographic of student, Background of high school, Scholarship, Interaction of social network | 76% | [5]
Naive Bayes | Demographic of student, Background of high school | 50% | [7]
Naive Bayes | CGPA | 75% | [4]
Naive Bayes | Internal assessment mark, CGPA, Extra-curricular activities | 73% | [6]
Naive Bayes | CGPA | 94% | [20]
Naive Bayes | Internal assessments, CGPA, Demographic of student | 71% | [19]
K-Nearest Neighbor | Psychometric factors | 69% | [2]
K-Nearest Neighbor | Internal assessment mark, CGPA, Extra-curricular activities | 83% | [6]
K-Nearest Neighbor | Internal assessment mark | 82% | [15]
K-Nearest Neighbor | Internal assessments mark, CGPA, Demographic of student | 62% | [19]
Support Vector Machine | Psychometric factors | 83% | [16]
Support Vector Machine | Internal assessments mark, CGPA, Extra-curricular activities | 80% | [6]
Support Vector Machine | Internal assessment mark, CGPA | 80% | [17]
4. DISCUSSIONS
This comparison is based on the highest accuracy of the prediction approaches and on the important attributes that may affect students' performance. Figure 2 shows the prediction accuracy of the classification techniques, grouped by algorithm, for predicting students' performance from 2002 to 2016.
Accuracy is the overall correctness of the model, calculated as the number of correct classifications divided by the total number of classifications. As Figure 2 shows, Neural Network and Decision Tree have the highest prediction accuracy (98%), followed by Naive Bayes (94%). Support Vector Machine and K-Nearest Neighbor share the same highest accuracy (83%). Prediction accuracy depends on both the attributes and the prediction method used.
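The definition of overall correctness given here (correct classifications divided by all classifications) can be written as a short function. This is a minimal sketch; the function name and the sample labels are illustrative assumptions, not data from the surveyed studies.

```python
def accuracy(actual, predicted):
    """Fraction of cases where the predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one classification is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical outcomes for five students.
ACTUAL = ["pass", "pass", "fail", "pass", "fail"]
PREDICTED = ["pass", "fail", "fail", "pass", "fail"]

if __name__ == "__main__":
    # 4 of the 5 predictions match, so the accuracy is 0.8 (80%).
    print(accuracy(ACTUAL, PREDICTED))
```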
Neural Network gave its highest accuracy (98%) with a combination of internal and external assessments. It achieved 97% with external assessment alone, 81% with internal assessments, and its lowest accuracy (69%) with psychometric factors.
Decision Tree gave its highest prediction accuracy (98%) for CGPA and its lowest (65%) for psychometric factors and for the combination of student demographic and high school background.
Naive Bayes follows, with its highest prediction accuracy (94%) for CGPA and its lowest (50%) for student demographic and high school background.
K-Nearest Neighbor gave its highest prediction accuracy (83%) for internal assessment mark, CGPA and extra-curricular activities, and its lowest (62%) for internal assessments, CGPA and student demographic.
Support Vector Machine gave its highest prediction accuracy (83%) for psychometric factors.
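The highest and lowest figures quoted in this discussion can be re-derived from the accuracy column of Table II. The sketch below transcribes those percentages per method, in table order, and computes the maximum and minimum for each; the variable and function names are illustrative choices.

```python
# Prediction accuracies (%) transcribed from Table II, in table order.
TABLE_II = {
    "Decision Tree": [76, 65, 85, 91, 73, 66, 65, 90, 90, 88, 98, 73, 69],
    "Neural Network": [81, 69, 97, 75, 71, 72, 74, 98, 74],
    "Naive Bayes": [76, 50, 75, 73, 94, 71],
    "K-Nearest Neighbor": [69, 83, 82, 62],
    "Support Vector Machine": [83, 80, 80],
}


def summarize(table):
    """Map each method to its (highest, lowest) reported accuracy."""
    return {method: (max(values), min(values)) for method, values in table.items()}


def rank_by_best(table):
    """Order methods by their highest reported accuracy, best first."""
    return sorted(table, key=lambda method: max(table[method]), reverse=True)


if __name__ == "__main__":
    for method, (best, worst) in summarize(TABLE_II).items():
        print(f"{method}: highest {best}%, lowest {worst}%")
```

Note that these are the best single results reported by different papers on different datasets, so the ranking reflects reported extremes and is not a controlled comparison between the methods.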
Figure 2. Prediction accuracy grouped by algorithms
5. CONCLUSION
Predicting students' performance is helpful to both educators and learners in improving the teaching and learning process. This paper has reviewed earlier studies on predicting students' performance with various data mining methods. Most researchers have used CGPA and assessment marks as data sets for prediction. Classification is the prediction approach most often used in educational data mining. Among the classification techniques, Neural Network and Decision Tree are the methods most frequently used by researchers to predict students' performance.
This comparison will help the education system monitor students' performance and improve it.
REFERENCES
[1] Sivaram, M., B. DurgaDevi, and J. Anne Steffi. "Steganography of two lsb bits." International Journal of Communications and Engineering 1.1 (2012): 2231-2307.
[2] Sivaram, M., et al. "Exploiting the Local Optima in Genetic Algorithm using Tabu Search." Indian Journal of Science and Technology 12 (2019): 1.
[3] Mohammed, Amin Salih, et al. "Detection and Removal of Black Hole Attack in Mobile Ad Hoc Networks Using GRP Protocol." International Journal of Advanced Research in Computer Science 10.6 (2018).
[4] Viswanathan, M., et al. "Security and privacy protection in cloud computing." Journal of Advanced Research in Dynamical and Control Systems (2018): 1704-1710.
[5] Nithya, S., et al. "Intelligent based IoT smart city on traffic control system using raspberry Pi and robust waste management." Journal of Advanced Research in Dynamical and Control Systems (2018): 765-770.
[6] Dhivakar, B., et al. "Statistical Score Calculation of Information Retrieval Systems using Data Fusion Technique." Computer Science and Engineering 2.5 (2012): 43-5.
[7] Mohammed, Amin Salih, Shahab Wahhab Kareem, and M. Sivaram. "Time series prediction using SRE- NAR and SRE-ADALINE." (2018): 1716-1726.
[8] Manikandan, V., D. Yuvaraj, Amin Salih Mohammed, V. Porkodi, and M. Sivaram. "Electrical Energy Conservation and Energy Management System Using Internet of Things." (2018): 2016-2023.
[9] Porkodi, V., D. Yuvaraj, A. S. Mohammed, M. Sivaram, and V. Manikandan. "IoT in Agriculture." (2018): 1986-1991.
[10] Manikandan, V., A. S. Mohammed, D. Yuvaraj, M. Sivaram, and V. Porkodi. "An Energy Efficient EDM-RAEED Protocol for IoT Based Wireless Sensor Networks." (2018): 1992-2004.
[11] Mohammed, Amin Salih, Suzan Tahsein Husein, M. Sivaram, V. Porkodi, and V. Manikandan. "4G and 5G Communication Networks Future Analysis." (2019): 343-349.
[12] Abraham, Steffin, Tana Luciya Joji, and D. Yuvaraj. "Enhancing Vehicle Safety with Drowsiness Detection and Collision Avoidance." International Journal of Pure and Applied Mathematics 120.6 (2018): 2295-2310.
[13] Porkodi, V., et al. "Survey on White-Box Attacks and Solutions." Asian Journal of Computer Science and Technology 7.3 (2018): 28-32.
[14] Malathi, N., and M. Sivaram. "An Enhanced Scheme to Pinpoint Malicious Behavior of Nodes In Manet’s." (2015).
[15] Sivaram, M. "Odd and even point crossover based Tabu ga for data fusion in Information retrieval." (2014).
[16] Sivaram, M., et al. "Emergent News Event Detection from Facebook Using Clustering."
[17] Punidha, R., Pavithra K, Swathika R, and Sivaram M. "Preserving DDoS Attacks Using Node Blocking Algorithm." International Journal of Pure and Applied Mathematics 119.15 (2018): 633-640.
[18] Batri, K., and M. Sivaram. "Testing the impact of odd and even point crossover of genetic algorithm over the data fusion in information retrieval." European Journal of Scientific Research (2012).
[19] Mohamme, Sivaram Yuvaraj Amin Salih, and V. Porkodi. "Estimating the Secret Message in the Digital Image." International Journal of Computer Applications 181.36 (2019): 26-28.
[20] Manikandan, V., et al. "PRIVACY PRESERVING DATA MINING USING THRESHOLD BASED FUZZY CMEANS CLUSTERING." ICTACT Journal on Soft Computing 9.1 (2018).
[21] Obulatha-II-ME-CSE, Miss O. "Position Privacy Using LocX."
[22] Sivaram, M., et al. "The Real Problem Through a Selection Making an Algorithm that Minimizes the Computational Complexity."
[23] Sivaram, M., et al. "Detection of Accurate Facial Detection Using Hybrid Deep Convolutional Recurrent Neural Network."
[24] Porkodi, V., D. Yuvaraj, Amin Salih Mohammed, V. Manikandan, and M. Sivaram. "Prolong the Network Lifespan of Wireless Sensor Network." (2018): 2034-2038.
[25] Sivaram, M. "Enabling Anonymous Endorsement in Clouds with Decentralized Access Control." (March 13, 2019). Available at SSRN: https://ssrn.com/abstract=
[26] Sivaram, M. "Integer Wavelet Transform Based Approach for High Robustness of Audio Signal Transmission." (March 13, 2019). Available at SSRN: https://ssrn.com/abstract=
[27] Sivaram, M. "Healthcare Visible Light Communication." (March 13, 2019). Available at SSRN: https://ssrn.com/abstract=
[28] Sivaram, M. "Preserving DDoS Attacks Using Node Blocking Algorithm." (March 13, 2019). Available at SSRN: https://ssrn.com/abstract=
[29] M, Sivaram and Sivaram, Porkodi and Manikandan, V. "Securing the Sensor Networks Along With Secured Routing Protocols for Data Transfer in Wireless Sensor Networks." (October 28, 2018). Available at SSRN: https://ssrn.com/abstract=
[30] Sivaram, M. "Statistical Score Calculation of Information Retrieval Systems using Data Fusion Technique." (April 8, 2019). Available at SSRN: https://ssrn.com/abstract=
[31] Yuvaraj, Duraisamy, and Shanmugasundaram Hariharan. "Content-based image retrieval based on integrating region segmentation and colour histogram." learning 2.11 (2016): 12.
[32] ShanmugaPriya, S., A. Valarmathi, and D. Yuvaraj. "The personal authentication service and security enhancement for optimal strong password." Concurrency and Computation: Practice and Experience: e5009.
[33] Sundara Vadivel, P., et al. "An efficient CBIR system based on color histogram, edge, and texture features." Concurrency and Computation: Practice and Experience: e4994.
[34] Ahamed, B. Bazeer, and D. Yuvaraj. "Framework for Faction of Data in Social Network Using Link Based Mining Process." International Conference on Intelligent Computing & Optimization. Springer, Cham, 2018.
[35] Vadivel, P. Sundara, S. Navaneetha Krishnan, and D. Yuvaraj. "An Effective Document Category Prediction System Using Support Vector Machines, Mann-Whitney Techniques." International Journal of Pure and Applied Mathematics 118.22 (2018): 895-900.
[36] Santhi, R., and D. Yuvaraj. "Content-Based Image Retrieval in Cloud Using Watermark Protocol and Searchable Encryption." (2017).
[37] Karthika, M., K. Geetha, and D. Yuvaraj. "Preserving Location Based Range Query over Outsourced Data with EPLQ Using LOCX."
[38] Santhi, R., and D. Yuvaraj. "Content-Based Image Retrieval in Cloud Using Watermark Protocol and Searchable Encryption." (2017).
[39] Yuvaraj, Duraisamy, and Shanmugasundaram Hariharan. "Efficient Content Based Image Retrieval Technique using Multiple Attributes." (2013).
[40] Kathirvel, R., and K. Batri. "A computer‐aided approach for meningioma brain tumor detection using C ANFIS classifier." International Journal of Imaging Systems and Technology 27.3 (2017): 193-200.
[41] Batri, K. "S. Anbu karuppusamy,“Improving TCP Performance in Ad-Hoc Networks”." European Journal of Scientific Research 65.2 (2011): 237-245.
[42] Gopalan, Nagammapudhur, and Krishnan Batri. "Effect of Filter Size on Fusion Function in Information Retrieval." Int. Arab J. Inf. Technol. 5.2 (2008): 170-175.
[43] Gopalan, N. P., K. Batri, and B. Siva Selvan. "Adaptive Selection of Top-m Retrieval Schemes for Data Fusion Using Tabu Search." International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007). Vol. 1. IEEE, 2007.
[44] Lakshmi, S., B. Sathiyabhama, and K. Batri. "Entropy a new measure to gauge search engine optimisation." International Journal of Enterprise Network Management 9.3-4 (2018): 189-204.
[45] Krishnan, Batri. "An effective selection of retrieval schemes for data fusion." Kuwait Journal of Science 44.2 (2017).
[46] Batri, Krishnan. "An Effective Pareto Optimality Based Fusion Technique for Information Retrieval." MIS REVIEW: An International Journal 19.1 (2013): 61-80.
[47] Anbukaruppusamy, S., K. Batri, and Erode Gobi. "Explicit Congestion Control With Buffer Management For Multihop Adhoc Networks." Life Science Journal 10.2 (2013).