COMPARISON OF CLASSIFICATION TECHNIQUES ON DATA MINING


International Journal of Emerging Technology and Innovative Engineering Volume 5, Issue 5, May 2019 (ISSN: 2394 – 6598)



S. Nageswari

Department of Computer Science, Bharath Niketan Engineering College Theni, India; Email: [email protected]

Dr. Pallavi M. Goel, Associate Professor

School of Computing Science and Engineering, Galgotias University Greater Noida, Uttar Pradesh, India

Email: [email protected]

P. Divya

Department of Computer Science Engineering, CMS College of Engineering, Namakkal, India; Email: [email protected]

Abstract – Educational Data Mining applies data mining techniques to learning-related data. Predicting student performance is complicated because of the huge volume of records in the education field. At present, there is a lack of surveys that give a clear view of the available prediction approaches.

Two factors are involved in this process: the attributes used for prediction and the prediction methods. The core aim of this paper is to predict students' performance using data mining methods. In this paper, we compare the prediction accuracy of different data mining methods: Decision Tree, Neural Network, Naive Bayes, K-Nearest Neighbor, and Support Vector Machine. Among these techniques, Decision Tree and Neural Network provide the best accuracy.

Keywords— Classification Technique, Educational Data Mining, Decision Tree, Neural Network.

I. INTRODUCTION

Predicting student performance is an important part of the education field, because it helps students achieve excellent academic records. Usamah et al. (2013) stated that student performance can be improved by evaluating learning assessments and co-curricular activities [1-9]. Such measurement is necessary to determine a student's level of learning. Final grades are used to evaluate student performance; they are based on the structure of the course, course marks, final exam marks, and other activities.

Evaluating student performance and the effectiveness of the learning process is very important, and so is analysing student performance with data mining techniques. Data mining is among the most important techniques for this analysis, and it has been widely applied in the education domain in recent times [10-24]. In this setting it is known as educational data mining: the process of extracting information and patterns from large educational records. Predicting performance is very important for improving the quality of students' learning.

The next section describes the attributes and methods used for this comparison. Section 3 describes the factors used in prediction, Section 4 discusses the existing prediction methods and their prediction accuracy, and Section 5 concludes the paper.

II. ATTRIBUTES AND METHODS

For this comparison, we searched the following databases: IEEE Xplore, SpringerLink, ScienceDirect, and the ACM Digital Library. The attributes and methods used in previous work on predicting student performance are listed in Table I.

TABLE I. COMMON ATTRIBUTES AND METHODS USED TO PREDICT STUDENTS' PERFORMANCE

| S. No. | Attributes | Methods |
|---|---|---|
| 1 | Internal assessments | Decision Tree, Neural Network, K-Nearest Neighbor |
| 2 | Internal assessments, CGPA | Support Vector Machine |
| 3 | Internal assessments, CGPA, Extra-curricular activities | Decision Tree, Naive Bayes, K-Nearest Neighbor, Support Vector Machine |
| 4 | Internal assessments, CGPA, Student demographic | Decision Tree, Naive Bayes, K-Nearest Neighbor |
| 5 | Internal assessments, External assessment | Neural Network |
| 6 | External assessment, Student demographic, High school background | Neural Network, Decision Tree |
| 7 | Psychometric factors | Decision Tree, K-Nearest Neighbor, Support Vector Machine |
| 8 | External assessments | Decision Tree, Naive Bayes |
| 9 | CGPA | Decision Tree, Neural Network |
| 10 | CGPA, Student demographic, High school background, Scholarship, Social network interaction | Decision Tree, Neural Network, Naive Bayes |
| 11 | CGPA, Student demographic, High school background, Scholarship, Social network interaction, Internal assessment, Extra-curricular activities | Decision Tree |
| 12 | Student demographic, High school background | Neural Network |
| 13 | External assessment, CGPA, Student demographic, Extra-curricular activities | Decision Tree |
| 14 | Psychometric factors, Extra-curricular activities, Soft skills | Decision Tree |
| 15 | Student demographic, High school background, Internal assessment, Extra-curricular activities | Decision Tree |
| 16 | Internal assessments, External assessment, Demographic, Extra-curricular activities | Decision Tree, Neural Network |

III. PREDICTION OF STUDENTS' PERFORMANCE

The two factors in predicting students' performance are attributes and methods. Table I gives a detailed list of the attributes and methods used to predict students' performance.

The first part below focuses on the significant attributes used for prediction, and the second focuses on the methods used to predict student performance.

Figure 1 shows the attributes used for prediction.

3.1 Attributes used in predicting students' performance

Figure 1. Classification of attributes

Grouping the common categories yields eight attribute groups:

1. CGPA (Cumulative Grade Point Average), the most commonly used attribute for predicting student performance.

2. Internal assessments

- Assignment marks, quiz marks, lab marks, class tests, and attendance

3. Demographic

- Gender, age, family background, and ill health or disability

4. External assessments

- Mark obtained in the final exam for a particular subject

5. Extra-curricular activities

6. High school background

7. Social network interaction

8. Psychometric factors

- Student interest, study behaviour, engagement time, and family support

3.2 Prediction methods used for students' performance

In educational data mining, many classification algorithms have been applied to predict students' performance. The most commonly used are Decision Tree, Neural Networks, Naïve Bayes, K-Nearest Neighbor, and Support Vector Machine.

3.2.1 Decision Tree

The most popular prediction method is the Decision Tree. Most researchers have used it because it is easy to use and makes both small and large data structures, and the predicted values, clearly visible [25-33].

The model is easy for anyone to understand because its reasoning process is transparent, and it can be converted directly into IF-THEN rules [1]. In Table I, thirteen (13) papers discuss the use of the Decision Tree method to predict students' performance. Earlier studies used this method to predict student dropout from information on academic performance, to predict the right career for a student based on their developmental profile, and to predict the semester performance of postgraduate (PG) students. The datasets used include final grades [37-39], the Cumulative Grade Point Average (CGPA), and end-semester marks. All these datasets were measured and examined to determine the main attributes and factors that may affect performance. Finally, a suitable mining procedure was examined to predict performance.



Classification methods predict student performance from study-related data. Gray et al. (2014) examined the accuracy of such a model in predicting the progress of first-year students in tertiary education. Their work focused on forecasting students' academic performance accurately and on comparing the accuracy of different mining algorithms.
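The IF-THEN property mentioned above can be illustrated with a small tree learner. The surveyed papers do not state which tree variant they used, so this is only a minimal sketch under assumptions: it uses Gini impurity on numeric attributes (the CART-style criterion), a depth limit, and a hypothetical toy dataset of (CGPA, internal mark) pairs labelled pass/fail. The functions `build_tree`, `predict`, and `to_rules` are illustrative names, not part of any library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(node, row):
    """Walk from the root to a leaf using the stored thresholds."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def to_rules(node, names, conditions=()):
    """Flatten the tree into one IF-THEN rule string per leaf."""
    if not isinstance(node, tuple):
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {node}"]
    f, t, left, right = node
    return (to_rules(left, names, conditions + (f"{names[f]} <= {t}",)) +
            to_rules(right, names, conditions + (f"{names[f]} > {t}",)))

# Hypothetical toy data: (CGPA, internal mark) -> final result.
rows = [(8.5, 78), (7.9, 85), (9.1, 90), (8.0, 60),
        (6.2, 55), (5.8, 70), (6.5, 40), (7.2, 48)]
labels = ["pass", "pass", "pass", "pass", "fail", "fail", "fail", "fail"]
tree = build_tree(rows, labels)
rules = to_rules(tree, ("CGPA", "internal_mark"))
```

On this toy data a single CGPA threshold separates the classes, so the tree reduces to two rules ("IF CGPA <= 7.2 THEN fail" and "IF CGPA > 7.2 THEN pass"); real student data would normally need deeper trees and pruning.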

3.2.2 Neural Networks

This is the next most widely used technique in educational data mining. The main advantage of a neural network is its ability to discover all possible relationships between predictor variables. It can also detect, without prior assumptions, even complex nonlinear associations between dependent and independent variables, which is why it is regarded as one of the strongest prediction methods. Several papers have been published on this method. One of them built an Artificial Neural Network model to forecast student performance thoroughly. The attributes examined with this network include student admission data, students' behaviour towards self-regulated learning, and academic performance [14-24]. Another paper shows how data preprocessing can improve the accuracy of a model that predicts students' final grades in a particular course [25-38]. The remaining papers also use decision trees. The resulting prediction accuracies are summarized in Table II.
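To make the "nonlinear association" claim concrete, here is a minimal sketch of a one-hidden-layer network trained by backpropagation on an XOR-style pattern, which no single straight line can separate. The class `TinyMLP`, the 2-4-1 architecture, the learning rate, and the epoch count are all illustrative assumptions, not the configuration used in the surveyed papers.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

class TinyMLP:
    """Hypothetical one-hidden-layer network with sigmoid units."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, X, T):
        # Mean squared error over the dataset.
        return sum((self.forward(x)[1] - t) ** 2 for x, t in zip(X, T)) / len(X)

    def train(self, X, T, epochs=3000, lr=0.5):
        # Per-sample gradient descent with backpropagation
        # (the constant factor 2 of the squared-error derivative is folded into lr).
        for _ in range(epochs):
            for x, t in zip(X, T):
                h, y = self.forward(x)
                d_out = (y - t) * y * (1 - y)
                # Hidden deltas use the output weights before they are updated.
                d_hid = [d_out * w * hj * (1 - hj) for w, hj in zip(self.w2, h)]
                for j, hj in enumerate(h):
                    self.w2[j] -= lr * d_out * hj
                self.b2 -= lr * d_out
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0

# XOR-style toy pattern (hypothetical): the class depends on an interaction of the inputs.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
T = [0, 1, 1, 0]
net = TinyMLP(n_in=2, n_hidden=4, seed=0)
initial_loss = net.loss(X, T)
net.train(X, T)
final_loss = net.loss(X, T)
```

The hidden layer is what lets the network bend the decision boundary; a model with only the output unit could not fit this pattern.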

3.2.3 Naive Bayes

Naive Bayes is another option for making predictions. Six (6) papers have used this algorithm to evaluate students' performance. All of these studies aim to find the most accurate technique for predicting student performance by comparing it with other techniques [4, 5, 6, 7, 19, 20]. The results are shown in Table II.
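As a minimal sketch of how Naive Bayes scores a student, the code below assumes categorical attributes (a hypothetical CGPA band and attendance level), applies Laplace smoothing with `alpha=1`, and picks the class with the highest log posterior. The data and the helper names `train_nb` and `predict_nb` are illustrative only; the surveyed papers may have used other variants (for example Gaussian Naive Bayes for numeric marks).

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels, alpha=1.0):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    seen_values = defaultdict(set)       # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, label)][v] += 1
            seen_values[i].add(v)
    return class_counts, value_counts, seen_values, alpha

def predict_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, seen_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior P(label)
        for i, v in enumerate(row):
            # Laplace-smoothed likelihood P(attribute_i = v | label),
            # multiplied in independently: the "naive" assumption.
            score += math.log((value_counts[(i, label)][v] + alpha) /
                              (n + alpha * len(seen_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical records: (CGPA band, attendance) -> result.
rows = [("high", "good"), ("high", "good"), ("high", "poor"), ("medium", "good"),
        ("medium", "poor"), ("low", "poor"), ("low", "poor"), ("low", "good")]
labels = ["pass", "pass", "pass", "pass", "fail", "fail", "fail", "fail"]
model = train_nb(rows, labels)
```

Working in log space avoids numerical underflow when many attributes are multiplied together.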

3.2.4 K-Nearest Neighbor

Four of the papers studied show that K-Nearest Neighbor achieves good accuracy. In these studies, the method took less time to identify the different performance levels of students across all levels of learners [39-47]. It also gave the best accuracy in estimating the detailed pattern of learners' progress in tertiary education.
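The K-Nearest Neighbor idea fits in a few lines: find the k training students closest to a new student and take a majority vote of their labels. This sketch assumes Euclidean distance, k = 3, and hypothetical features already scaled to [0, 1] (scaling matters, because otherwise an attribute with a large range such as marks out of 100 would dominate CGPA). The function name `knn_predict` is illustrative.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs; returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical normalized features: (CGPA / 10, internal mark / 100) -> performance level.
train = [((0.90, 0.85), "good"), ((0.80, 0.90), "good"), ((0.85, 0.70), "good"),
         ((0.40, 0.30), "poor"), ((0.30, 0.45), "poor"), ((0.50, 0.35), "poor")]
```

An odd k avoids ties when there are two classes; `math.dist` requires Python 3.8 or later.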

3.2.5 Support Vector Machine

This is a supervised learning method for classification.

Some existing papers have used this method to predict students' performance. Hamalainen et al. (2006) chose this technique because it was suitable for their datasets. Sembiring et al. (2011) stated that it has good generalization ability and is faster than the other methods. This method achieved the highest prediction accuracy in identifying students at risk of failure.
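A minimal linear Support Vector Machine can be trained by subgradient descent on the regularized hinge loss, (λ/2)·||w||² + (1/n)·Σ max(0, 1 − yᵢ(w·xᵢ + b)). This is only a sketch under assumptions: it is linear (no kernel, whereas the surveyed papers may have used kernel SVMs), labels are −1/+1, the hyperparameters are arbitrary, and the two-feature dataset is hypothetical and already centred. `train_linear_svm` and `svm_predict` are illustrative names.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the regularized hinge loss.
    X: list of feature vectors; y: labels in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    """Sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical standardized features; +1 = not at risk, -1 = at risk of failure.
X = [(1.0, 0.8), (0.8, 1.2), (1.2, 1.0), (-1.0, -0.9), (-0.8, -1.1), (-1.1, -0.7)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

Only samples inside the margin contribute to each update, which is the support-vector idea in its simplest form.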

TABLE II. PREDICTION ACCURACY OF CLASSIFICATION TECHNIQUES

| Method | Attributes | Prediction accuracy | Paper reference number |
|---|---|---|---|
| Decision Tree | Internal assessment mark | 76% | [1] |
| Decision Tree | Psychometric factors | 65% | [2] |
| Decision Tree | External assessment, CGPA | 91% | [4] |
| Decision Tree | CGPA, Student demographic, High school background, Scholarship, Social network interaction | 73% | [5] |
| Decision Tree | Internal assessment mark, CGPA, Extra-curricular activities | 66% | [6] |
| Decision Tree | Student demographic, High school background | 65% | [7] |
| Decision Tree | Internal assessment mark, Student demographic, Extra-curricular activities | 90% | [8] |
| Decision Tree | External assessment mark, CGPA, Student demographic, Extra-curricular activities | 90% | [9] |
| Decision Tree | Psychometric factors, Extra-curricular activities, Soft skills | 88% | [10] |
| Decision Tree | CGPA | 98% | [20] |
| Decision Tree | Internal assessment mark, External assessment mark, Student demographic, Extra-curricular activities | 73% | [18] |
| Decision Tree | Internal assessment mark, CGPA, Student demographic | 69% | [19] |
| Neural Network | Internal assessment mark | 81% | [11] |
| Neural Network | Psychometric factors | 69% | [2] |
| Neural Network | External assessment, CGPA | 75% | [4] |
| Neural Network | CGPA, High school background, Scholarship, Social network interaction | 71% | [5] |
| Neural Network | High school background | 72% | [7] |
| Neural Network | External assessment mark, Student demographic, High school background | 74% | [13] |
| Neural Network | Internal assessment mark, External assessment mark | 98% | [14] |
| Neural Network | Internal assessment mark, External assessment mark, Student demographic, Extra-curricular activities | 74% | [18] |
| Naive Bayes | CGPA, High school background, Scholarship, Social network interaction | 76% | [5] |
| Naive Bayes | Student demographic, High school background | 50% | [7] |
| Naive Bayes | CGPA | 75% | [4] |
| Naive Bayes | Internal assessment mark, CGPA, Extra-curricular activities | 73% | [6] |
| Naive Bayes | CGPA | 94% | [20] |
| Naive Bayes | Internal assessments, CGPA, Student demographic | 71% | [19] |
| K-Nearest Neighbor | Psychometric factors | 69% | [2] |
| K-Nearest Neighbor | Internal assessment mark, CGPA, Extra-curricular activities | 83% | [6] |
| K-Nearest Neighbor | Internal assessment mark, CGPA, Student demographic | 62% | [19] |
| Support Vector Machine | Psychometric factors | 83% | [16] |
| Support Vector Machine | Internal assessment mark, CGPA, Extra-curricular activities | 80% | [6] |
| Support Vector Machine | Internal assessment mark, CGPA | 80% | [17] |

4. DISCUSSIONS

This comparison is based on the maximum accuracy of the prediction approaches and on the significant factors that may affect student performance. Figure 2 shows the prediction accuracy of classification techniques, grouped by algorithm, for predicting student performance from 2002 to 2016.

Accuracy is the overall correctness of the model, calculated as the number of correct classifications divided by the total number of classifications. As Figure 2 shows, Neural Network and Decision Tree have the highest prediction accuracy (98%), followed by Naive Bayes (94%). SVM and K-Nearest Neighbour both reach the same maximum accuracy (83%). The prediction accuracy depends on both the attributes and the prediction method used during the prediction process.
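Written as a formula, the accuracy definition above reads as follows. The second form is an assumption added for the common two-class case (for example pass/fail), where TP, TN, FP, and FN are the standard true-positive, true-negative, false-positive, and false-negative counts of a confusion matrix; the final line is a hypothetical worked example.

```latex
\text{Accuracy}
  = \frac{\text{number of correct classifications}}{\text{total number of classifications}}
  = \frac{TP + TN}{TP + TN + FP + FN}

% Hypothetical example: 49 of 50 test students classified correctly
\text{Accuracy} = \frac{49}{50} = 0.98 = 98\%
```

Accuracy differs from precision, TP / (TP + FP), which only measures how many predicted positives are correct.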

Neural Network gave its highest accuracy (98%) with a combination of internal and external assessments. It achieved 97% with external assessment, 81% with internal assessments, and its lowest accuracy (69%) with psychometric factors.

Decision Tree gave its highest prediction accuracy (98%) for CGPA and its lowest (65%) both for student demographic with high school background and for psychometric factors.

Next is Naive Bayes, with its highest prediction accuracy (94%) for CGPA and its lowest (50%) for student demographic with high school background.

K-Nearest Neighbor gave its highest prediction accuracy (83%) for internal assessment mark, CGPA, and extra-curricular activities, and its lowest (62%) for internal assessments, CGPA, and student demographic.

Support Vector Machine gave its highest prediction accuracy (83%) for psychometric factors.
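The highest and lowest figures quoted above can be read off Table II mechanically. The sketch below re-enters the Table II accuracy values by hand (the dictionary `TABLE_II` and both helper functions are illustrative names) and ranks the methods by their best reported accuracy. The 97% Neural Network figure for external assessment does not appear in Table II, so it is not included here.

```python
# Accuracy values (%) transcribed from Table II, in table order, grouped by method.
TABLE_II = {
    "Decision Tree": [76, 65, 91, 73, 66, 65, 90, 90, 88, 98, 73, 69],
    "Neural Network": [81, 69, 75, 71, 72, 74, 98, 74],
    "Naive Bayes": [76, 50, 75, 73, 94, 71],
    "K-Nearest Neighbor": [69, 83, 62],
    "Support Vector Machine": [83, 80, 80],
}

def summarize(table):
    """Return {method: (highest, lowest)} reported accuracy."""
    return {method: (max(values), min(values)) for method, values in table.items()}

def rank_by_best(table):
    """Methods ordered by highest reported accuracy; sorted() is stable, so ties keep table order."""
    return sorted(table, key=lambda method: max(table[method]), reverse=True)

summary = summarize(TABLE_II)
ranking = rank_by_best(TABLE_II)
```

A maximum over studies says little about typical performance, since each figure comes from a different dataset and attribute set; the lowest values are shown alongside for that reason.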


Figure 2. Prediction accuracy grouped by algorithms

5. CONCLUSION

Predicting students' performance mainly helps educators and learners improve their learning and teaching processes. This paper has reviewed earlier studies on predicting student performance with various data mining methods. Most researchers have used CGPA and assessment marks as the datasets for prediction, and classification is the method most often used for prediction in educational data mining. Among the classification techniques, Neural Network and Decision Tree are the methods most used by researchers to predict student performance.

This comparison will help educational systems monitor student performance and improve it.

REFERENCES

[1] Sivaram, M., B. DurgaDevi, and J. Anne Steffi. "Steganography of Two LSB Bits." International Journal of Communications and Engineering 1.1 (2012): 2231-2307.

[2] Sivaram, M., et al. "Exploiting the Local Optima in Genetic Algorithm using Tabu Search." Indian Journal of Science and Technology 12 (2019): 1.

[3] Mohammed, Amin Salih, et al. "Detection and Removal of Black Hole Attack in Mobile Ad Hoc Networks Using GRP Protocol." International Journal of Advanced Research in Computer Science 10.6 (2018).

[4] Viswanathan, M., et al. "Security and privacy protection in cloud computing." Journal of Advanced Research in Dynamical and Control Systems (2018): 1704-1710.

[5] Nithya, S., et al. "Intelligent based IoT smart city on traffic control system using raspberry Pi and robust waste management." Journal of Advanced Research in Dynamical and Control Systems (2018): 765-770.

[6] Dhivakar, B., et al. "Statistical Score Calculation of Information Retrieval Systems using Data Fusion Technique." Computer Science and Engineering 2.5 (2012): 43-5.

[7] Mohammed, Amin Salih, Shahab Wahhab Kareem, and M. Sivaram. "Time series prediction using SRE-NAR and SRE-ADALINE." (2018): 1716-1726.

[8] Manikandan, V., D. Yuvaraj, Amin Salih Mohammed, V. Porkodi, and M. Sivaram. "Electrical Energy Conservation and Energy Management System Using Internet of Things." (2018): 2016-2023.

[9] Porkodi, V., D. Yuvaraj, A. S. Mohammed, M. Sivaram, and V. Manikandan. "IoT in Agriculture." (2018): 1986-1991.

[10] Manikandan, V., A. S. Mohammed, D. Yuvaraj, M. Sivaram, and V. Porkodi. "An Energy Efficient EDM-RAEED Protocol for IoT Based Wireless Sensor Networks." (2018): 1992-2004.

[11] Mohammed, Amin Salih, Suzan Tahsein Husein, M. Sivaram, V. Porkodi, and V. Manikandan. "4G and 5G Communication Networks Future Analysis." (2019): 343-349.

[12] Abraham, Steffin, Tana Luciya Joji, and D. Yuvaraj. "Enhancing Vehicle Safety with Drowsiness Detection and Collision Avoidance." International Journal of Pure and Applied Mathematics 120.6 (2018): 2295-2310.

[13] Porkodi, V., et al. "Survey on White-Box Attacks and Solutions." Asian Journal of Computer Science and Technology 7.3 (2018): 28-32.

[14] Malathi, N., and M. Sivaram. "An Enhanced Scheme to Pinpoint Malicious Behavior of Nodes In Manet’s." (2015).

[15] Sivaram, M. "Odd and Even Point Crossover Based Tabu GA for Data Fusion in Information Retrieval." (2014).

[16] Sivaram, M., et al. "Emergent News Event Detection from Facebook Using Clustering."

[17] Punidha, R., Pavithra K, Swathika R, and Sivaram M. "Preserving DDoS Attacks Using Node Blocking Algorithm." International Journal of Pure and Applied Mathematics 119.15 (2018): 633-640.

[18] Batri, K., and M. Sivaram. "Testing the impact of odd and even point crossover of genetic algorithm over the data fusion in information retrieval." European Journal of Scientific Research (2012).

[19] Mohamme, Sivaram Yuvaraj Amin Salih, and V. Porkodi. "Estimating the Secret Message in the Digital Image." International Journal of Computer Applications 181.36 (2019): 26-28.

[20] Manikandan, V., et al. "Privacy Preserving Data Mining Using Threshold Based Fuzzy C-Means Clustering." ICTACT Journal on Soft Computing 9.1 (2018).

[21] Obulatha-II-ME-CSE, Miss O. "Position Privacy Using LocX."

[22] Sivaram, M., et al. "The Real Problem Through a Selection Making an Algorithm that Minimizes the Computational Complexity."

[23] Sivaram, M., et al. "Detection of Accurate Facial Detection Using Hybrid Deep Convolutional Recurrent Neural Network."

[24] Porkodi, V., D. Yuvaraj, Amin Salih Mohammed, V. Manikandan, and M. Sivaram. "Prolong the Network Lifespan of Wireless Sensor Network." (2018): 2034-2038.

[25] M, Sivaram, Enabling Anonymous Endorsement in Clouds with Decentralized Access Control (March 13, 2019). Available at SSRN: https://ssrn.com/abstract=

[26] M, Sivaram, Integer Wavelet Transform Based Approach for High Robustness of Audio Signal Transmission (March 13, 2019). Available at SSRN: https://ssrn.com/abstract=

[27] M, Sivaram, Healthcare Visible Light Communication (March 13, 2019). Available at SSRN: https://ssrn.com/abstract=

[28] M, Sivaram, Preserving DDoS Attacks Using Node Blocking Algorithm (March 13, 2019). Available at SSRN: https://ssrn.com/abstract=

[29] M, Sivaram and Sivaram, Porkodi and Manikandan, V, Securing the Sensor Networks Along With Secured Routing Protocols for Data Transfer in Wireless Sensor Networks (October 28, 2018). Available at SSRN: https://ssrn.com/abstract=

[30] M, Sivaram, Statistical Score Calculation of Information Retrieval Systems using Data Fusion Technique (April 8, 2019). Available at SSRN: https://ssrn.com/abstract=

[31] Yuvaraj, Duraisamy, and Shanmugasundaram Hariharan. "Content-based image retrieval based on integrating region segmentation and colour histogram." learning 2.11 (2016): 12.

[32] ShanmugaPriya, S., A. Valarmathi, and D. Yuvaraj. "The personal authentication service and security enhancement for optimal strong password." Concurrency and Computation: Practice and Experience: e5009.

[33] Sundara Vadivel, P., et al. "An efficient CBIR system based on color histogram, edge, and texture features." Concurrency and Computation: Practice and Experience: e4994.

[34] Ahamed, B. Bazeer, and D. Yuvaraj. "Framework for Faction of Data in Social Network Using Link Based Mining Process." International Conference on Intelligent Computing & Optimization. Springer, Cham, 2018.

[35] Vadivel, P. Sundara, S. Navaneetha Krishnan, and D. Yuvaraj. "An Effective Document Category Prediction System Using Support Vector Machines, Mann-Whitney Techniques." International Journal of Pure and Applied Mathematics 118.22 (2018): 895-900.

[36] Santhi, R., and D. Yuvaraj. "Content-Based Image Retrieval in Cloud Using Watermark Protocol and Searchable Encryption." (2017).

[37] Karthika, M., K. Geetha, and D. Yuvaraj. "Preserving Location Based Range Query over Outsourced Data with EPLQ Using LOCX."

[38] Santhi, R., and D. Yuvaraj. "Content-Based Image Retrieval in Cloud Using Watermark Protocol and Searchable Encryption." (2017).

[39] Yuvaraj, Duraisamy, and Shanmugasundaram Hariharan. "Efficient Content Based Image Retrieval Technique using Multiple Attributes." (2013).

[40] Kathirvel, R., and K. Batri. "A computer‐aided approach for meningioma brain tumor detection using C ANFIS classifier." International Journal of Imaging Systems and Technology 27.3 (2017): 193-200.

[41] Batri, K., and S. Anbukaruppusamy. "Improving TCP Performance in Ad-Hoc Networks." European Journal of Scientific Research 65.2 (2011): 237-245.

[42] Gopalan, Nagammapudhur, and Krishnan Batri. "Effect of Filter Size on Fusion Function in Information Retrieval." Int. Arab J. Inf. Technol. 5.2 (2008): 170-175.

[43] Gopalan, N. P., K. Batri, and B. Siva Selvan. "Adaptive Selection of Top-m Retrieval Schemes for Data Fusion Using Tabu Search." International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007). Vol. 1. IEEE, 2007.

[44] Lakshmi, S., B. Sathiyabhama, and K. Batri. "Entropy a new measure to gauge search engine optimisation." International Journal of Enterprise Network Management 9.3-4 (2018): 189-204.

[45] Krishnan, Batri. "An effective selection of retrieval schemes for data fusion." Kuwait Journal of Science 44.2 (2017).

[46] Batri, Krishnan. "An Effective Pareto Optimality Based Fusion Technique for Information Retrieval." MIS REVIEW: An International Journal 19.1 (2013): 61-80.

[47] Anbukaruppusamy, S., K. Batri, and Erode Gobi. "Explicit Congestion Control With Buffer Management For Multihop Adhoc Networks." Life Science Journal 10.2 (2013).