EPSP: Early Prediction of Student Performance Using Classification Method of Data Mining, Mr. Bakale Aakash Datta


Resincap Journal of Science and Engineering

Volume 4, Issue 4, April 2020 ISSN: 2456-9976


EPSP: Early Prediction of Student Performance Using Classification Method of Data Mining

Mr. Bakale Aakash Datta, P.G. Student

Computer Engineering Dept.

S.N.D.C.O.E. Rc. Yeola [email protected]

Prof. Vikas N. Dhakane, Asst. Professor, Computer Engineering Dept.

S.N.D.C.O.E. Rc. Yeola [email protected]

ABSTRACT

Higher education institutions are keen to know how successful their students will be over the course of their studies.

To predict students' performance, they use several methods, such as physical examinations, statistical methods and, more recently, data mining techniques.

Educational Data Mining is an emerging research area that applies data mining techniques to education. It uses machine learning algorithms and statistical techniques to help interpret students' learning habits and academic performance and to identify where improvement is required.

In this paper we discuss various data mining techniques that are useful for predicting the performance level of students. We applied them to the Kalboard 360 dataset in WEKA in order to compare the techniques.

Keywords

Data Mining, Error Measurement, Accuracy, Naïve Bayes, J48, Multilayer Perceptron.

1. INTRODUCTION

At present, data mining and machine learning form a very important field of research that plays an indispensable role in educational institutions. Its aim is to discover relevant facts from historical data stored in large datasets.

Educational Data Mining (EDM) is the discipline that applies data mining techniques in an educational setting. It is an important research area that helps to extract useful information from educational databases in order to improve educational achievement and to assess students' learning process better.

Educational Data Mining can be regarded both as part of the science of learning and as a branch of data mining [1][2][3]. It can be useful when creating a model of user perception, action and trial [4]. Data mining, or knowledge discovery, has gained popularity because it helps to examine data from different perspectives and to summarize it into useful information [5]. Educational data mining relies on many data mining techniques such as k-nearest neighbor, neural networks, decision trees, support vector machines, Naïve Bayes, and many more [6]. For quick analysis of data with data mining techniques, many open-source and freely available tools exist, such as WEKA, RapidMiner, Orange, KNIME and SSDT (SQL Server Data Tools), which are designed for data investigation and for producing an understandable structure for future use. In this paper, we use WEKA (Waikato Environment for Knowledge Analysis), which is well suited to analyzing data and building a model that gives a predictive outcome.

2. REVIEW OF LITERATURE

Many researchers have applied data mining in education to predict students' achievement. In [8], a Decision Tree (DT) algorithm was used to assess the performance of engineering students. Data on around 340 students were collected to predict their achievement in the first-year exams, and the resulting model reached only 60% accuracy on the training set. In [9], WEKA was used to predict the marks of final-year students on the basis of the parameters of two different datasets. The datasets had one thing in common: each covered students from one college course over the last four semesters. In [10], the author reviewed past research on the prediction, analysis and assessment of students' performance using different data mining techniques. In [11], the authors measured student performance using Decision Tree classification techniques and used an artificial neural network to build classifier models. The outcome was predicted from various student attributes. Analyzing a student's strengths and weaknesses in this way may help to improve performance in future. The study shows the efficacy of applying data mining methods to course rating data, and that such data can be mined in higher education. In [12], the authors present a study intended to help students and teachers improve the results of students who are at greater risk of failure.

Parameters such as attendance, seminar marks and assignment marks were collected from the students' previous records in order to predict their performance at the end of the semester. The authors used the Naïve Bayes classification algorithm, which showed the highest accuracy among the classification algorithms compared. The researchers in [13] carried out a comparative study of various decision tree algorithms and their effect on an educational data set chosen to classify the academic performance of the stakeholders, i.e. the students. The study focuses on selecting the best-performing decision tree algorithms and explains each of them in detail. Its results showed that regression and classification methods performed best, because they produced better results on the data set tested.

Researchers in [14] reviewed the use of data mining techniques for predicting students' performance and concluded that, among the prediction algorithms, Decision Tree and Neural Network are the two methods most strongly recommended by researchers. Authors in [15] applied data mining techniques to find and evaluate future results and the factors which affect them.


The author in [16] discussed the k-Nearest Neighbor (k-NN) algorithm, which plays an effective role in the accuracy of the classifier.

3. PROPOSED SYSTEM

Data mining is a useful and constructive aid to decision making. Classification is one of the simplest and most widely used data mining techniques. It requires knowledge of the training data.

There are two phases of classification procedure:

• Development of a model from training data

• Evaluation of the model using testing data
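The two phases can be illustrated with a minimal sketch. It assumes a simple holdout split and a deliberately trivial majority-class model; the record fields, the class labels and the 30% test fraction are hypothetical and are not taken from the experiments in this paper.

```python
import random
from collections import Counter

def split_holdout(records, test_fraction=0.3, seed=1):
    """Shuffle labelled records and split them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_majority(training):
    """Phase 1: build a (trivial) model, here the most frequent class label."""
    counts = Counter(label for _, label in training)
    return counts.most_common(1)[0][0]

def evaluate(model_label, testing):
    """Phase 2: measure the accuracy of the model on unseen testing data."""
    correct = sum(1 for _, label in testing if label == model_label)
    return correct / len(testing)

# Hypothetical labelled records: (features, class label).
records = [({"absences": "low"}, "pass")] * 7 + [({"absences": "high"}, "fail")] * 3

training, testing = split_holdout(records)
model = train_majority(training)
accuracy = evaluate(model, testing)
print(model, accuracy)
```

Any real classifier follows the same pattern: it is fitted on the training part only and scored on the testing part only.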

Classification methods can be grouped by the kind of algorithm they use:

• Statistical based algorithms: Statistical procedures usually rest on an explicit underlying probability model, which provides the probability of belonging to each class rather than just a class label.

    • Correlation Analysis: A statistical method used to measure the degree to which two numerically measured, continuous variables (e.g. age and weight) are related to each other.

    • Regression Analysis: This method describes how an independent variable is numerically associated with a dependent variable.

    • Bayesian Model: The essence of the frequentist technique is to apply probability to data, whereas Bayesian calculations go straight for the probability of the hypothesis.

• Distance based algorithms: Each item mapped to a particular class can be regarded as more similar to the other items in that class than to the items of other classes. There are two distance-based approaches to classification:

    • Simple Approach: Each class is assumed to be represented by its center. A new item is assigned to the class with which it has the largest similarity value.

    • K nearest neighbors: A non-parametric method that relies on a distance measure. All available cases are stored, and a new case is classified on the basis of its distance to them.

• Decision tree based algorithms: A tree is constructed to model the classification process. The method involves two steps:

    a. Building the decision tree

    b. Applying the decision tree to the database

• Neural Network based algorithms: A model is created which provides a format for representing the data. When a tuple is classified, all attributes related to that tuple are fed into a graph.

• Rule based algorithms: Classification is performed on the basis of if-then-else rules.

Fig No 1 Different Classification Methods
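The two distance-based approaches listed above can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and numeric features; the feature values and the class labels "Low" and "High" are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(train, query):
    """Simple approach: represent each class by its center, pick the closest."""
    sums, counts = {}, Counter()
    for features, label in train:
        counts[label] += 1
        previous = sums.get(label, [0.0] * len(features))
        sums[label] = [s + f for s, f in zip(previous, features)]
    centroids = {
        label: [s / counts[label] for s in total] for label, total in sums.items()
    }
    return min(centroids, key=lambda label: euclidean(centroids[label], query))

def knn(train, query, k=3):
    """k nearest neighbors: majority vote among the k closest stored cases."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training cases: (feature vector, class label).
train = [
    ((10, 5), "Low"), ((15, 8), "Low"),
    ((70, 60), "High"), ((80, 75), "High"), ((90, 85), "High"),
]
query = (75, 70)
print(nearest_centroid(train, query), knn(train, query, k=3))
```

The centroid approach stores only one point per class, while k-NN keeps every training case and defers all work to classification time.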

4. DATA SET DESCRIPTION

In this paper, we use the Kalboard 360 dataset, an educational dataset gathered through a learning management system (LMS). Such a system gives users simultaneous access to educational resources from any device with an Internet connection. The data were collected with a learner activity tracker tool called the experience API (xAPI), a major part of the training and learning architecture (TLA), which makes it possible to monitor learning progress and learner actions such as reading an article or watching a training video. The experience API helps learning activity providers to determine the learner, the activity and the objects that describe a learning experience. The dataset contains 480 student records and 16 features. The features fall into three main categories: demographic features such as gender and nationality; educational features such as educational stage, grade level and section; and psychological features such as raising hands in class, opening resources, answering surveys by parents, and school satisfaction.
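To make the learner, activity and object description concrete, the sketch below builds one xAPI-style statement as a Python dictionary. The actor, verb and object fields follow the general shape of an xAPI statement; the learner name, e-mail address and activity URL are made-up placeholders and are not taken from the Kalboard 360 data.

```python
import json

# Hypothetical xAPI-style statement: who (actor) did what (verb) to which object.
statement = {
    "actor": {"name": "Example Learner", "mbox": "mailto:learner@example.com"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/experienced",
        "display": {"en-US": "experienced"},
    },
    "object": {
        "id": "http://example.com/activities/training-video-1",
        "definition": {"name": {"en-US": "Training video 1"}},
    },
}

def describe(stmt):
    """Summarize a statement as 'learner verb object' for quick inspection."""
    actor = stmt["actor"]["name"]
    verb = stmt["verb"]["display"]["en-US"]
    obj = stmt["object"]["definition"]["name"]["en-US"]
    return f"{actor} {verb} {obj}"

summary = describe(statement)
print(json.dumps(statement, indent=2))
print(summary)
```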

5. IMPLEMENTATION OF CLASSIFIERS IN WEKA

5.1 Using J48 Algorithm

J48 is an extended version of ID3. It adds features such as handling of missing values, decision tree pruning and derivation of rules. It is an open-source Java implementation of the C4.5 algorithm.
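C4.5, which J48 implements, chooses the attribute to split on with an entropy-based criterion known as the gain ratio. The sketch below computes it for one categorical attribute. It is an illustration of the criterion only, not WEKA's code, and the attribute values and class labels are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, normalized by split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: one attribute separates the classes, the other does not.
labels = ["pass", "pass", "fail", "fail"]
absences = ["low", "low", "high", "high"]
section = ["A", "B", "A", "B"]

print(gain_ratio(absences, labels))  # attribute that separates the classes
print(gain_ratio(section, labels))   # attribute that carries no information
```

A tree builder would evaluate every candidate attribute this way and split on the one with the highest ratio.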

5.2 Using Support Vector Machine Algorithm

The Support Vector Machine is a machine learning approach used for classification and regression analysis, although it is most often used for classification problems. It can analyze large amounts of data to find hidden patterns in them.


Fig.No 2 J48 Algorithm

Fig.No 3 Support Vector Machine Algorithm

5.3 Using Naïve Bayes Algorithm

Naïve Bayes is a well-established algorithm for classification tasks. It gives very good results in text-based data analysis such as Natural Language Processing (NLP). It assumes that, given the class, each feature and its value are independent of every other feature and its value.

Fig.No 4 Naïve Bayes Algorithm
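The independence assumption means that the score of a class is simply its prior multiplied by one probability per feature. A minimal sketch for categorical features is given below. It assumes Laplace (add-one) smoothing, and the feature values and class labels are hypothetical; it is not WEKA's implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    distinct = defaultdict(set)          # feature index -> observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct

def predict(model, row):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(distinct[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical rows: (participation level, parent answered survey).
rows = [("high", "yes"), ("high", "no"), ("low", "no"), ("low", "no"), ("high", "yes")]
labels = ["pass", "pass", "fail", "fail", "pass"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "yes")), predict(model, ("low", "no")))
```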

5.4 Using Random Forest Algorithm

Random Forest is a flexible, easy-to-use supervised classification algorithm. As its name suggests, it creates a forest of many trees. In general, the more trees in the forest, the more robust the forest and the more accurate its results. In simple words, the algorithm builds multiple decision trees and merges their outputs to obtain a more stable and accurate prediction.

Fig.No 5 Random Forest Algorithm
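The merging step can be sketched as a majority vote over the individual trees. In the illustration below each "tree" is a hand-written one-rule function standing in for a trained decision tree, and the feature names, thresholds and labels are hypothetical; the bootstrap helper shows how each tree would normally receive its own resampled training set.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, one such sample per tree."""
    return [rng.choice(rows) for _ in rows]

def forest_predict(trees, sample):
    """Merge the trees' outputs by majority vote."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for trained trees (each is a single rule).
def tree_hands(s):
    return "pass" if s["raised_hands"] > 30 else "fail"

def tree_absence(s):
    return "pass" if s["absence_days"] < 7 else "fail"

def tree_resources(s):
    return "pass" if s["resources_opened"] > 50 else "fail"

trees = [tree_hands, tree_absence, tree_resources]
student = {"raised_hands": 45, "absence_days": 10, "resources_opened": 60}

prediction = forest_predict(trees, student)
resampled = bootstrap_sample(list(range(10)), random.Random(0))
print(prediction, resampled)
```

Because a single tree can be wrong on a given student, as the absence rule is here, the vote is what gives the forest its stability.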

5.5 Using Multilayer Perceptron

The Multilayer Perceptron is a class of feed-forward artificial neural network. It consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. It generates a set of outputs from a set of inputs.

A Multilayer Perceptron consists of several layers of nodes which are connected to each other as a directed graph between the input and output layers. It is a deep learning technique which can be used in speech recognition, image recognition and machine translation.

Fig.No 6 Multilayer Perceptron
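How a set of inputs becomes a set of outputs can be shown with a forward pass through one hidden layer. The sketch below assumes sigmoid activations and uses made-up weights; training, for example by backpropagation, is omitted.

```python
import math

def sigmoid(x):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each node computes a weighted sum of its inputs plus a bias, then activates."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.1, -0.2]
output_w = [[1.2, -0.7]]
output_b = [0.05]

outputs = forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
print(outputs)
```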


6. RESULT ANALYSIS

The experiments and discussion are based on 163 instances. Five classification algorithms were selected: Random Forest, Naïve Bayes, Multilayer Perceptron, Support Vector Machine and J48. Each has its own characteristics for classifying the data set. Table No 1 shows the performance results of all classifiers obtained with WEKA, and Figure 7 shows the accuracy of the classification techniques.

Table No 1. Performance Result

Fig.No 7 Classifiers Accuracy Performance
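The accuracy values in Table No 1 follow directly from the counts of correctly and incorrectly classified instances. The short sketch below recomputes them from the counts reported in that table; the recomputed percentages can differ from the printed ones in the second decimal place because of rounding.

```python
# (correctly classified, incorrectly classified) as reported in Table No 1.
counts = {
    "Random Forest": (110, 53),
    "Naive Bayes": (105, 58),
    "Multilayer Perceptron": (124, 39),
    "Support Vector Machine": (123, 40),
    "DT - J48": (120, 43),
}

def accuracy_percent(correct, incorrect):
    """Accuracy = correctly classified instances / all instances, in percent."""
    return 100.0 * correct / (correct + incorrect)

accuracies = {name: accuracy_percent(*pair) for name, pair in counts.items()}
best = max(accuracies, key=accuracies.get)

for name, value in accuracies.items():
    print(f"{name}: {value:.2f}%")
print("Best:", best)
```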

In Table No 1, the Multilayer Perceptron classifier has more correctly classified instances than the other classifiers, which makes it the most accurate model. The graphical representation in Figure 7 likewise shows that the Multilayer Perceptron is the best classifier of students' performance on this dataset. Table No 1 shows the performance of the five classifiers based on different classification metrics. These metrics, namely true positives (TP), false positives (FP), Precision, Recall and F-measure, are important for comparing the classifiers.

These metrics show that the Multilayer Perceptron classifier performs better than the other classifiers.

Fig.No 8 Classifiers Performance Metrics
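For reference, the metrics named above are related to the confusion-matrix counts as in the following sketch. The counts used in the example are made up for illustration and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive the usual per-class metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives found (TP rate)
    fp_rate = fp / (fp + tn)     # share of actual negatives wrongly flagged
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "fp_rate": fp_rate,
        "f_measure": f_measure,
    }

# Hypothetical counts for one class.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The F-measure is the harmonic mean of Precision and Recall, so it is high only when both are high.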

CONCLUSION

Data mining is of significant importance in educational institutions. The knowledge acquired through data mining techniques can be used to make effective decisions that improve students' performance. The data set contains 163 instances and sixteen attributes. Five classifiers were run in WEKA and compared on accuracy, and different error measures were used to determine the best classifier. The experimental results show that the Multilayer Perceptron performs best among the classifiers. In future work, more instances will be collected and analyzed with other data mining techniques such as association and clustering.

REFERENCES

[1] M. Goyal and R. Vohra, “Applications of Data Mining in Higher Education”, IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 2, No. 1, March 2012.

[2] R. Huebner, “A survey of educational data mining research”, Research in Higher Education Journal, 2012.

[3] M.S. Mythili and A.R. Mohamed Shanavas, “An Analysis of students’ performance using classification algorithms”, IOSR Journal of Computer Engineering, Volume 16, Issue 1, January 2014.

[4] S. Lakshmi Prabha and A.R. Mohamed Shanavas, “Educational data mining applications”, Operations Research and Applications: An International Journal (ORAJ), Vol. 1, No. 1, August 2014.

[5] C. Romero, S. Ventura and E. Garcia, “Data mining in course management systems: Moodle case study and tutorial”, Computers & Education, Vol. 51, No. 1, pp. 368-384, 2008.

[6] S. Ayesha, T. Mustafa, A. Sattar and M. Khan, “Data mining model for higher education system”, European Journal of Scientific Research, Vol. 43, No. 1, pp. 24-29, 2010.

Criteria | Random Forest | Naïve Bayes | Multilayer Perceptron | Support Vector Machine | DT - J48
Accuracy % | 67.40% | 64.40% | 76.07% | 75.40% | 73.60%
Correctly Classified Instances | 110 | 105 | 124 | 123 | 120
Incorrectly Classified Instances | 53 | 58 | 39 | 40 | 43


[7] Weka: Data Mining Software in Java, University of Waikato, [Online]. Available: http://www.cs.waikato.ac.nz/ml/index.html.

[8] Z. J. Kovacic, “Early prediction of student success: Mining student enrollment data”, Proceedings of Informing Science & IT Education Conference (InSITE), 2010.

[9] I. Milos, S. Petar, V. Mladen and A. Wejdan, “Students’ success prediction using Weka tool”, INFOTEH-JAHORINA, Vol. 15, March 2016.

[10] P. Kavipriya, “A Review on Predicting Students’ Academic Performance Earlier, Using Data Mining Techniques”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 6, Issue 12, December 2016, ISSN: 2277 128X.

[11] N. Ankita and R. Anjali, “Analysis of Student Performance Using Data Mining Technique”, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 5, Issue 1, January 2017.

[12] P. Shruthi and B. Chaitra, “Student Performance Prediction in Education Sector Using Data Mining”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 6, Issue 3, March 2016.

[13] S.K. Yadav, B. Bharadwaj and S. Pal, “Data Mining Applications: A Comparative Study for Predicting Student’s Performance”, International Journal of Innovative Technology & Creative Engineering (ISSN: 2045-711), Vol. 1, No. 12, December 2012.

[14] A. Mohamed Shahiri, W. Husain and N. Abdul Rashid, “A Review on Predicting Student’s Performance using Data Mining Techniques”, Procedia Computer Science 72, pp. 414-422, Elsevier, 2015.

[15] K. Kohli and S. Birla, “Data Mining on Student Database to Improve Future Performance”, International Journal of Computer Applications, Vol. 146, No. 15, pp. 0975 – 8887, July 2016.

[16] Rashmi Agrawal, “Integrated Effect of k Nearest Neighbors and Distance Measures in k-NN Algorithms”, International Journal of Advances in Intelligent Systems and Soft Computing, Vol. 654, pp. 759-765, Springer, 2017.

[17] Rashmi Agrawal and Neha Gupta, “Educational Data Mining Review: Teaching Enhancement”, Privacy and Security Policies in Big Data, pp. 149-165, IGI Global, 2017.