Predicting Students’ Academic Drop Out and Failures Using Data Mining Techniques


R. Venkatesan¹, V. Manikandan², D. Yuvaraj³, A. Mohamed Uvaze Ahamed⁴

1Department of Computer Science and Engineering, M.I.E.T Engineering College, Trichy, India,

2Department of Computer Networking, Lebanese French University, Erbil, Kurdistan Region- Iraq

3Department of Computer Science, Cihan University - Duhok, Kurdistan Region- Iraq

4Department of Computer Science, Cihan University - Erbil, Kurdistan Region- Iraq

Abstract

The problem of student dropout has steadily increased in many schools in India. The main purpose of this research is to develop a model for predicting student dropout and to determine the factors behind these cases. Students' academic performance is critical for educational institutions, because strategic programs can be planned to improve or maintain students' performance during their period of study. In this paper, we consider the factors affecting the student dropout rate and examine various data mining and machine learning techniques that predict student performance, together with the parameters that affect the accuracy of the prediction model.

Key words: Classification, Data Mining, Feature Selection, Personal Profile, Student Dropout.

1. Introduction

Many educational institutions and schools today leave no stone unturned to improve their students' academic performance, since the marks a student obtains in examinations shape his or her future. Institutions need to increase the number of students who pass the annual examinations. The reasons are to improve the quality of the teaching methodology in the institution, to maintain the reputation of the organization, and to teach students in a better way. To increase the number of students who pass, the students who are likely to fail in a given academic year need to be identified early. In this paper, we discuss data mining and machine learning techniques that provide statistical analysis of the student dataset and lead to predicting the academic performance of the student.

1.1 Data Mining

A data mining task is performed on preprocessed data in order to extract interesting patterns, which can then be interpreted to obtain knowledge. Data mining tasks can be classified into two types: descriptive and predictive. Descriptive mining derives patterns that summarize the relations among the data. Clustering, association rule discovery, and sequential pattern discovery are examples of descriptive mining tasks. Predictive mining, on the other hand, develops a pattern from the data that is used to predict the class-label from the feature values of an instance. Classification and regression are examples of predictive mining tasks (Mitra & Acharya 2005) (Han et al. 2011).


Figure 1. Architecture of the Enhanced Fuzzy Resolution Mechanism using ANFIS

1.2 Feature Selection

Feature selection is the process of removing irrelevant and redundant features from a dataset in order to improve the performance of supervised learning algorithms in terms of classification accuracy and the time needed to build the model. There are two major approaches to feature selection: individual evaluation and subset evaluation. Individual evaluation ranks the features: the weight of each feature is assigned according to its degree of relevance. In subset evaluation, candidate feature subsets are constructed using a search strategy. The general procedure for feature selection has four key steps:

• Subset Generation

• Evaluation of Subset

• Stopping Criteria

• Result Validation

The wrapper method uses the supervised learner to evaluate the features in the feature selection process, and it also requires a search strategy for feature subset generation. Consequently, it is computationally expensive, and subset generation incurs high space complexity. The wrapper approach produces higher classification accuracy only for the specific supervised learning algorithm used in the feature evaluation process, which leads to poor generality.
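The subset generation, evaluation, and stopping steps listed above can be illustrated with a small greedy forward search. This is only a sketch under stated assumptions: the function names, the `min_gain` threshold, and the consistency-style score (the fraction of rows that agree with the majority label of their value combination) are illustrative choices, not the evaluation measure used in this paper. A wrapper method would replace the score with the accuracy of the chosen classifier, and result validation on held-out data is left out.

```python
from collections import Counter, defaultdict


def consistency_score(rows, labels, subset):
    """Fraction of rows that match the majority label of their value combination."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    agree = sum(max(counts.values()) for counts in groups.values())
    return agree / len(rows)


def forward_select(rows, labels, min_gain=1e-9):
    """Greedy forward search over feature indices (hypothetical helper)."""
    n_features = len(rows[0])
    selected = []
    best = consistency_score(rows, labels, selected)
    while len(selected) < n_features:
        # Subset generation: extend the current subset by one unused feature.
        candidates = [selected + [f] for f in range(n_features) if f not in selected]
        # Subset evaluation: score every candidate subset.
        scored = [(consistency_score(rows, labels, c), c) for c in candidates]
        score, subset = max(scored, key=lambda pair: pair[0])
        # Stopping criterion: stop when the best candidate no longer improves.
        if score - best <= min_gain:
            break
        selected, best = subset, score
    return selected, best
```

On a toy dataset where the first feature alone separates the classes, the search selects that feature and then stops because adding the second one brings no gain.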

1.3 Data Preprocessing

Real-world data tend to be inconsistent, incomplete, and noisy. Such low-quality data can lead to wrong results in the data mining or knowledge discovery process. Therefore, data acquired from the real world must be preprocessed before being given to the data mining process in order to extract the expected patterns and obtain knowledge. Data preprocessing includes data cleaning, data integration, data transformation, and data reduction.

1.4 Machine Learning

Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases. A learner can take advantage of examples (data) to capture characteristics of interest of their unknown underlying probability distribution. Data can be seen as examples that illustrate relations between observed variables. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (the training data). Hence, the learner must generalize from the given examples in order to produce a useful output in new cases. Generalization is the ability of a machine learning algorithm to perform accurately on new, unseen examples after training on a finite data set. The core objective of a learner is to generalize from its experience. The training examples come from some generally unknown probability distribution, and the learner has to extract from them something more general, something about that distribution that allows it to produce useful answers in new cases.

The terms machine learning, knowledge discovery in databases (KDD), and data mining are commonly confused, as they often employ the same methods and overlap strongly. They can be roughly separated as follows:

• Machine learning focuses on prediction, based on known properties learned from the training data.

• Data mining (which is the analysis step of Knowledge Discovery in Databases) focuses on the discovery of (previously) unknown properties of the data.

However, these two areas overlap in many ways: data mining uses many machine learning methods, but often with a slightly different goal in mind. On the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in KDD the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by supervised methods, while in a typical KDD task, supervised methods cannot be used because training data are unavailable.

Some machine learning systems attempt to eliminate the need for human intuition in data analysis, while others adopt a collaborative approach between human and machine. Human intuition cannot, however, be entirely eliminated, since the system's designer must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data. Machine learning algorithms can be organized into a taxonomy based on the desired outcome of the algorithm.

Supervised learning generates a function that maps inputs to desired outputs (also called labels, because they are often provided by human experts labeling the training examples). In a classification problem, the learner approximates a function mapping a vector into classes by looking at input-output examples of the function. Unsupervised learning models a set of inputs, as in clustering. Semi-supervised learning combines both labeled and unlabeled examples to generate an appropriate function or classifier. Reinforcement learning learns how to act given an observation of the world. Every action has some impact on the environment, and the environment provides feedback in the form of rewards that guide the learning algorithm. Transduction tries to predict new outputs based on training inputs, training outputs, and test inputs.

2. Classification Algorithms

A classification algorithm is a supervised learning algorithm that learns from the dataset D and builds a concept representation known as a classification model or classifier. The classification model is used to predict the class-label of an unlabeled instance. The model is built and represented according to the type of algorithm used. The decision tree-based J48 classification algorithm represents the model as a decision tree, and the probabilistic Naive Bayes classification algorithm represents the model as probabilistic summaries. In this work, four classification algorithms, namely the probabilistic Naive Bayes, the decision tree-based J48, and the instance-based IB1 and KNN, are used to evaluate the performance of the feature selection methods.

2.1 Naive Bayes Classification (NB)

The Naive Bayes classification (NB) is a probabilistic classification method that uses density-based estimation on the dataset. The algorithm adopts the Bayes theorem with the assumption that the feature values are conditionally independent of one another given the target class. With this assumption, the Naive Bayes classifier produces the posterior distribution used to estimate the class-label with a decision boundary. The classifier predicts the most probable class-label for the unlabeled instance, that is, it applies the maximum a posteriori decision rule (Manning et al. 2008) (Hastie et al. 2009). The algorithm works with the following steps:

Step 1: Approximate the densities of the feature values for each class-label.

Step 2: Estimate the posterior probabilities based on the Bayes rule for all c_i, where i = 1, 2, ..., L:

$$P(C = c_i \mid x_1, x_2, \ldots, x_N) = \frac{\pi(C = c_i) \prod_{j=1}^{N} P(x_j \mid C = c_i)}{\sum_{k=1}^{L} \pi(C = c_k) \prod_{j=1}^{N} P(x_j \mid C = c_k)}$$

where C is the target-class attribute with respect to the class-label of an instance, x_1, x_2, ..., x_N are the feature values of an instance, and π(C = c_i) is the prior probability of the class-label c_i. The classifier classifies an unlabeled instance by estimating the posterior probability for each class-label and then predicting the class-label with the maximum posterior probability.
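The two steps above can be sketched for nominal features as follows. This is a minimal illustration, not the WEKA implementation used in the experiments: the function names are hypothetical, and the add-one (Laplace) smoothing of the conditional probabilities is an assumption introduced here so that unseen feature values do not produce a zero posterior.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Step 1: count feature values per class-label (nominal features only)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    domains = defaultdict(set)           # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            domains[j].add(value)
    return class_counts, value_counts, domains


def posterior(model, row):
    """Step 2: posterior probability of every class-label for one instance."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior probability of the class-label
        for j, value in enumerate(row):
            # Assumption: add-one smoothing of the conditional probability.
            score *= (value_counts[(j, label)][value] + 1) / (n_label + len(domains[j]))
        scores[label] = score
    evidence = sum(scores.values())  # normalizing denominator
    return {label: score / evidence for label, score in scores.items()}


def predict(model, row):
    """Return the class-label with the maximum posterior probability."""
    probabilities = posterior(model, row)
    return max(probabilities, key=probabilities.get)
```

Calling `predict(train_naive_bayes(rows, labels), instance)` returns the label whose posterior is largest; `posterior` exposes the full distribution when the probabilities themselves are of interest.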

2.2 J48 Classification

J48, the Java implementation of the C4.5 classification algorithm, is a decision tree-based classification method that builds a decision tree structure with logical rules by learning from a given dataset D in order to predict the class-label of an unlabeled instance. It is a popular classification algorithm because of its robustness, its speed, and the ease with which the structure of the decision tree can be understood and interpreted.

The nodes of the tree represent the features, and the branches are associated with the feature values. The leaves of the tree are associated with the class-labels used to predict the unlabeled instance. In order to build a decision tree from the training data, information theory is adopted to split the features and to construct nodes. Further, the information gain (IG) criterion is used to identify the root node, split up the dataset, and form the branch nodes (Quinlan 2014).
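As an illustration of the splitting criterion, the sketch below computes the entropy of a set of class-labels, the information gain of a nominal feature, and the gain ratio that normalizes the gain by the entropy of the feature's own values. The function names are illustrative, and the code is a simplified sketch: it ignores numeric attributes, missing values, and pruning, all of which a full C4.5 implementation handles.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of nominal values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """Reduction in class-label entropy after splitting on a nominal feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(values):
        subset = [label for x, label in zip(values, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the split information of the feature."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0
```

The feature with the highest score would be chosen as the root node, and the same computation is repeated on each resulting partition of the dataset.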

2.3 K-Nearest Neighbour Classification (KNN)

The K-Nearest Neighbour (KNN) classification is an instance-based classification method. A distance-based similarity measure is used to predict the class-label of unlabeled instances, on the assumption that the nearest instances belong to the same class-label. This classifier is also called a lazy learner, since it does no work until it is asked to identify the class-label of an unlabeled instance. It is a simple learning method since it adopts simple, non-parametric decision rules. The term non-parametric implies that no prior knowledge about the statistical distribution of the data is required for classification. The following steps are carried out to predict the class-label of an unlabeled instance using the KNN classification algorithm.

Let Iq be the unlabeled instance whose class-label is to be predicted. Then, the Q instances I1, I2, ..., IQ that are nearest to Iq are identified from the dataset. The class-label of Iq is predicted as the class-label to which most of these Q instances belong. The extended version of the KNN algorithm is known as the Instance-Based Learner (IB1). The instance-based learning method is computationally expensive because it needs to store all the training instances, and it does not perform well with noisy and irrelevant features (Aha et al. 1991) (Garcia et al. 2012).
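The prediction step described above can be sketched in a few lines. The distance used here, the number of mismatching nominal feature values, is an assumption made for illustration because the features in this study are mostly nominal; the paper does not state which distance measure was used. The function names and the default of three neighbours are likewise hypothetical.

```python
from collections import Counter


def mismatch_distance(a, b):
    """Number of positions at which two nominal feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_rows, train_labels, query, q=3):
    """Predict the majority class-label among the q nearest training instances."""
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: mismatch_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:q])
    return votes.most_common(1)[0][0]
```

Because no model is built in advance, every prediction scans all stored training instances, which is the cost referred to above.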

2.4 Metrics for Evaluating the Performance of Instances

The performance evaluation metrics assess the classification model or classifier based on its ability to accurately predict the class-labels of unlabeled instances. The predictions for unlabeled instances in a two-class problem fall into four categories, namely true positive, true negative, false positive, and false negative. The terms positive and negative represent the two class-labels of the dataset. A positive instance is an instance that belongs to the positive class-label, and a negative instance is an instance that belongs to the negative class-label. These definitions can also be extended to multiclass problems.
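The four outcome counts and the measures commonly derived from them can be computed as in the following sketch. The function name and the choice of "TRUE" (dropout) as the positive class-label are assumptions made for illustration.

```python
def binary_metrics(actual, predicted, positive="TRUE"):
    """Count TP, TN, FP, FN for a two-class problem and derive common measures."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

Recall here corresponds to the TP rate reported per class in the WEKA output.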

2.5 Classifier Validation Techniques

The training dataset is used to build the classifier and it is validated by the test dataset. The validation techniques are categorized based on how the test dataset is supplied to validate the classifier.

2.6 Use Training Set

In this method, the whole training dataset is supplied as the test dataset to evaluate the performance of the classifier.

2.7 Supplied Test Set

In this method, a separately prepared test dataset is supplied for evaluating the performance of the classifier.

3. Proposed Work

This work aims to predict students' academic failure using data mining techniques. The method proposed in this paper for predicting the academic failure of students belongs to the process of Knowledge Discovery and Data Mining. The task has four main stages:

1. Data Collection

2. Data Management

3. Data Mining

4. Implementation

3.1 Data Set

Six standard datasets drawn from the UCI collection were used in the experiments. These datasets were chosen because of the prevalence of nominal features and their predominance in the literature.


Table 1. Comparing ID3 and C4.5

Characteristic     | ID3 Algorithm                | C4.5 Algorithm
Splitting Criteria | Information gain             | Gain ratio
Attribute type     | Handles only nominal values  | Handles both nominal and numeric values
Missing values     | Does not handle missing values | Handles missing values
Pruning Strategy   | No pruning is done           | Error-based pruning is done
Outlier Detection  | Susceptible to outliers      | Susceptible to outliers


Student   | Gender | Grade | Category | Father qualification | Mother qualification | Family income | Father occupation | Mother occupation | Dropout
Gualterio | Male   | A | MBC | BE         | PG           | High    | Engineer    | Software    | FALSE
Swen      | Male   | B | SC  | Elementary | Agriculture  | Poor    | Agriculture | Coolie      | FALSE
Marchall  | Male   | O | OBC | PG         | MBBS         | High    | Business    | Doctor      | FALSE
Maria     | Female | A | ST  | UG         | PH.D.        | Medium  | Employee    | Professor   | FALSE
Terence   | Male   | C | ST  | Elementary | Elementary   | Low     | Farmer      | Coolie      | TRUE
Bambi     | Female | D | SE  | UG         | Elementary   | Medium  | Business    | Agriculture | TRUE
Aland     | Male   | C | SC  | Secondary  | Elementary   | Low     | Farmer      | Agriculture | TRUE
Carson    | Male   | O | OBC | MBBS       | UG           | Medium  | Doctor      | Consultant  | FALSE
Ilyssa    | Female | C | OBC | Secondary  | Secondary    | Low     | Agriculture | Agriculture | TRUE
Laureen   | Female | D | MBC | PG         | UG           | High    | Police      | Teacher     | FALSE
Felic     | Male   | A | SE  | PH.D.      | UG           | High    | Business    | Retired     | FALSE
Theodor   | Male   | A | SC  | UG         | PG           | Medium  | Driver      | Teacher     | FALSE
Corabella | Female | E | ST  | Secondary  | No-education | Low     | Agriculture | Coolie      | TRUE
Arthur    | Male   | O | ST  | MBBS       | UG           | High    | Doctor      | Nurse       | FALSE
Haslett   | Male   | D | MBC | UG         | Elementary   | Low     | Electrician | Agriculture | TRUE
Neville   | Male   | C | FC  | PG         | UG           | High    | Software    | Engineer    | FALSE
Zebadiah  | Male   | D | SC  | PH.D.      | PG           | Medium  | Professor   | Teacher     | FALSE
Analise   | Female | D | SC  | Elementary | No-education | Low     | Coolie      | Coolie      | TRUE
Kerry     | Male   | O | SC  | PG         | Secondary    | Average | Business    | Business    | FALSE
Parry     | Male   | A | BC  | Elementary | Secondary    | Low     | Driver      | Maid        | TRUE
Bindhu    | Female | O | MBC | Secondary  | UG           | Average | Electrician | Tailor      | FALSE


3.2 Data Preprocessing

The basic available data, such as registration data, economic data, and some basic information, are collected from the National Informatics Centre (NIC); these are largely what is required to design and develop the model. The data are arranged in the required format and structure. Further, the data are converted to the ARFF (Attribute-Relation File Format) format for processing in WEKA. An ARFF file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software. Knowledge can be extracted with respect to two or more associative relations within the data set. Further, our research will be extended to new attributes such as male_percent_literacy, female_percent_literacy, and sex_ratio, which are compared with each other to study the effect of literacy on gender inequality.
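To make the conversion step concrete, the sketch below writes a few nominal records as ARFF text, using the `@relation`, `@attribute`, and `@data` sections of the format. The relation name, attribute names, and helper function are hypothetical, and the sketch assumes purely nominal values that contain no spaces, commas, or quotes (such values would need quoting in a real ARFF file).

```python
def to_arff(relation, names, rows):
    """Render nominal records as ARFF text (simplified: no quoting, no numerics)."""
    lines = [f"@relation {relation}", ""]
    for j, name in enumerate(names):
        # The domain of a nominal attribute is the set of values observed.
        domain = sorted({row[j] for row in rows})
        lines.append(f"@attribute {name} {{{','.join(domain)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical example with three attributes taken from the student table.
example = to_arff(
    "student_dropout",
    ["gender", "family_income", "dropout"],
    [("Male", "High", "FALSE"), ("Male", "Low", "TRUE")],
)
```

The resulting text can be saved with an `.arff` extension and opened in the WEKA Explorer.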

The classification algorithms, namely ID3, C4.5, and NB, are applied to the original dataset with its original features as well as to the other datasets, and the overall performance is recorded.

Algorithm | Correctly classified instances | Incorrectly classified instances
ID3       | 90.9091%                       | 3.64%
C4.5      | 89.0909%                       | 10.9091%

3.2.1 Naive Bayes Algorithm - Using Data Set 1

Scheme: weka.classifiers.bayes.NaiveBayes (use training set)
Relation: result data
Instances: 49
Attributes: 27

Time taken to build model: 0.1 seconds

Evaluation on training set

Time taken to test model on training data: 0 seconds

SUMMARY

Correctly Classified Instances      49    100 %
Incorrectly Classified Instances     0      0 %
Kappa statistic                      1
Mean absolute error                  0.0405
Root mean squared error              0.088
Relative absolute error             89.105 %
Root relative squared error        184.914 %
Total Number of Instances           49

DETAILED ACCURACY BY CLASS

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000     FALSE
1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000     TRUE
1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000     (Weighted Avg.)

CONFUSION MATRIX

 a  b   <-- classified as
32  0 | a = FALSE
 0 17 | b = TRUE

3.2.2 Naive Bayes Algorithm - Using Data Set 1, Stratified Cross-Validation

Test mode: 10-fold cross-validation (Naive Bayes)

Time taken to build model: 0 seconds

SUMMARY

Correctly Classified Instances      31     63.2653 %
Incorrectly Classified Instances    18     36.7347 %
Kappa statistic                      0.1893
Mean absolute error                  0.3954
Root mean squared error              0.5326
Relative absolute error             86.5949 %
Root relative squared error        111.478 %
Total Number of Instances           49

CONFUSION MATRIX

 a  b   <-- classified as
23  9 | a = FALSE
 9  8 | b = TRUE
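The kappa statistic reported above can be checked against the confusion matrix. The sketch below assumes that the cross-validation matrix reads 23 and 9 in the FALSE row and 9 and 8 in the TRUE row, a reading that agrees with the 31 correctly and 18 incorrectly classified instances and with the 32 FALSE and 17 TRUE instances in the data set. The function name is illustrative.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows: actual, columns: predicted)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column totals for each class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


cross_validation = [[23, 9], [9, 8]]   # assumed reading of the matrix above
training_set = [[32, 0], [0, 17]]      # matrix from the training-set evaluation
```

With these matrices, `cohen_kappa(cross_validation)` rounds to the reported 0.1893 and `cohen_kappa(training_set)` equals 1, which shows how far the training-set evaluation overstates the agreement obtained under cross-validation.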

3.2.3 Percentage Split 70 and 30: Naive Bayes

Test mode: percentage split 70 and 30 (Naive Bayes)

Time taken to build model: 0 seconds

Evaluation on test split

Time taken to test model on test split: 0 seconds

SUMMARY

Correctly Classified Instances      10     66.6667 %
Incorrectly Classified Instances     5     33.3333 %
Kappa statistic                      0.3119
Mean absolute error                  0.3759
Root mean squared error              0.4608
Relative absolute error             83.874 %
Root relative squared error        100.4276 %
Total Number of Instances           15
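The test-split size reported above follows from holding out 30% of the 49 instances. The sketch below shows one way such a split could be produced; the function name, the seed, and the shuffling procedure are assumptions for illustration and do not reproduce WEKA's own randomization, so the instances that end up in each part will differ.

```python
import random


def percentage_split(instances, train_percent=70, seed=1):
    """Shuffle the instances and split them into training and test parts."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    train_size = round(len(shuffled) * train_percent / 100)
    return shuffled[:train_size], shuffled[train_size:]


# 49 instance identifiers, as in Data Set 1.
train_part, test_part = percentage_split(range(49))
```

Here `train_part` holds 34 instances and `test_part` holds 15, matching the total number of instances in the evaluation on the test split; 10 correct predictions out of 15 give the reported 66.6667%.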

4. Discussions

In general, regarding the data mining approaches used and the classification results obtained, the main conclusions are as follows. We have shown that classification algorithms can be used to successfully predict a student's academic performance. We have shown the utility of feature selection techniques when there is a large number of attributes. In our case, 10 attributes were used initially and the algorithms were applied. Next, we used the 26 best attributes, obtaining fewer rules and conditions without losing classification performance.

We still need to use specific attributes to conclude in what way dropout happens and what is needed to make students complete their course. We have shown two different ways to address the problem of imbalanced data classification: rebalancing the data and considering different classification costs. In fact, rebalancing the data was able to improve the classification results obtained.


5. Conclusion

Student dropout prediction is a vital and challenging task. We have experimented with various classification algorithms and learning approaches on the data sets. The results show that rather simple classifiers give a useful result, with accuracies between 75 and 85%, that is hard to beat with other, more sophisticated models. A key next step for this data set is to identify the major improvements that can be assessed. The hidden patterns, associations, and anomalies that data mining techniques discover in educational data can improve the decision-making process in the higher education system. We have shown that some approaches, such as selecting the best attributes, cost-sensitive classification, and data balancing, can also be very useful for improving accuracy. Finally, as the next step in our research, we aim to carry out more experiments using more data, also from different educational levels (primary, secondary, and higher), to test whether the same performance results are obtained with different data mining approaches.
