132
Copyright © 2016. Vandana Publications. All Rights Reserved.
Volume-6, Issue-6, November-December 2016
International Journal of Engineering and Management Research
Page Number: 132-136
Prediction of Performance of Students Using Bayesian Classification
Shirdi Wazeed Baba1, Reddi Sanjeev Kumar2
1,2Indian Institute of Information Technology Design and Manufacturing, Jabalpur, INDIA
ABSTRACT
Now a day’s students have a large set of data having precious information hidden. Data mining technique can help to find this hidden information. In this paper, data mining techniques name Bayes classification method is used on these data to help an institution. Institutions can find those students who are consistently perform well. This study will help to institution reduce the drop put ratio to a significant level and improve the performance level of the institution.
Keywords-- Data mining, classification, Predictive
model, Bayesian classification
I.
INTRODUCTION
As the cost of processing power and storage is coming down, data storage became easier and cheaper so the amount of data stored in educational databases is increasing rapidly. In order to get benefits from such large database to find hidden relationships between variables using different data mining techniques developed and used [5].
The term ―data mining‖ is primarily used by statisticians, database researchers, and the business communities. The term KDD (Knowledge Discovery in Databases) refers to the overall process of discovering useful knowledge from data, where data mining is a particular step in this process. The steps in the KDD process, such as data preparation, data selection, data cleaning, and proper interpretation of the results of the data mining process, ensure that useful knowledge is derived from the data. Data mining is an extension of traditional data analysis and statistical approaches as it incorporates analytical techniques drawn from various disciplines like AI, machine learning, OLAP, data visualization, etc.
This study aims to analyze previous year data and predict New Year student as a performer and underperformer. These applications can help both instructor and student to enhance the quality education. The classification carried out using a Bayesian classification method [8].
II.
DATA MINING
Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms, and machine learning methods such as neural networks or decision trees. Consequently, data mining consists of more than collecting and managing data, it also includes analysis and prediction.
The objective of data mining is to identify valid, novel, potentially useful, and understandable correlations and patterns in existing data. Finding useful patterns in data is known by different names (e.g.,
knowledge extraction, information discovery,
information harvesting, data archeology, and data pattern processing).
III.
BACKGROUND AND RELATED
WORK
Data mining as a discipline is largely transparent to the world. Most of the time, we never even notice that it’s happening. But whenever we sign up for a grocery store shopping card, place a purchase using a credit card, or surf the Web, we are creating data. These data are stored in large sets on powerful computers owned by the companies we deal with every day. Lying within those data sets are patterns— indicators of our interests, our habits, and our behaviors[2]. Data mining allows people to locate and interpret those patterns, helping them make better informed decisions and better serve their customers[7]. That being said, there are also concerns about the practice of data mining. Privacy watchdog groups in particular are vocal about organizations that amass vast quantities of data, some of which can be very personal in nature. Luan uses data mining to group students to determine which student can easily pile up their courses and which take courses for longer period of time [9].
IV.
CLASSIFICATION
133
Copyright © 2016. Vandana Publications. All Rights Reserved.
Classification, Regression, Time series analysis, Prediction etc. Many of the data mining applications are aimed to predict the future state of the data [3].
Prediction is the process of analyzing the current and past states of the attribute and prediction of its future state. Classification is a technique of mapping the target data to the predefined groups or classes, this is a supervise learning because the classes are predefined before the examination of the target data. The regression involves the learning of function that map data item to real valued prediction variable. In the time series analysis the value of an attribute is examined as it varies over time. In time series analysis the distance measures are used to determine the similarity between different time series, the structure of the line is examined to determine its behavior and the historical time series plot is used to predict future values of the variable [4].
Predictive modeling can be thought of as learning a mapping from an input set of vector measurements x to a scalar output y [4]. Classification maps data into predefined groups are classes. It is often referred to as supervised learning because the classes are
determined before examining the data. They often describe these classes by looking at the characteristic of data already known to belong to the classes [10].
V.
BAYESIAN CLASSIFICATION
The Naive Bayes algorithm is a classification algorithm based on Bayes rule and a set of conditional independence assumptions. Given the goal of learning P(Y|X) where X = hX1 ...,Xni, the Naive Bayes algorithm makes the assumption that each Xi is conditionally independent of each of the other Xks given Y, and also independent of each subset of the other Xk’s given Y. The value of this assumption is that it dramatically simplifies the representation of P(X|Y), and the problem of estimating it from the training data. Consider, for example, the case where X = hX1,X2i. In this case
Table 1 DIVISION
I II III FAIL
Caste category GEN 100 120 34 46
OBC 61 75 19 12
SC/ST 29 50 28 23
LANGUAGE MEDIUM
ENGLISH 100 70 18 6
HINDI 90 178 63 65
CLASS CSE 12 76 54 22
ECE 43 72 4 10
MECH 80 50 15 18
MTECH_CSE 54 20 6 8
MTECH_ECE 1 30 12 13
MTECH_MECH 23 43 12 10
P(X|Y) = P(X1,X2|Y)
= P(X1|X2,Y)P(X2|Y) = P(X1|Y)P(X2|Y)
Where the second line follows from a general property of probabilities, and the third line follows directly from our above definition of conditional independence. More generally, when X contains n attributes which satisfy the conditional independence assumption, we have P(X1 ...Xn|Y) = n ∏ i=1 P(Xi |Y)
VI.
EDUCATIONAL DATA MINING
IN HIGHER EDUCATION
Providing higher education to all sector’s of a
nation’s population means confronting social
134
Copyright © 2016. Vandana Publications. All Rights Reserved.
Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in [15]. Mining in educational environment is called Educational Data Mining, concern with developing new
methods to discover knowledge from educational database[6].Lack of deep and enough knowledge in higher educational system may prevent system management to achieve quality objectives, data mining methodology can help this knowledge gaps in higher education system [11].
VII. APPLICATION
Data gathered from different Engineering colleges in India was analysed using classification method to predict the student’s performance. The following steps are performed in sequence:
1) Data set: The data set used in this study was obtained
from different colleges on the sampling method of computer science department of session 2009-10. Initially size of the data is 600.
2) Database Software: Microsoft SQL Server 2005 was
used to store the data. The reason behind choosing MSSQL Server is: it was compatible and efficient to use with the database management system i.e. relational database.
3) Application software: Matlab environment is used as
programming environment. The Matlab software suitable for the development of application with MSSQL server 2005.
4) Data mining Process: The data exploration and
presentation process consisted of following steps:
i) Data Understanding: It starts with an initial data collection, to get familiar with the data, to identify data quality problems, to discover first insights into the data or to detect interesting subsets to form hypotheses for hidden information
ii) Data Preparation: covers all activities to construct the final dataset from raw data
iii) Data transformation: In this phase, various modeling techniques are selected and applied and their parameters are calibrated to optimal values.
Data is transformed into the format of table 1. The head division shows the obtained division of the student in final exam. Caste category shows the demographic distribution of the student defined by the GOI (Government of India). Language medium is the medium in which student passed his/her graduation program. Class is the stream of the student. This is categorized as CSE, ECE, MECH, MTECH_CSE, MTECH_ECE, MTECH_MECH.
Implementation of mining model: Given a training set
the naïve Bayes algorithm first estimates the prior probability P (Cj) for each class by counting how often each class occurs in the training data. For each attribute value xi can be counted to determine P (xi). Similarly the
probability P (xi | Cj) can be estimated by counting how often each value occurs in the class in the training data. When classifying a target tuple, the conditional and prior probabilities generated from the training set are used to make the prediction. Then estimate P (ti | Cj) by
P(ti |Cj)= ∏ (P(xik |Cj) k=1 )
To calculate P (ti) we can estimate the
likelihood that ti is in each class. The probability that ti is
in a class is the product of the conditional probabilities for each attribute value. The class with the highest probability is the one chosen for the tuple [10].
Results and discussion: After Bayesian classification the
table 1 of data produces the table 2 and figure 1 shows the pictorial representation of table2. Table 2 shows the highest probability of division of a particular class . For example if a student is of medium, and with computer science then it can be predicted that s/he will score second division mark .
Table 2
Medium Category Division Class Probability
English Gen I CSE 0.54
II ECE 0.52
III MECH 0.71
II mtech_cse 0.80
I mtech_ece 0.68
II mtech_mech 0.59
135
Copyright © 2016. Vandana Publications. All Rights Reserved.
II ECE 0.71
II MECH 0.81
I mtech_cse 0.75
III mtech_ece 0.47
II mtech_mech 0.50
SC/ST III CSE 0.60
I ECE 0.71
III MECH 0.59
II mtech_cse 0.46
I mtech_ece 0.58
II mtech_mech 0.47
Hindi Gen II CSE 0.50
III ECE 0.48
I MECH 0.56
III mtech_cse 0.65
II mtech_ece 0.42
I mtech_mech 0.55
OBC II CSE 0.63
III ECE 0.38
II MECH 0.60
I mtech_cse 0.43
I mtech_ece 0.47
II mtech_mech 0.45
SC/ST I CSE 0.57
I ECE 0.61
III MECH 0.53
II mtech_cse 0.51
III mtech_ece 0.72
136
Copyright © 2016. Vandana Publications. All Rights Reserved.
VIII. CONCLUSION
In this paper, Bayesian classification method is used on student database to predict the students division on the [3]. basis of previous year database. This study will help the students and the teachers to improve the division of the student. This study will also work to identify those students which needed special attention to reduce failing [4]. ration and taking appropriate action at right time.
REFERENCES
[1] kp nar ― eri a anlar nda ilgi e fi ve eri aden ili i‖ .Ü letme ak ltesi Dergisi, Say :1 (1-22), Nisan 2000
[2] Alaa el-Halees (2009) Mining Students Data to Analyze e-learning Behavior: A Case Study.
[3] Connolly T., C. Begg and A. Strachan (1999) Database systems: A Practical Approach to Design, Implementation, and Management (3rd Ed.). Harlow: Addison-Wesley.687
[4] David Hand, Heikki, Mannil Padraic smyth,
―Prin iples of Data ining‖ PHI
[5] Elena susena. ―using data mining techniques in higher edu ation‖ National defence university ―carol I‖
Bucharest 68-72.
[6] rdo an ― eri aden ili i ve eri aden ili inde ullan lan - eans lgoritmas n n ren i eri a an nda ygulanmas ‖ ksek isans e i stan ul Üniversitesi, 2004.
[7] Gabrilson, S., Fabro, D. D. M., V alduriez, P ., 2008. Towards the efficient development of model transformations using model weaving and matching transformations, Software and Systems Modeling 2003. Data Mining with CRCT Scores. Office of information
technology, Geogia Department of Education.
[8] Han, J. W., Kamber, M., 2006. Data Mining: Concepts and Techniques, 2nd Edition, The Morgan Kaufmann Series in Data Management Systems, Gray, J.
Series Editor, Morgan Kaufmann Publishers.
[9] Luan, J., 2002. Data mining and knowledge management in higher education – potential applications. In Proceedings of AIR Forum, Toronto, Canada [10] Margret H. Dunham, ―Data Mining: Introductory and advance topi ‖
[11] Shaeela Ayesha, Tasleem Mustafa, Ahsaan Raza star, M Inayat Khan-―Data mining Model for higher edu ation‖ ISSN 1450-216X Vol.43 No.1 (2010) pp24-29
[12] Philip G Altbach, Liz Reisberg and laura E. Rumbley, a report ptrpared for the UNESCO 2009 world conference on higher education.
[13] Thearling, K., ― n Introduction to Data ining‖ ,http://thearling.com/text/dmwhite/dm
[14] Westphal, C., Blaxton, T., 1998. Data Mining Solutions, John Wiley.
[15]www.educationaldatamining.org