An Approach Towards E-Learning Using SVM Classification Technique and Ranking Technique in Microblog Supported Classroom: A Survey


International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 7, July 2013)


Nitin Pawar (1), Prof. S.K. Sonkar (2)

(1) ME (Computer) Student, (2) Associate Professor, Department of Computer Engineering, AVCOE, Sangamner, University of Pune

Abstract - This paper presents a comparative study of classification and ranking techniques for a Microblog Supported Classroom system, together with a brief introduction to such a system. Microblogging is a technology used in social networking applications that can also be used in the classroom for communication between students and the teacher. Research indicates that it has numerous benefits. Its major advantage is that students can ask many more questions than they could during the limited time of a lecture. It also lets students get their doubts cleared both inside and outside the classroom. However, questions unrelated to the topic at hand may also be raised.

To overcome this problem, questions should be classified as relevant or irrelevant using a classification technique, and each classified question should be tagged with the concept it concerns, which is called Concept Tagging. After classification, questions should be ranked using ranking techniques.

Keywords— Concept Tagging, Decision Tree, Machine Learning, Microblogging, Support Vector Machine (SVM), Ranking.

I. INTRODUCTION

Microblogging is a Web 2.0 technology and a type of blogging that lets users post short text messages (usually fewer than 200 characters) to their community in real time via communication channels such as the web, e-mail, and instant messengers [1]. Depending on whom a user follows (i.e., communicates with) and is followed by, a microblogging tool such as Twitter [2] can be used effectively for professional networking. Recently, microblogging tools have been used in classroom environments as a communication tool between students and the instructor [3] as well as among students themselves [4–6]. Using microblogging in classrooms has several benefits and drawbacks (see [3]). A significant issue in a large microblogging-supported classroom is that the number of questions and comments a teacher receives from students can be far greater than what he or she can respond to in a limited time.

Therefore, there is a need to separate the relevant questions, which the teacher should address, from the irrelevant ones; the teacher should also receive an alert message when a relevant question has been categorized as irrelevant. Previous work has suggested a tool that categorizes the questions posted by students as relevant or irrelevant to assist an instructor. That method was developed for one particular course only, so it cannot be employed if an instructor handles more than one course. This work suggests a scheme for handling multiple courses offered by a teacher effectively. In addition, the main concept of each classified question should be tagged so that an answer can be found easily, which is called "Concept Tagging". After classification, questions should be ranked using ranking techniques.

II. LITERATURE REVIEW

This section reviews related work on identifying relevant questions among all questions (relevant and irrelevant) in a microblogging-supported classroom.

Microblogging has recently been used as a communication tool between students and the instructor [3] as well as among students [4–6]. Although all of these works relate to the use of microblogging for educational purposes, their main focus is on how microblogging can be used academically [3] and on how to analyze microblogging in the context of learning [4–6]. Yet an important problem in microblogging-supported classrooms, as in other classrooms, is that the number of questions a teacher receives from students can be far more than can be answered in a limited time.


Posted question text and personalization (i.e., favoring questions coming from a student who has been asking good questions) are used to differentiate relevant questions from irrelevant ones. While question text and personalization are quite important for identifying relevant and irrelevant questions, two further sources of evidence can be exploited:

i. Using the correlation between questions and available lecture materials and

ii. Using the correlation among questions asked in a lecture.

In a recent work, Cetintas et al. show that using the association between questions asked in a lecture and the available lecture materials, along with question text and personalization, helps to better identify relevant and irrelevant questions [7]. Yet, they do not consider the association among the questions asked in a lecture. Furthermore, they do not consider the effect of stop word removal on classifier performance:

1. When stop words are not removed for the bag-of-words representation of the input space of the classifiers; and

2. When the association among questions themselves and the association between questions and available lecture materials are calculated.

However, the task of finding questions similar to a question of specific interest is distinct from the task of recognizing the best questions to respond to in a lecture, and, as Song et al. note [15], none of those earlier works addresses the usefulness of questions. Recently, studies by Sun et al. and Song et al. have used users' interest in questions to find the most useful questions [15] and to recommend questions when users browse questions by category [16]. Specifically, Sun et al. use users' votes on prior questions. However, that study does not consider personalization or the association among questions. Moreover, since the setting is not a classroom, the effect of using available lecture materials on recognizing the best questions to respond to cannot be observed, as no lecture materials are available.

III. METHOD REVIEW -CLASSIFICATION

Classification is a data mining technique used to predict group membership for data instances. It is the process of finding a model (or function) that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known).

Data classification is a two-step process. In the first step, a classifier is built describing a predetermined set of data classes. This is the learning (or training) step, where a classification algorithm builds the classifier by analysing, or learning from, a training set made up of database tuples and their associated class labels. A tuple X is represented by X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n attributes (A1, A2, ..., An) respectively. Each tuple X is assumed to belong to a predefined class as determined by another database attribute called the class label attribute. The class label attribute is discrete-valued and unordered. It is categorical in that each value serves as a category or class. Because the class label of each training tuple is provided, this step is also known as supervised learning.

In the second step, the model is used for classification. A test set is used, made up of test tuples and their associated class labels. These tuples are randomly chosen from the general data set and are independent of the training tuples. The class label of each test tuple is compared with the learned classifier's class prediction for that tuple. The accuracy of a classifier on a given test set is the percentage of test set tuples that the classifier classifies correctly.

Classification techniques can process a wider variety of data than regression and are growing in popularity. Decision tree classifiers, classification by backpropagation, Bayesian classifiers, support vector machines (SVM), and classification based on association are all examples of "eager learners": they use the training tuples to construct a generalization model and are thus ready to classify new tuples. This contrasts with "lazy learners", or instance-based methods of classification, such as nearest neighbour classifiers and case-based reasoning classifiers, which store all of the training tuples in pattern space and wait until presented with a test tuple before performing generalization. Hence, lazy learners need efficient indexing techniques.
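The accuracy measure used in the second step can be illustrated with a minimal Python sketch. The function name and the example labels are invented for illustration and do not come from any system discussed in this survey.

```python
def accuracy(true_labels, predicted_labels):
    """Percentage of test tuples whose predicted class equals the known class label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("test set must not be empty")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return 100.0 * correct / len(true_labels)


# Hypothetical test set: known class labels and a classifier's predictions.
y_true = ["relevant", "irrelevant", "relevant", "relevant"]
y_pred = ["relevant", "irrelevant", "irrelevant", "relevant"]
test_accuracy = accuracy(y_true, y_pred)
```

Here three of the four test tuples are classified correctly, so the accuracy is 75%.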

A. Decision Trees


If two trees employ the same kind of tests and have the same prediction accuracy, the one with fewer leaves is usually preferred. Decision trees are usually univariate, since they use splits based on a single feature at each internal node. One of the most useful characteristics of decision trees is their comprehensibility: people can easily understand why a decision tree classifies an instance as belonging to a specific class. Since a decision tree constitutes a hierarchy of tests, an unknown feature value during classification is usually dealt with by passing the example down all branches of the node where the unknown feature value was detected, with each branch outputting a class distribution. The output is a combination of the different class distributions, which sums to 1. The assumption made in decision trees is that instances belonging to different classes have different values in at least one of their features. Decision trees tend to perform better when dealing with discrete/categorical features.

B. Rule Based Classification

The difference between heuristics for rule learning and heuristics for decision trees is that the latter evaluates the average quality of a number of disjointed sets one for each value of the feature that is tested, while rule learners only examine the quality of the set of instances that is covered by the candidate rule. Rules can be generated either from decision trees or directly from training data using sequential covering algorithms.

1. Using IF-THEN Rule: A rule based classifier uses a set of IF-THEN rules for classification. An IF-THEN rule is an expression of the form: IF condition THEN conclusion. Consider the following example rule R1:

R1: IF age=young AND student=yes THEN buys computer = yes

The IF part (left-hand side) of a rule is called the "Rule Antecedent" or precondition. The THEN part (right-hand side) of a rule is known as the "Rule Consequent".

2. Rule Extraction from Decision Tree: Decision trees can become large and difficult to interpret. A rule based classifier can be constructed from a decision tree by extracting IF-THEN rules from it. In comparison with decision trees, IF-THEN rules may be easier for humans to understand, particularly when the decision tree is large. To extract rules from a decision tree, one rule is created for each path from the root to a leaf node. Each splitting condition along a given path is logically ANDed to form the rule antecedent (IF part). The leaf node holds a class prediction, which forms the rule consequent (THEN part). A disjunction (logical OR) is implied between the extracted rules.

As the rules are extracted directly from the tree, they are mutually exclusive and exhaustive. Mutually exclusive means that rule conflicts cannot occur, because no two rules will be triggered for the same tuple. Exhaustive means that there is one rule for each possible attribute-value combination.
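The path-to-rule procedure described above can be sketched in Python. This is only an illustration under assumptions: the nested-dictionary tree encoding, the function names, and the toy tree are hypothetical and are not taken from any tool covered in this survey.

```python
def extract_rules(node, conditions=()):
    """Return one (antecedent, consequent) pair per root-to-leaf path."""
    if not isinstance(node, dict):
        # A leaf holds the class prediction, which becomes the rule consequent.
        return [(list(conditions), node)]
    rules = []
    attribute = node["attribute"]
    for value, child in node["branches"].items():
        # Each splitting condition on the path is ANDed into the antecedent.
        rules.extend(extract_rules(child, conditions + ((attribute, value),)))
    return rules


def format_rule(antecedent, consequent, target="buys_computer"):
    tests = " AND ".join(f"{attr}={val}" for attr, val in antecedent)
    return f"IF {tests} THEN {target}={consequent}"


# Hypothetical toy tree: internal nodes test one attribute, leaves are class labels.
tree = {
    "attribute": "age",
    "branches": {
        "young": {"attribute": "student", "branches": {"yes": "yes", "no": "no"}},
        "senior": "yes",
    },
}
rules = [format_rule(a, c) for a, c in extract_rules(tree)]
```

The tree has three leaves, so three rules are produced, the first being the rule R1 from the earlier example.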

3. Rule Induction using a Sequential Covering Algorithm:

IF-THEN rules can be extracted directly from the training data using a sequential covering algorithm. As the name suggests, the rules are learned sequentially (one at a time), where each rule for a given class ideally covers many of the tuples of that class. Some popular sequential covering algorithms include AQ, CN2 and the more recent RIPPER. In rule induction, a rule quality measure can be used as a criterion in the rule specification and/or generalization process. In classification, a rule quality value can be associated with each rule to resolve conflicts when multiple rules are satisfied by the example to be classified. The most useful characteristic of rule based classifiers is their comprehensibility.

C. Bayesian classification

Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a specific class. Bayesian classification is founded on Bayes' theorem.
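For reference, Bayes' theorem can be written as follows. The symbols are notation assumed here rather than taken from the paper: X denotes a data tuple and H the hypothesis that X belongs to a specific class.

```latex
P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}
```

Under this notation, P(H | X) is the class membership probability that a Bayesian classifier estimates for the tuple X.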

1. Bayesian Network: A Bayesian Network (BN) is a graphical model for probability relationships among a set of variables (features). The Bayesian network structure S is a directed acyclic graph (DAG), and the nodes in S are in one-to-one correspondence with the features X. The arcs represent causal influences among the features, while the lack of possible arcs in S encodes conditional independencies. Bayesian belief networks specify joint conditional probability distributions. They allow class conditional independencies to be defined between subsets of variables. They provide a graphical model of causal relationships, on which learning can be performed. Trained Bayesian belief networks can be used for classification. Bayesian belief networks are also known as belief networks, Bayesian networks, and probabilistic networks.


D. Support Vector Machine (SVM)

Support Vector Machine (SVM) is a method for the classification of both linear and non-linear data. It is a supervised machine learning method and a margin based classifier. SVMs are typically used for classification, regression and ranking. Two special properties of SVM are that (1) it achieves high generalization by maximizing the margin and (2) it supports efficient learning of non-linear functions through the kernel trick. SVMs were initially developed for classification and have been extended to regression and preference (or rank) learning. The initial form of SVM is a binary classifier, where the output of the learned function is either positive or negative. Multi-class classification can be implemented by combining multiple binary classifiers using the pairwise coupling method.

Support Vector Machines are based on the concept of decision planes that define decision boundaries. A decision plane is a plane that separates sets of objects having different class memberships. Support vectors are the data points that lie closest to the decision surface; they are the points that are most difficult to classify. The idea of SVM is to find the boundary for which the distance between the boundary and the nearest data point is maximized. This gives the maximum-sized margin. In a two-dimensional problem, this boundary is a line. The SVM algorithm also works in multiple dimensions, which means that it finds the "optimal separating hyperplane" given the training data. If the data can be separated by a hyperplane, the problem is linearly separable. The points on the margin of the boundary are called the support vectors.

The first paper on SVM was presented in 1992 by Vladimir Vapnik and his colleagues Bernhard Boser and Isabelle Guyon. However, the groundwork for SVM has been around since the 1960s, including early work by Vapnik and Alexei Chervonenkis on statistical learning theory.


1. SVM based classification for linear data: Suppose there are L training points, where each input xi has D attributes (i.e., is of dimensionality D) and belongs to one of two classes, yi = +1 or yi = -1, so that the training data is of the form {xi, yi}, i = 1...L. Assume that the data is linearly separable, that is, it is possible to draw a straight line w.x + b = 0 such that all training points with yi = -1 (the negative examples) fall on one side of the line and all training points with yi = +1 (the positive examples) fall on the other side. In 2D, separation is possible with a line, but for higher dimensions a hyperplane is needed.

Figure 1: Number of possible hyperplanes for separation of linear data [15]


Figure 1 shows an example of linear data. Objects belong either to class GREEN or to class RED. An infinite number of separating hyperplanes can classify the objects as GREEN or RED, but not all of them are optimal. SVM finds the optimal hyperplane among all the possible hyperplanes.
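Whether a given line w.x + b = 0 separates two classes can be checked with a short Python sketch. The points, labels, and candidate values of w and b below are invented for illustration; they are not the data of Figure 1.

```python
def side_of_hyperplane(w, b, x):
    """Return +1 or -1 according to the sign of w.x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def separates(w, b, points, labels):
    """True if every point falls on the side of the hyperplane matching its label."""
    return all(side_of_hyperplane(w, b, x) == y for x, y in zip(points, labels))


# Hypothetical 2D training data with labels +1 / -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (1.0, 0.0)]
labels = [1, 1, -1, -1]

is_separating = separates((1.0, 1.0), -3.0, points, labels)      # x1 + x2 = 3
is_not_separating = separates((1.0, 1.0), -0.5, points, labels)  # x1 + x2 = 0.5
```

The first candidate line separates the two classes, while the second misclassifies the point (1, 0). Many other separating lines exist for this data, which is why an additional criterion such as the margin is needed to pick one.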

Figure 2: Support Vectors with large margin [15]


Moving the support vectors moves the decision boundary, whereas moving the other vectors has no effect. Thus, the support vectors are the critical elements of the training set and are the most difficult to classify. Figure 3 shows the concept of a hyperplane through linearly separable classes. Data points belonging to class 1 are represented as "empty circles" and those belonging to class 2 as "filled circles". With reference to Figure 3, the mathematical formulation of SVM can be described.

Figure 3: Hyperplane through linearly separable classes [5]

The hyperplane is described by the equation

$$ w \cdot x + b = 0 \tag{1} $$

Here,

w is normal to the hyperplane. w is a weight vector, described as $w = (w_1, w_2, \ldots, w_n)$.

b is a scalar.

$\frac{b}{\|w\|}$ is the perpendicular distance from the hyperplane to the origin.

$\|w\|$ is the Euclidean norm of w, i.e. $\sqrt{w \cdot w}$.

Values of w and b are selected by SVM from training data. Fig 3 shows that SVM selects hyperplane H such that training data can be described as follows:

$$ x_i \cdot w + b \geq +1 \quad \text{for } y_i = +1 \tag{2} $$

$$ x_i \cdot w + b \leq -1 \quad \text{for } y_i = -1 \tag{3} $$

Combining eq. 2 and eq. 3 we get,

$$ y_i (x_i \cdot w + b) - 1 \geq 0 \quad \forall i \tag{4} $$

Equation 4 is the constraint that every training point lies on or beyond the supporting hyperplane of its own class. The points that lie closest to the separating hyperplane are the Support Vectors (shown in circles in the figure). The two planes H1 and H2 on which these points lie can be described by:

$$ H_1: \; x_i \cdot w + b = +1 \tag{5} $$

$$ H_2: \; x_i \cdot w + b = -1 \tag{6} $$

Now, for any data point x that lies on either H1 or H2, the distance from the hyperplane is given as follows:

d1 = distance from H1 to the hyperplane. d2 = distance from H2 to the hyperplane.

The hyperplane is equidistant from H1 and H2, which means that d1 = d2. The margin of a separating hyperplane is d1 + d2. In order to orient the hyperplane so that it is as far from the Support Vectors as possible, this margin needs to be maximized.

All points for which equality holds in eq. 2 lie on H1, with normal w and perpendicular distance from the origin $\frac{|1 - b|}{\|w\|}$.

All points for which equality holds in eq. 3 lie on H2, with normal w and perpendicular distance from the origin $\frac{|-1 - b|}{\|w\|}$.

So, $d_1 = \frac{1}{\|w\|}$ and $d_2 = \frac{1}{\|w\|}$. So, SVM margin $= \frac{2}{\|w\|}$.

In order to maximize the margin, minimize $\|w\|$ subject to the condition that there are no data points between H1 and H2. As H1 and H2 are parallel, they have the same normal, and no training points fall between them. Thus, the pair of hyperplanes that gives the maximum margin is found by minimizing $\|w\|^2$ subject to the constraint in eq. 4. So, the primal problem of SVM can be formulated as follows:

$$ \min_{w,\, b} \; \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (x_i \cdot w + b) - 1 \geq 0 \quad \forall i \tag{7} $$
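The relationship between the constraint, the norm of w, and the margin can be checked numerically with a small Python sketch. The four training points and the candidate (w, b) pairs are invented for illustration, and the sketch only evaluates candidates; it does not solve the optimization problem.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def satisfies_constraints(w, b, points, labels):
    """Check y_i (x_i . w + b) - 1 >= 0 for every training point."""
    return all(y * (dot(x, w) + b) - 1 >= 0 for x, y in zip(points, labels))


def margin(w):
    """Margin of the hyperplane pair H1, H2: 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))


# Hypothetical data: the positive class lies to the right of the negative class.
points = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 2.0)]
labels = [1, 1, -1, -1]

feasible_small_w = satisfies_constraints((1.0, 0.0), -1.0, points, labels)
feasible_large_w = satisfies_constraints((2.0, 0.0), -2.0, points, labels)
infeasible = satisfies_constraints((0.5, 0.0), -0.5, points, labels)
margin_small_w = margin((1.0, 0.0))
margin_large_w = margin((2.0, 0.0))
```

Both w = (1, 0) and w = (2, 0) describe the same separating hyperplane and satisfy the constraint, but the candidate with the smaller norm gives the larger margin (2.0 versus 1.0). Shrinking w further to (0.5, 0) violates the constraint, which is why the norm is minimized subject to the constraint rather than on its own.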


This is a crucial property which will allow us to generalize the procedure to the non-linear case. Consider positive Lagrange multipliers $\alpha_i$, $i = 1, \ldots, L$, one for each of the inequality constraints in eq. 7. The rule is that for constraints of the form $c_i \geq 0$, the constraint equations are multiplied by positive Lagrange multipliers and subtracted from the objective function to form the Lagrangian. For equality constraints, the Lagrange multipliers are unconstrained.

This gives the Lagrangian as follows:

$$ L_P = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{L} \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right] \tag{8} $$

So,

$$ L_P = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{L} \alpha_i y_i (x_i \cdot w + b) + \sum_{i=1}^{L} \alpha_i \tag{9} $$

Now, find the w and b which minimize, and the $\alpha$ which maximizes, eq. 9. This can be achieved by differentiating $L_P$ with respect to w and b and setting the derivatives to zero:

$$ \frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{L} \alpha_i y_i x_i, \qquad \frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{L} \alpha_i y_i = 0 \tag{10} $$

By substituting the values from equation 10 into equation 9, the optimization problem is reformulated so that it depends only on $\alpha$:

$$ L_D = \sum_{i=1}^{L} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \quad \text{s.t.} \quad \alpha_i \geq 0 \;\; \forall i, \quad \sum_{i=1}^{L} \alpha_i y_i = 0 \tag{11} $$

This new formulation LD is referred to as the dual form of the primal Lp. In this formulation, the constraints are replaced by constraints on the Lagrange multipliers, and the training data occur only as dot products: the input vectors xi and xj appear only in the form of the dot product xi.xj.

IV. METHODOLOGY

The proposed system follows the methodology below, based on the classification method selected in Section III and the ranking method selected from [17], with some modifications:

A. Use SVM for Multiple Concept Classification

B. Concept Tagging

C. Ranking Algorithm

D. Alert Message

A. Use SVM for Multiple Concept Classification

The suggested system is an SVM-based question classification system. It classifies the questions posted by students in a microblog supported classroom using SVM. The scheme comprises three modules:

1. Pre-processing of Posted Questions

2. Classification using SVM Light

3. Post-processing of Output of SVMLight

1. Pre-processing of Posted Questions:

1.1. Pre-process the posted questions and represent each question as a feature vector.

1.2. Tokenization: separate the text into individual words.

1.3. Stop word removal: remove common words that are usually not useful for text classification, such as "a", "the", "I", "he", "she", "is", "are", etc.

1.4. Stemming: normalize words derived from the same root. Examples: playing - play, running - run, batting - bat.

1.5. Feature extraction: after the above steps, each word represents a feature. From the list of these features, extract a limited number of features for the purpose of classification.

1.6. Unigram features: use each word as a feature.

a. Use TF (term frequency) as the feature value.

b. Use TF * IDF (inverse document frequency) as the feature value.

c. IDF = log (total-number-of-documents / number-of-documents-containing-t)

1.7. Bigram features: use two consecutive words as a feature.

1.8. In this way, each question is represented as a feature vector. Each distinct word corresponds to a feature, with the number of times it occurs in the question as its value.

1.9. Finally, scale the dimensions of the feature vector by their inverse document frequency (IDF).

1.10. The posted questions are now ready in the format accepted by the SVM software SVMLight.
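The pre-processing steps above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions: the stop word list, the suffix-stripping stemmer (a crude stand-in for a real stemmer such as Porter's), the function names, and the two example questions are all hypothetical. Only unigram TF * IDF features are shown. The output lines follow the sparse `<label> <index>:<value>` layout used by SVMLight input files, with feature indices starting at 1 and written in increasing order.

```python
import math
from collections import Counter

# Illustrative stop word list; a real system would use a fuller list.
STOP_WORDS = {"a", "the", "i", "he", "she", "is", "are", "what", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def crude_stem(word):
    """Very rough suffix stripping; NOT a full stemming algorithm."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def tfidf_vectors(token_lists):
    """Map each token list to {feature_index: TF * IDF}, indices starting at 1."""
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    index = {term: i + 1 for i, term in enumerate(vocabulary)}
    n_docs = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({index[t]: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def to_svmlight_line(label, vector):
    """Format one example as '<label> <index>:<value> ...', skipping zero values."""
    features = " ".join(f"{i}:{v:.4f}" for i, v in sorted(vector.items()) if v != 0)
    return f"{label:+d} {features}"


# Hypothetical posted questions: +1 = relevant, -1 = irrelevant.
questions = ["What is a support vector?", "Is the exam on Friday?"]
labels = [1, -1]
vectors = tfidf_vectors([preprocess(q) for q in questions])
lines = [to_svmlight_line(y, v) for y, v in zip(labels, vectors)]
```

With these two questions the vocabulary is exam, friday, on, support, vector, so the first question only has non-zero values for features 4 and 5. A word occurring in every question would get IDF = log(1) = 0 and is therefore left out of the sparse line.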

2. Classification using SVM Light:

This step consists of two main tasks: training the SVM classifier and testing its accuracy.

2.1. Training the SVM classifier (Learning phase): Some of the questions (75-80%) from the given set are used as training data to train the classifier. Training file


2.2. Classification (testing phase): The remaining articles from the given set (about 20-25%) are used as testing data to test the accuracy of the classifier built in the previous step. A testing file, e.g. test.dat, is prepared with the testing data in the format accepted by SVM Light. SVM then classifies this data and generates the support vectors, i.e. the questions on the boundary of the hyperplane, which may be relevant or otherwise.

3. Post-processing of Output of SVMLight: This is the final step of the implementation. The output generated by SVM Light in the previous step, namely the support vectors for the given classification, is processed further to obtain the posted questions that are relevant to the subject. Based on the precision and recall results, the performance of the classifier can be measured in terms of a loss function, and a confusion matrix can be generated.
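The evaluation mentioned above can be illustrated with a short Python sketch that builds a two-class confusion matrix and derives precision and recall from it. The function names and the example labels are invented for illustration; +1 is assumed to mark a relevant question and -1 an irrelevant one.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            counts["tp"] += 1
        elif p == positive:
            counts["fp"] += 1
        elif t == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(cm):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    precision = cm["tp"] / predicted_pos if predicted_pos else 0.0
    recall = cm["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical test labels and classifier output.
y_true = [1, 1, 1, -1, -1, -1, 1, -1]
y_pred = [1, 1, -1, -1, 1, -1, 1, -1]
cm = confusion_matrix(y_true, y_pred)
precision, recall = precision_recall(cm)
```

In this made-up example, three relevant questions are found, one is missed, and one irrelevant question is wrongly flagged, giving a precision and a recall of 0.75 each.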

B. Concept Tagging

In concept tagging, the important concepts in a question are identified by removing stop words, and the question is tagged with those concepts. The output is a set of words that is used to search the related material provided by the teacher in the microblog supported classroom. This method improves the speed of retrieving related material and helps retrieve accurate material.
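One possible reading of this step is sketched below in Python. The paper does not specify the matching procedure, so the keyword lookup, the stop word list, the function names, and the lecture material titles are all assumptions made for illustration.

```python
# Illustrative stop word list; a real system would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "what", "how", "does"}


def concept_tags(question):
    """Words left after stop word removal, in order and without repeats."""
    words = "".join(c.lower() if c.isalnum() else " " for c in question).split()
    tags = []
    for word in words:
        if word not in STOP_WORDS and word not in tags:
            tags.append(word)
    return tags


def find_materials(tags, materials):
    """Titles of lecture materials whose keyword text contains at least one tag."""
    return [
        title
        for title, text in materials.items()
        if any(tag in text.lower().split() for tag in tags)
    ]


# Hypothetical lecture materials, described by a few keywords each.
materials = {
    "Lecture 4: Support vector machines": "svm margin kernel functions",
    "Lecture 2: Decision trees": "entropy information gain pruning",
}
tags = concept_tags("What is the kernel trick in SVM?")
matches = find_materials(tags, materials)
```

Because the lookup only touches the few tagged words instead of the full question text, it stays cheap even when many questions are posted.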

C. Ranking Algorithm

The ranking algorithm for questions posted by students has the following steps:

1. Cosine similarity:

i) First, from all classified questions on the microblog, lexically similar questions are selected by examining the cosine similarity between questions.

ii) On this basis, an undirected graph of questions is constructed. Each node of the graph represents a question, and two nodes are connected by an edge if the similarity between the corresponding questions exceeds a certain threshold value.

iii) Using cosine similarity, similar questions are recognized and removed, which reduces the database load and increases speed.
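The three sub-steps above can be sketched in Python as follows. The sketch assumes raw term-count vectors, a threshold of 0.8, and a simple greedy policy that keeps the earliest question of each similar group; none of these choices is specified in the paper, and the example questions are invented.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def similarity_graph(questions, threshold):
    """Undirected edges (i, j), i < j, between questions more similar than threshold."""
    edges = set()
    for i in range(len(questions)):
        for j in range(i + 1, len(questions)):
            if cosine_similarity(questions[i], questions[j]) > threshold:
                edges.add((i, j))
    return edges


def drop_similar(questions, edges):
    """Keep a question only if it is not linked to an already kept, earlier question."""
    kept = []
    for j in range(len(questions)):
        if not any((i, j) in edges for i in kept):
            kept.append(j)
    return [questions[j] for j in kept]


# Hypothetical classified questions.
questions = [
    "what is a support vector",
    "what is a support vector machine",
    "when is the exam",
]
edges = similarity_graph(questions, threshold=0.8)
unique_questions = drop_similar(questions, edges)
```

Here only the first two questions are similar enough to be linked, so the second one is dropped and two questions remain for ranking.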

2. Counting hit:

In this step, the hits on each question are counted. The greater the number of hits on a particular question, the better its rank. By applying the above steps in sequence, the rank of a particular question can be decided.
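A minimal Python sketch of hit-based ranking is given below. The question identifiers and hit counts are invented, and breaking ties by identifier is an assumption added here to make the ordering deterministic.

```python
def rank_by_hits(hit_counts):
    """Question ids ordered by descending hit count; ties broken by id."""
    return sorted(hit_counts, key=lambda q: (-hit_counts[q], q))


# Hypothetical hit counts per question id.
hit_counts = {"q1": 3, "q2": 10, "q3": 3}
ranking = rank_by_hits(hit_counts)
```

The question with the most hits, q2, is ranked first, followed by q1 and q3.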

D. Alert Message

To help the teacher provide useful material to students, an alert message notifies the teacher when a new relevant question arises, so that the lecture material can be improved. The alert message is generated on the basis of the student's category, which is identified using a threshold on the number of correct questions asked by each student. If a question is posted by a student whose number of correct questions is greater than the threshold value, an e-mail or alert message is sent to the teacher concerned.
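The threshold rule can be expressed as a short Python sketch. The student identifiers, the counts, and the threshold of 5 are hypothetical; the paper does not state a threshold value.

```python
def should_alert(student_id, correct_question_counts, threshold):
    """True if the student has asked more than `threshold` correct questions."""
    return correct_question_counts.get(student_id, 0) > threshold


# Hypothetical record of correct (relevant) questions per student.
correct_question_counts = {"s01": 7, "s02": 2}
alert_for_s01 = should_alert("s01", correct_question_counts, threshold=5)
alert_for_s02 = should_alert("s02", correct_question_counts, threshold=5)
```

Only the question from student s01 would trigger an alert under these assumed values.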

V. CONCLUSION

Among the classification techniques reviewed, the Support Vector Machine (SVM) appears to be the most suitable for this task. A strategy has been presented for automatically sorting course-specific questions and identifying the relevant questions posted on an instructor's microblog by students attending two or more courses. It has been shown to be beneficial to use the correlation between questions and available lecture materials as well as the correlations among questions asked in a particular lecture. Furthermore, it is found to be significantly more effective to remove stop words and stem words when calculating the correlations among questions themselves.

VI. FUTURE WORK

We will send alert information by e-mail to the staff member responsible for a particular subject so that they can respond quickly. Classification across more than one subject is also possible in a microblog supported classroom. As the concept tagging method gives 50% accuracy, we are trying to improve it so that students get the correct answer to a related question. To improve this accuracy, the ranking algorithm already used for ranking questions in the microblog supported classroom could also be applied to concept tagging.

REFERENCES

[1] S. Cetintas, L. Si, H. Aagard, K. Bowen, and M. Cordova-Sanchez, "Microblogging in a classroom: Classifying students' relevant and irrelevant questions in a microblogging-supported classroom," TLT, vol. 4, no. 4, pp. 292–300, 2011.

[2] A. Java, X. Song, T. Finin, and B. Tseng, "Why we twitter: understanding microblogging usage and communities," in Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, WebKDD/SNA-KDD '07, (New York, NY, USA), pp. 56–65, ACM, 2007.


[4] K. Borau, C. Ullrich, J. Feng, and R. Shen, "Microblogging for language learning: Using twitter to train communicative and cultural competence," in Proceedings of the 8th International Conference on Advances in Web Based Learning, ICWL '009, (Berlin, Heidelberg), pp. 78–87, Springer-Verlag, 2009.

[5] C. Costa, G. Beham, W. Reinhardt, and M. Sillaots, "Microblogging in technology enhanced learning: A use-case inspection of ppe summer school 2008," in Proceedings of the 2nd SIRTEL'08 Workshop on Social Information Retrieval for Technology Enhanced Learning (R. Vuorikari, B. Kieslinger, R. Klamma, and E. Duval, eds.), no. 382, (Maastricht, The Netherlands), September 2008.

[6] C. Ullrich, K. Borau, H. Luo, X. Tan, L. Shen, and R. Shen, "Why web 2.0 is good for learning and for research: principles and prototypes," in Proceedings of the 17th international conference on World Wide Web, WWW '08, (New York, NY, USA), pp. 705–714, ACM, 2008.

[7] S. Cetintas, L. Si, S. Chakravarty, H. Aagard, and K. Bowen, "Learning to identify students' relevant and irrelevant questions in a micro-blogging supported classroom," Working Papers 1009, Purdue University, Department of Consumer Sciences, 2011.

[8] L.-K. Soh, N. Khandaker, and H. Jiang, "I-minds: A multiagent system for intelligent computer-supported collaborative learning and classroom management," Int. J. Artif. Intell. Ed., vol. 18, pp. 119–151, Apr. 2008.

[9] Y. Answers, "Last accessed:," Oct. 2012.

[10] WikiAnswers, "Last accessed:," Oct. 2012.

[11] B. Zhidao, "Last accessed:," Oct. 2012.

[12] H. Duan, Y. Cao, C. yew Lin, and Y. Yu, "Searching questions by identifying question topic and question focus," in Proceedings of 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL:HLT), 2008.

[13] P. Han, R. Shen, F. Yang, and Q. Yang, "The application of case based reasoning on q&a system," in Proceedings of the 15th Australian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence, AI '02, (London, UK), pp. 704–713, Springer-Verlag, 2002.

[14] V. Jijkoun and M. de Rijke, "Retrieving answers from frequently asked questions pages on the web," in Proceedings of the 14th ACM CIKM Conference, pp. 76–83, 2005.

[15] http://www.dtreg.com/svm.html

[16] Tristan Fletcher, March 1, 2009, "Support Vector Machine Explained"