Results and Discussion - Evaluating Learning Resource Representation

3.4 Evaluating Learning Resource Representation

3.4.4 Results and Discussion

This section presents the results of evaluating different document representation methods. The performance of the following methods is compared:

• CONCEPTBASED document representation method (CONCEPTBASED), which represents documents using the domain concepts (Section 3.3.1).

• CONCEPTBASED augmented document representation method (CB-AUG), which uses the term distribution in the concept vocabulary to influence the weight of terms in the document vocabulary (Section 3.3.2).

• BOW method, a standard Information Retrieval method in which documents are represented using only the terms from the document space, with TF-IDF weighting. BOW is used as the benchmark method.

• RANDOM method, which is included to give an idea of the relationship between the threshold and the precision values.

For all the methods, the documents are first pre-processed by removing English stopwords and applying Porter stemming. Then, after representation, similarity-based retrieval is employed using cosine similarity. The methods are evaluated using leave-one-out retrieval. The performance of CONCEPTBASED and CB-AUG is compared against that of BOW.
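The BOW benchmark pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the evaluation: the stopword list is a tiny placeholder, Porter stemming is omitted to keep the sketch dependency-free, the TF-IDF variant (raw term frequency times log(N/df)) is one common choice, and the four toy documents and all function names are hypothetical.

```python
import math
from collections import Counter

# Tiny placeholder stopword list (assumption); stemming is omitted in this sketch.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords and non-alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]

def tfidf_vectors(docs):
    """Represent each document as a sparse {term: tf * log(N / df)} dictionary."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def leave_one_out_rankings(vectors):
    """Use each document in turn as the query and rank all the others."""
    rankings = []
    for q, query in enumerate(vectors):
        others = [i for i in range(len(vectors)) if i != q]
        others.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
        rankings.append(others)
    return rankings

# Hypothetical toy collection.
docs = [
    "neural networks learn weights",
    "training neural networks with gradient descent",
    "decision trees split on features",
    "random forests combine decision trees",
]
vectors = tfidf_vectors(docs)
rankings = leave_one_out_rankings(vectors)
print(rankings[0])  # ranked document ids for query document 0
```

In this toy collection, the two neural-network documents retrieve each other first, as do the two tree-based documents, because they are the only pairs that share terms.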

Figure 3.8 shows the precision of the methods given an overlap threshold of 0.14. The number of recommendations (n) is shown on the x-axis, while the average precision@n is shown on the y-axis. The number of recommendations ranges from 1 to 10, because our interest is in the top 10 recommendations retrieved. Typically one would focus on the earlier retrievals because these should contain documents that are more likely to be relevant.
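The measure plotted on the y-axis can be sketched as below. The sketch assumes that a retrieved document counts as relevant when its overlap score with the query document is at least the threshold, and that "average precision@n" means precision@n averaged over all leave-one-out queries. The overlap matrix and rankings are invented toy values, and the function names are hypothetical.

```python
def precision_at_n(ranked_ids, query_id, overlap, threshold, n):
    """Fraction of the top-n retrieved documents whose overlap with the query
    document is at least the threshold."""
    top = ranked_ids[:n]
    hits = sum(1 for d in top if overlap[query_id][d] >= threshold)
    return hits / n

def average_precision_at_n(rankings, overlap, threshold, n):
    """Mean of precision@n over all queries (one ranking per query document)."""
    scores = [precision_at_n(r, q, overlap, threshold, n)
              for q, r in enumerate(rankings)]
    return sum(scores) / len(scores)

# Toy symmetric overlap scores between three documents (invented values).
overlap = [
    [1.00, 0.30, 0.10],
    [0.30, 1.00, 0.20],
    [0.10, 0.20, 1.00],
]
# rankings[q] lists the other documents in retrieval order for query q.
rankings = [[1, 2], [0, 2], [1, 0]]

for threshold in (0.14, 0.25):
    print(threshold, average_precision_at_n(rankings, overlap, threshold, n=2))
```

With these toy values, the stricter threshold of 0.25 halves the average precision@2, which mirrors the way a higher threshold makes the task harder for every method.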

The results from the RANDOM method are consistent with the relationship between the threshold and the proportion of data shown in Figure 3.7. When an overlap threshold of 0.14 is used, the RANDOM method has an average precision of about 0.1, that is, 10%. Recall that 10% of document pairs had overlap scores ≥ 0.14, so the results from RANDOM are consistent with the data used.
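The reasoning behind this consistency check can be illustrated with a small simulation: if recommendations are drawn uniformly at random, the expected precision@n equals the proportion of document pairs that meet the threshold. The sketch below uses a synthetic overlap matrix in which every document has exactly one relevant partner; the matrix, the seed, the number of trials and the function names are all assumptions made for illustration.

```python
import random

def relevant_proportion(overlap, threshold):
    """Proportion of ordered document pairs (i != j) with overlap >= threshold."""
    n = len(overlap)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return sum(overlap[i][j] >= threshold for i, j in pairs) / len(pairs)

def random_precision_at_n(overlap, threshold, n, trials=2000, seed=0):
    """Average precision@n when the n recommendations are drawn at random."""
    rng = random.Random(seed)
    ids = range(len(overlap))
    total, count = 0.0, 0
    for _ in range(trials):
        for q in ids:
            retrieved = rng.sample([d for d in ids if d != q], n)
            total += sum(overlap[q][d] >= threshold for d in retrieved) / n
            count += 1
    return total / count

# Synthetic data: 10 documents, each with exactly one relevant partner.
size = 10
overlap = [[0.0] * size for _ in range(size)]
for i in range(0, size, 2):
    overlap[i][i + 1] = overlap[i + 1][i] = 0.5

base_rate = relevant_proportion(overlap, 0.14)  # 10 of 90 ordered pairs
simulated = random_precision_at_n(overlap, 0.14, n=3)
print(base_rate, simulated)
```

The simulated precision should settle close to the base rate of relevant pairs, which is the same relationship that links the RANDOM curve to the proportions in Figure 3.7.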

Overall, the CONCEPTBASED augmented method (CB-AUG) performs better than the BOW (×), CONCEPTBASED and RANDOM methods. The BOW method performs well because the document vocabulary used in BOW is large, whereas the vocabulary used in the CONCEPTBASED method may be too limited. The more complex representation used in CB-AUG overcomes the limitation faced by the CONCEPTBASED method. It is observed that the graphs for CB-AUG, BOW, and CONCEPTBASED fall as the number of recommendations, n, increases. This behaviour is as expected because the earlier retrievals are more likely to be relevant. The overlap of CB-AUG and BOW at higher values of n may arise because the documents retrieved by both methods are drawn from the same neighbourhoods. Further, the CB-AUG method may be better at ranking the relevant documents it retrieves. These results show that augmenting the representation of documents with a larger concept vocabulary, as done in CB-AUG, is a better way of employing the background knowledge for representing learning materials.

Figure 3.8: Precision of the methods at an overlap threshold of 0.14

Figure 3.9 illustrates the precision of the methods at an overlap threshold of 0.25. The x-axis shows the number of recommendations n, while the y-axis shows the average precision at the different values of n. The relative performance at a threshold of 0.25 is similar to the performance at 0.14; however, this is a more challenging threshold for all the methods. Again, the performance of RANDOM at a threshold of 0.25 is consistent with the relationship between the threshold and the proportion of data. At this tougher threshold of 0.25, the average precision for RANDOM is about 0.05. Recall from Figure 3.7 that only 5% of document pairs had overlap scores ≥ 0.25. Hence the results of RANDOM are consistent with the threshold used for this dataset. There is an unexpected behaviour observed from CB-AUG and BOW at this more challenging threshold. Neither method performs well on the first retrieval, but both improve at the second retrieval. For values of n from 2 to 10, the graphs for CB-AUG and BOW fall as the value of n increases, which is as expected. The performance of CONCEPTBASED is as before, with a gradual decrease as the number of recommendations increases. Again, the limited vocabulary used in CONCEPTBASED limits its performance in this retrieval task.

Figure 3.9: Precision of the methods at an overlap threshold of 0.25.

Generally, the results show that the CONCEPTBASED augmented document representation method is able to identify relevant learning resources by highlighting the concepts they contain, and this is important in e-Learning. The graphs show that employing a knowledge driven approach to support the representation of learning resources is useful for e-Learning recommendation.

3.5 Summary

Finding relevant learning materials to recommend to learners within e-Learning recommendation tasks can be challenging. This is because the learning materials are often unstructured text, and so are not easily indexed for retrieval. Hence, there is a need for a suitable method of representing learning materials with the aim of improving recommendation. Furthermore, the vocabulary used in learning materials by domain experts is usually different from the vocabulary used by learners when trying to find relevant materials. This mismatch in vocabulary presents a semantic gap.

A step is taken to bridge the semantic gap by developing a method that automatically creates custom background knowledge in the form of a set of rich concepts related to the selected learning domain. The domain-specific background knowledge is created by exploiting a structured collection of teaching materials as a guide for identifying important learning concepts. The identified concepts are enriched with descriptive text from an encyclopedia source. Discovered text from the encyclopedia forms a pseudo-document for each concept. These pseudo-documents are used to extend the coverage and richness of the representation. So, each concept is made up of a concept label and an associated pseudo-document. The concept space consists of the vocabulary from the concepts, which is employed for document representation.
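The structure described above, where each concept pairs a label with a pseudo-document and the pooled vocabulary is used to represent documents, can be sketched as follows. The concept labels and pseudo-document text are invented placeholders rather than actual encyclopedia content, the stopword list is a tiny stand-in, and raw term counts are used in place of the weighting scheme, which is not described in this section.

```python
from collections import Counter

# Tiny placeholder stopword list (assumption).
STOPWORDS = {"a", "is", "of", "by", "on", "to", "in", "the", "are", "how"}

def tokens(text):
    return [t for t in text.lower().split() if t not in STOPWORDS]

# Each concept: label -> pseudo-document (invented text for illustration).
concepts = {
    "neural network": "a neural network is a model of connected neurons "
                      "trained by adjusting weights",
    "decision tree": "a decision tree splits data on feature values "
                     "to reach a prediction",
}

def concept_vocabulary(concepts):
    """Pool the terms from every concept label and its pseudo-document."""
    vocab = set()
    for label, pseudo_doc in concepts.items():
        vocab.update(tokens(label))
        vocab.update(tokens(pseudo_doc))
    return vocab

def represent(document, vocab):
    """Keep only the document terms that belong to the concept vocabulary."""
    return Counter(t for t in tokens(document) if t in vocab)

vocab = concept_vocabulary(concepts)
rep = represent("the tutorial shows how weights in a network are updated", vocab)
print(rep)
```

Terms such as "tutorial" and "updated" are dropped because they fall outside the concept vocabulary, which illustrates why a vocabulary drawn from only a few concepts can be restrictive.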

The developed background knowledge captures both key topics highlighted by the e-Book TOCs that are useful for teaching, and additional vocabulary related to these topics. So, the concept space provides a vocabulary and focus that is based on teaching materials with provenance. The CONCEPTBASED document representation method takes advantage of similar distributions in the concept and document spaces to define a concept-term-driven representation. CONCEPTBASED focuses on the concept space by using only the concept vocabulary; however, this vocabulary is drawn from a limited number of concepts, so it is too restricted for concept-based distinctiveness.

The CONCEPTBASED augmented document representation method exploits differences between the distributions of document terms in the concept and document spaces in order to boost the influence of terms that are distinctive in a few concepts. The evaluation results confirm that augmenting the representation of learning resources with a knowledge-rich representation, as done in CB-AUG, improves e-Learning recommendation. The larger vocabulary drawn from both concepts and documents is given focus by the vocabulary from the concept space.
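The idea of boosting terms that are distinctive in a few concepts can be illustrated with a deliberately simple stand-in. The weighting below is an assumption made for illustration and is not the CB-AUG formula from Section 3.3.2: each document-term weight is multiplied by one plus an inverse concept frequency, so terms that occur in few concepts gain influence while all document terms are retained. The concept term sets, document weights and function names are hypothetical.

```python
import math

def concept_idf(term, concept_docs):
    """Inverse concept frequency: larger when the term occurs in fewer concepts.
    Returns 0.0 for terms that occur in no concept or in every concept."""
    n = len(concept_docs)
    cf = sum(1 for terms in concept_docs if term in terms)
    return math.log(n / cf) if cf else 0.0

def augment(doc_weights, concept_docs):
    """Illustrative boost (assumption): weight * (1 + inverse concept frequency).
    Every document term is kept; only concept-distinctive terms are boosted."""
    return {t: w * (1.0 + concept_idf(t, concept_docs))
            for t, w in doc_weights.items()}

# Hypothetical term sets taken from three concept pseudo-documents.
concept_docs = [
    {"neural", "network", "weights", "model"},
    {"decision", "tree", "model", "split"},
    {"clustering", "centroid", "model", "distance"},
]
# Hypothetical document-space weights for one document (for example, TF-IDF).
doc_weights = {"weights": 1.0, "model": 1.0, "tutorial": 1.0}

boosted = augment(doc_weights, concept_docs)
print(boosted)
```

Here "weights" is boosted because it appears in a single concept, "model" is left unchanged because it appears in every concept, and "tutorial" keeps its document-space weight because it lies outside the concept space.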

Chapter 4

Enhanced Representation

A suitable representation for an e-Learning domain should have good coverage of relevant topics from the domain. This would allow for an approach that caters for recommendations that meet learners’ queries, which can be varied. One issue highlighted in the results from Chapter 3 was that the concept generation method produced only a few concepts, resulting in a limited concept vocabulary. In this chapter, the challenge associated with the representation of learning materials is further explored. The concept generation method used in the previous chapter is enhanced to improve our background knowledge and increase the coverage of the concept vocabulary. An enhanced method for representing documents is developed. In addition, the performance of the developed method on a larger dataset is examined.

4.1 Enriching the Domain Concepts

Domain concepts are potentially very useful for representing learning resources, because they contain important topics that describe a domain. The advantage in using domain concepts for representing learning resources is that the concept vocabulary allows the retrieval of the represented resources to focus on the domain concepts contained in learning resources, and this is useful for e-Learning recommendation. In this section, the method used previously for generating domain concepts is refined to address the issue of a limited concept vocabulary. The generation of a larger concept vocabulary which provides a better coverage of the learning domain is explored with the aim of creating a richer knowledge source that can be employed for tasks such as the representation of learning resources.
