5.3 Offline experiment: improving IR methods for teaching
5.3.2 Subject IR methods
During this experiment, we evaluate both structured and unstructured IR methods. Briefly, structured scoring methods analyse a web page as a structured object, examining its different parts individually. Unstructured methods, instead, treat a web page as a single text, without distinguishing the components that form it. From each of these two groups we have selected the scoring method that is, in our understanding, the most representative: BM25F (Pérez-Agüera et al., 2010) and TFIDF (Ramos, 2003), respectively. The TFIDF method does not divide the web page into different parts; it analyses only the body-text as a whole. When scoring documents for a non-binary query, a popular way to use TFIDF is based on a Vector Space Model (VSM) representation of both the query-text and the document (Ramos, 2003). We therefore build two vectors of TF-IDF scores over the terms in the query: one for the body-text of the web page and the other for the query-text. The dimension of the vectors is equal to the number of unique terms in the query. Given a term t of a query q, and a text the_text, we compute the TF-IDF scores as follows:
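$$\mathrm{TF\text{-}IDF}(t, \mathit{the\_text}) = \sqrt{\mathrm{frequency}(t \in \mathit{the\_text})} \cdot \mathrm{idf}(t)^2, \tag{5.2}$$

where idf(t) is defined as follows:

$$\mathrm{idf}(t) = 1 + \log\left(\frac{\text{total number of documents}}{\mathrm{docFreq}(t) + 1}\right).$$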
This implementation of TF-IDF comes from the TFIDFSimilarity class of Apache Lucene (footnote 7), after removing the normalisation and boosting factors. Finally, to keep the RMSE measurement of the performance within the range [0, 1], we use the cosine similarity of the two vectors directly as the score; since all TF-IDF scores are non-negative, this value always lies in [0, 1]. Hence, the relevance score of a web page w for a query q is:
$$\mathrm{TF\text{-}IDF\text{-}SCORE}(q, w) = \mathrm{cosine\_similarity}(V_q, V_w),$$

where $V_q$ and $V_w$ are the TF-IDF vectors of the query q and the web page w, respectively.
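The TF-IDF scoring described above can be sketched in Python as follows. This is only an illustrative sketch, not the Lucene-based implementation used in the experiment: the function names, the pre-tokenised inputs, the `doc_freq` dictionary, and the use of the natural logarithm are all assumptions made for the example.

```python
import math
from collections import Counter


def idf(term, doc_freq, num_docs):
    """idf(t) = 1 + log(num_docs / (docFreq(t) + 1)); natural log is assumed."""
    return 1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1))


def tf_idf(term, tokens, doc_freq, num_docs):
    """TF-IDF(t, text) = sqrt(frequency(t in text)) * idf(t)^2."""
    freq = Counter(tokens)[term]
    return math.sqrt(freq) * idf(term, doc_freq, num_docs) ** 2


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tf_idf_score(query_tokens, body_tokens, doc_freq, num_docs):
    """Relevance of a web page body for a query, one dimension per query term."""
    terms = sorted(set(query_tokens))
    v_q = [tf_idf(t, query_tokens, doc_freq, num_docs) for t in terms]
    v_w = [tf_idf(t, body_tokens, doc_freq, num_docs) for t in terms]
    return cosine_similarity(v_q, v_w)


# Hypothetical toy data: document frequencies in a collection of 100 pages.
doc_freq = {"solar": 9, "energy": 4}
query = ["solar", "energy"]
body = ["solar", "panels", "convert", "solar", "radiation"]
print(tf_idf_score(query, body, doc_freq, num_docs=100))
```

Because both vectors are indexed by the unique query terms only, terms that occur in the body-text but not in the query do not affect the score.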
The BM25F method is a structured scoring method that assesses the relevance of a web page according to the relevance of its different parts to the query. The formula implemented and used in this experiment is the following (Pérez-Agüera et al., 2010):
$$\mathrm{BM25F}(q, \mathit{item}) = \sum_{t \in \mathit{item}} \frac{\mathrm{tf}(t, \mathit{item})}{k_1 + \mathrm{tf}(t, \mathit{item})} \cdot \mathrm{idf}(t),$$
where tf(t, item) is the linear combination of the frequency of the term t in the four sections of the item, computed as follows:
$$\mathrm{tf}(t, \mathit{item}) = \sum_{s \in \mathit{item}} w_s \cdot \mathrm{tf}_s(t, \mathit{item}),$$

where $w_s$ is the boost factor for the section s of the web page. Finally, $\mathrm{tf}_s(t, \mathit{item})$ is the term frequency of the term t in the section s of item. This function is defined as follows:

$$\mathrm{tf}_s(t, \mathit{item}) = \frac{\mathrm{frequency}_s(t, \mathit{item})}{1 + b_s \left( \frac{l_{\mathit{item},s}}{l_s} - 1 \right)},$$

where $\mathrm{frequency}_s(t, \mathit{item})$ is the frequency of the term t in the section s of item, $l_{\mathit{item},s}$ is the length of the section s of item (expressed as the number of words in the text of s), and $l_s$ is the average length of the section s across the web pages in our collection. Furthermore, $b_s$ is the BM25F length-normalisation parameter for the section s.
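The three BM25F formulas above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the experiment: the function names, the dictionary-based representation of an item, and the precomputed `idf` table are hypothetical, the parameter values in the usage example are illustrative, and the sum is assumed to range over the query terms that occur in the item.

```python
SECTIONS = ("title", "body", "links", "highlights")


def section_tf(term, item, s, b, avg_len):
    """tf_s(t, item) = frequency_s / (1 + b_s * (l_item_s / l_s - 1))."""
    tokens = item.get(s, [])
    freq = tokens.count(term)
    if freq == 0:
        return 0.0
    return freq / (1.0 + b[s] * (len(tokens) / avg_len[s] - 1.0))


def combined_tf(term, item, w, b, avg_len):
    """tf(t, item): boost-weighted sum of the per-section term frequencies."""
    return sum(w[s] * section_tf(term, item, s, b, avg_len) for s in SECTIONS)


def bm25f(query_terms, item, w, b, avg_len, idf, k1):
    """Sum of tf / (k1 + tf) * idf(t) over the query terms found in the item."""
    score = 0.0
    for t in set(query_terms):
        tf = combined_tf(t, item, w, b, avg_len)
        if tf > 0.0:
            score += tf / (k1 + tf) * idf[t]
    return score


# Illustrative usage with hypothetical data.
item = {
    "title": ["solar", "energy"],
    "body": ["solar", "panels", "convert", "light", "into", "energy"],
    "links": ["energy", "storage"],
    "highlights": ["solar"],
}
avg_len = {"title": 3.0, "body": 8.0, "links": 2.0, "highlights": 1.0}
w = {"title": 3.0, "body": 1.0, "links": 2.0, "highlights": 3.0}
b = {"title": 0.4, "body": 0.3, "links": 0.4, "highlights": 0.5}
idf = {"solar": 2.1, "energy": 1.6}  # assumed precomputed idf values
print(bm25f(["solar", "energy"], item, w, b, avg_len, idf, k1=1.7))
```

The saturation term tf / (k1 + tf) keeps the contribution of each term below idf(t), so repeating a term many times in one section cannot dominate the score.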
Section 6.2 in Chapter 6 discusses in depth which parts of a web page should be considered when scoring web pages for teaching purposes. Here we anticipate that four sections are analysed, so $s \in \{\mathit{title}, \mathit{body}, \mathit{links}, \mathit{highlights}\}$. We base the parameters of BM25F used for this experiment, and for other tests conducted later in our research, on those proposed for BM25F by Pérez-Agüera et al. (2010), where we find $b_{\mathit{title}} = 0.4$, $b_{\mathit{body}} = 0.3$ and $b_{\mathit{links}} = 0.4$. For $b_{\mathit{highlights}}$ we have assigned a value of 0.5, because this section is expected to represent concepts that are fundamental to the content of the web page. We set $k_1 = 1.7$ as per the original reference (Pérez-Agüera et al., 2010).

Optimising the boost factors may lead to better results, but, unfortunately, we do not have enough web searches for that purpose. Even other researchers, with a much larger dataset, could not optimise the parameters (Pérez-Agüera et al., 2010). Instead, we suggest two possible configurations of these factors: i) following the literature (Pérez-Agüera et al., 2010); ii) summing, for each section, the weights of the attributes of the Teaching Context as formulated for WebEduRank (refer to Table 6.2 in Section 6.3.1). In the latter case, we add 1 to the sum of weights to keep the boost factors above 1. A third possible configuration would sum only the elements of the Teaching Context that improve the accuracy of BM25F. We can explore this option only at the end of this experiment, once we can identify the attributes of the Teaching Context that truly improve BM25F. To compare the two settings, we run BM25F with the following two configurations of the boost factors:
• Option 1: $w_{\mathit{title}} = 3$, $w_{\mathit{body}} = 1$, $w_{\mathit{links}} = 2$, $w_{\mathit{highlights}} = 3$. The highlights section is usually not exploited in the BM25F method (Pérez-Agüera et al., 2010), so we assign it the value 3 because, similarly to the title section, it expresses significant content addressed in the web page.
• Option 2: $w_{\mathit{title}} = 4.1$, $w_{\mathit{body}} = 3.8$, $w_{\mathit{links}} = 1.7$, $w_{\mathit{highlights}} = 1.9$.
We run both options for the IP-informed BM25F, whereas for the plain BM25F we use Option 1 only, since we do not find an explicit link between the user query and any of the attributes of the Teaching Context.
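The two boost-factor configurations, and the rule of adding 1 to the summed Teaching Context weights, can be written down as a short Python sketch. The helper name and the per-section attribute weights below are hypothetical placeholders (the actual weights are those of Table 6.2 and are not reproduced here), and the third value of each option is assumed to be the boost factor of the links section.

```python
def boost_from_context(attribute_weights):
    """Boost factor of a section: 1 + sum of its Teaching Context weights."""
    return 1.0 + sum(attribute_weights)


# Boost factors as listed for the two options (links value assumed to be w_links).
OPTION_1 = {"title": 3.0, "body": 1.0, "links": 2.0, "highlights": 3.0}
OPTION_2 = {"title": 4.1, "body": 3.8, "links": 1.7, "highlights": 1.9}

# Hypothetical attribute weights per section, only to illustrate the rule.
hypothetical_weights = {
    "title": [0.5, 0.25],
    "body": [1.5],
    "links": [],
    "highlights": [0.25],
}
boosts = {s: boost_from_context(ws) for s, ws in hypothetical_weights.items()}
print(boosts)
```

With non-negative weights, adding 1 guarantees that a section with no associated attribute still contributes with a boost factor of 1 instead of being dropped from the linear combination.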