
4 Using WebNaut’s Functionality to Support Prefetching

The main question addressed in this section is how the WebNaut assistant can support prefetching and caching techniques so that the client perceives the least possible latency when downloading web documents. Cacheability is of interest only when the user clicks on the special tool to see the learning agent’s recommendations and responds to each of them with the appropriate feedback. To obtain a clear view of prefetchability, we examine two different aspects of the problem, which are analyzed in the following subsections.

4.1 Prefetching of Web Documents Recommended by the Learning Agent

As mentioned above, the learning agent presents a list of URLs to the user and then waits for the user’s feedback. The system knows that the user’s next step, after loading the LA, is to visit the web documents corresponding to the URLs of the list. Thus, prefetching them on the user’s behalf is worthwhile.

58 George Kastaniotis et al.

Fig. 3. Learning agent’s recommendations

Utilizing a dummy process. A first approach to prefetching the LA recommendations is to use a dummy process that starts as early as the stage in which the documents are evaluated for similarity to the user’s profile. This process could save all documents with the highest scores to a cache folder on the local hard disk. When the user later visits the corresponding sites, the system could redirect the request to that folder instead of downloading the documents again.

A solution such as the one above would be easy to implement, but it has serious disadvantages. First and foremost, the period of time between saving the documents onto the local disk and presenting the list of the corresponding URLs may be long enough for the local copies to become stale. Thus, before serving the local copies, WebNaut must first check for new versions on the remote server. Furthermore, if the user profile is not yet very representative, the user may decide not to visit a site, considering it irrelevant beforehand. This means that local resources are wasted on documents that the user may choose not to download at all.
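The check for newer versions mentioned above could be done with HTTP conditional requests. The following is a minimal sketch under that assumption; the paper does not say how WebNaut would perform the check, and `CachedCopy`, `revalidation_headers` and `can_reuse` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedCopy:
    """Hypothetical record for a document saved by the dummy process."""
    url: str
    etag: Optional[str] = None           # ETag response header seen when saving
    last_modified: Optional[str] = None  # Last-Modified response header seen when saving


def revalidation_headers(copy: CachedCopy) -> dict:
    """Build the conditional request headers for checking a local copy."""
    headers = {}
    if copy.etag:
        headers["If-None-Match"] = copy.etag
    if copy.last_modified:
        headers["If-Modified-Since"] = copy.last_modified
    return headers


def can_reuse(status_code: int) -> bool:
    """A 304 (Not Modified) response means the local copy is still current."""
    return status_code == 304
```

Even when the server answers 304, the revalidation costs one round trip per document, which is part of the overhead that makes the dummy process unattractive.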

For the reasons stated above, the idea of utilizing the dummy process described must be abandoned. A more evolved technique is needed, one that makes better use of the learning agent’s recommendations and information, the user profile, and the HTTP headers of the recommended web documents. The basic operation features of such a technique are analyzed below.

Intelligent Web Prefetching Based upon User Profiles – The WebNaut Case 59

Ideal prefetching algorithm for LA’s recommendations. The learning process used by the LA is time-consuming and demands a large amount of user feedback in order to build the most representative profile. This means that the process may include a large number of iterations, each one updating the profile to bring it closer to the user’s information interests. After each iteration, the LA provides a new list of URLs and waits for feedback. The arrival of the feedback triggers the commencement of a new iteration. WebNaut’s knowledge about the listed URLs is limited to the following data items:

• Each URL, i.e. the remote server and the special path on its disk that will lead to the folder where the corresponding web document resides.

• The score of each web document in relation with the current user profile.

• The exact query that resulted in each URL. This consists of a set of keywords and a combination of logical operators.

• The user’s feedback, which represents the degree of relevance to the personal information interests.
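As an illustration only, the four data items could be held in a record like the one below. The class name, the field names and the representation of a query as (operator, keyword) pairs are assumptions made for this sketch and do not come from WebNaut itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Recommendation:
    """Hypothetical container for what WebNaut knows about one listed URL."""
    url: str                        # remote server plus path of the document
    score: float                    # similarity to the current user profile
    query: List[Tuple[str, str]]    # assumed form: (operator, keyword) pairs
    feedback: Optional[str] = None  # filled in once the user has responded

    def and_keywords(self) -> Set[str]:
        """Keywords that the query connects with the logical 'AND'."""
        return {keyword.lower() for operator, keyword in self.query
                if operator == "AND"}
```

For example, a record built from the query terms `("AND", "Web")`, `("AND", "caching")` and `("OR", "proxy")` exposes `web` and `caching` through `and_keywords()`.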

The data items enumerated above can be used as input for an intelligent prefetching algorithm during each iteration of the LA’s learning process. During the next iteration, the algorithm will be able to decide which documents to prefetch. The key idea is to maintain a list of keywords from the queries that resulted in documents the user bookmarks as ‘very interesting’ or ‘interesting’. In particular, the list will be a subset of the user profile that contains all the keywords connected in the queries with the ‘AND’ operator. Because of the large weight factor of the logical ‘AND’, words connected with it are closer to the user’s interests than others.

The basic operation features of the proposed ideal prefetching algorithm are as follows (see Figure 4): At the commencement of the learning process the list is empty. When the first results are delivered to the client, the algorithm waits for the user’s feedback. For those URLs to which the client responds with a positive bookmark, the keywords of the corresponding query are added to the list. In the next iteration of the learning process, the list forms the basis for evaluating which recommendations to prefetch on the client’s behalf. The algorithm continues to update the list in the same way at each iteration.
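The iteration described above can be sketched as follows. This is not the authors' implementation: the function names are hypothetical, and the selection rule (prefetch a recommendation when its query shares at least `min_overlap` keywords with the list) is an assumption, since the text only says that the list forms the basis of the evaluation.

```python
# Feedback values the text treats as positive bookmarks.
POSITIVE = {"very interesting", "interesting"}


def update_keyword_list(keyword_list, rated):
    """Add the AND-connected keywords of positively rated queries.

    rated: iterable of (and_keywords, feedback) pairs from one iteration.
    """
    for and_keywords, feedback in rated:
        if feedback in POSITIVE:
            keyword_list |= set(and_keywords)
    return keyword_list


def select_for_prefetch(keyword_list, candidates, min_overlap=1):
    """Pick the URLs of the next iteration that are worth prefetching.

    candidates: iterable of (url, and_keywords) pairs.
    min_overlap: assumed threshold on shared keywords.
    """
    return [url for url, keywords in candidates
            if len(keyword_list & set(keywords)) >= min_overlap]


# The list starts empty and is filled from the first round of feedback.
keywords = update_keyword_list(set(), [
    ({"web", "caching"}, "interesting"),
    ({"football"}, "not interesting"),   # hypothetical negative label
])
chosen = select_for_prefetch(keywords, [
    ("http://example.org/proxy-caching", {"caching", "proxy"}),
    ("http://example.org/league-table", {"football"}),
])
```

With an empty list nothing is selected, which matches the description: no prefetching takes place before the first feedback arrives.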

The algorithm must prevent the list from growing without control and must also ensure that it faithfully follows the client’s interests. This can be achieved by holding a metric for each keyword in the list that represents its current weight in the prefetching task. The client’s feedback keeps these metrics up to date, and if a metric falls to a lower bound, the keyword is expelled from the list. One such measure against uncontrolled expansion of the list is an aging factor. Each time the client responds with negative feedback, the aging factor of every keyword that participates in the query and resides in the list must be reduced; in the case of positive feedback, it must be increased. When the aging factor of a keyword reaches a predefined lower level, the keyword must be expelled from the list. Because other keywords in the list may be closely related to the expelled one when forming queries, multiple expulsions are possible.
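A minimal sketch of the aging factor follows. The numeric values (`initial`, `step`, `lower_bound`) are placeholders chosen for illustration; the paper does not give concrete values. The sketch also leaves out the expulsion of related keywords mentioned above.

```python
def apply_feedback(ages, query_keywords, positive,
                   initial=3.0, step=1.0, lower_bound=0.0):
    """Update the per-keyword aging factors after one user response.

    ages: dict mapping each keyword in the list to its aging factor.
    query_keywords: AND-connected keywords of the rated query.
    positive: True for a positive bookmark, False for negative feedback.
    """
    for keyword in query_keywords:
        if positive:
            if keyword in ages:
                ages[keyword] += step      # reinforce a known keyword
            else:
                ages[keyword] = initial    # new keyword enters the list
        elif keyword in ages:
            ages[keyword] -= step          # only keywords in the list age
            if ages[keyword] <= lower_bound:
                del ages[keyword]          # expel at the lower bound
    return ages


ages = {}
apply_feedback(ages, ["web", "caching"], positive=True)
apply_feedback(ages, ["web"], positive=True)
for _ in range(3):
    apply_feedback(ages, ["caching", "proxy"], positive=False)
```

In this run `caching` enters with the initial value, loses one step per negative response and is expelled on the third one, while `proxy` never enters the list because it only appears in negatively rated queries.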


Finally, the algorithm must take into account the web caching hints provided by HTTP/1.1 headers, so that no web document stored in the local cache becomes out of date. This factor is crucial for deciding which web pages to store as early as the phase of evaluation against the client’s profile, and which ones to prefetch at the phase in which the learning agent’s recommendations are presented.
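One way to read such hints is to inspect the `Cache-Control` response header. The sketch below is an assumption about how the decision could be made, not the method of the paper: `worth_prefetching` and its `expected_delay_s` parameter (the estimated time until the user views the document) are hypothetical, and the header dictionary is assumed to use the exact key `Cache-Control`.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


def worth_prefetching(headers, expected_delay_s):
    """Decide whether storing a document ahead of time is likely to pay off."""
    directives = parse_cache_control(headers.get("Cache-Control", ""))
    # no-store forbids keeping a copy; no-cache forces revalidation on every
    # use, so this sketch conservatively skips both.
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        # Store only if the copy is still fresh when the user gets to it.
        return int(max_age) >= expected_delay_s
    # No explicit lifetime given: assumed acceptable, subject to revalidation.
    return True
```

Documents with a short `max-age` are better left to the presentation phase, while long-lived ones can be stored as early as the evaluation phase.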

4.2 Prefetching Based on HTML Anchors

Apart from prefetching the LA’s recommendations, another matter of interest is prefetching web pages that are linked from the recommended ones and have the same or related content. According to [3], linked web pages are likely to have similar content. Moreover, titles, descriptions and anchor text represent at least part of the target page. Keeping this in mind, we could modify WebNaut’s learning agent to focus on the text in and around the HTML anchors of the web documents recommended by the LA. Finding a rich collection of profile keywords there is a good reason for prefetching the target web pages.

Prefetching based on HTML anchors should be triggered at the time a client visits a web document from the LA’s recommendation list. While loading this document into the browser, WebNaut could scrutinize the anchors in order to prefetch the pages they point to into the local cache. Because clients tend to revisit previous pages to click on other hyperlinks, the recommended documents should also be kept in the local cache for a while.
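The anchor scan could look like the following sketch, which uses only the Python standard library. It inspects the anchor text alone (not the surrounding text), and the rule of selecting a link when its text contains at least `min_hits` keywords is an assumption; `AnchorCollector` and `anchors_to_prefetch` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.anchors = []
        self._href = None   # target of the anchor currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.anchors.append((self._href, text))
            self._href = None


def anchors_to_prefetch(html, base_url, keywords, min_hits=1):
    """Return link targets whose anchor text contains enough keywords."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    wanted = {keyword.lower() for keyword in keywords}
    return [url for url, text in collector.anchors
            if len(wanted & set(text.lower().split())) >= min_hits]
```

The `keywords` argument can be the overall profile or any smaller keyword set, so the same function serves both variants of the scheme.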

The above prefetching scheme could be extended to support prefetching while clients surf through a sequence of web pages. This means that WebNaut may in turn scrutinize the anchors of the target pages to prefetch new targets.
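Following anchors from page to page needs some bound, otherwise the set of targets grows with every hop. The sketch below plans such a traversal breadth-first; the `max_depth` and `max_pages` limits are assumptions added for illustration, and `links_of` stands for any hypothetical function that returns the already filtered anchor targets of a page.

```python
from collections import deque


def plan_prefetch(start_url, links_of, max_depth=2, max_pages=20):
    """List the pages to prefetch when following anchors from start_url.

    links_of: callable mapping a URL to the anchor targets selected on it.
    max_depth: assumed limit on how many hops away a target may be.
    max_pages: assumed limit on the total number of prefetched pages.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    plan = []
    while queue and len(plan) < max_pages:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look at the anchors of the deepest pages
        for target in links_of(url):
            if target not in seen and len(plan) < max_pages:
                seen.add(target)
                plan.append(target)
                queue.append((target, depth + 1))
    return plan


# Toy link structure standing in for real pages.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"]}
plan = plan_prefetch("a", lambda url: graph.get(url, []), max_depth=2)
```

The `seen` set keeps pages that link back to each other from being scheduled twice.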

In an alternative approach, instead of using the overall profile for making decisions about anchor tags, the keyword list supporting the ideal algorithm described in the previous subsection could be used. This limits the set of keywords and further reduces the total amount of web content to be prefetched. Consequently, the waste of local resources on web caching is minimized.