
documents to the website.

The advantage of the IndexFinder algorithm is that it generates index documents that reflect the way users view the information available in the website. In addition, it improves users’ navigation in the website. However, the algorithm’s performance must be judged against the limited range of training and test data used. These data were drawn from a small group of websites comprising structured documents with distinguishable attributes, whereas the majority of websites on the Web consist of documents with highly unstructured content. Nevertheless, IndexFinder demonstrated the transformation that can be performed on a website’s presentation by learning from user access patterns.

2.5 Chapter Summary

In this chapter, we have surveyed the broad area of Web Mining with a specific focus on web usage mining. The fundamental aspects of web usage mining, such as the source of data and the challenges involved in extracting knowledge from web logs, were explored in detail. Existing usage mining techniques and their potential applications to the Web were also discussed. We identified the advantages that could be gained from mining web logs, and highlighted that the limited types of information obtainable from web logs are a significant drawback. However, this drawback can be overcome by using additional information from website content and structure. We have also briefly reviewed the two other forms of web mining, namely content mining and structure mining, highlighting their data sources as well as their applications to the Web. Although our review of content and structure mining is not extensive, it is sufficient to conclude that the substantial work involved in each could amount to a mini project in its own right. Therefore, we have decided that our work will focus entirely on usage mining to obtain knowledge about the users and the website from web logs.

Following that, this chapter looked at the evolution of Web technology from static websites to the current intelligent websites. We have seen techniques that use automated user profiling to perform website personalisation and those that rely solely on web logs rather than users’ profiles to personalise a website. However, we observed that the majority of current intelligent websites are too focussed on modelling the users to satisfy their Web needs and fail to address the actual source of the problem, which is design related. An adaptive website can be seen as an optimisation problem, where a website continually tries to maximise the efficiency with which users can browse and navigate through it. The difference between current intelligent websites and adaptive websites is that the former filter the website information presented to a user, while the latter actually transform it (i.e., alter the website based on users’ browsing behaviour so as to reflect their changing needs), thus improving access to information. Nevertheless, the development of adaptive websites capable of transforming their design is still in its infancy. This is mainly due to concerns such as privacy and data loss.

Concern about privacy is one of the major factors affecting the development of adaptive websites. Many users have voiced their unease over the collection, use and distribution of their personal information (e.g., browsing behaviour) online, as it intrudes on their privacy. Furthermore, some websites are known to employ unethical methods to gather personal information about users without their knowledge. This has caused users to be suspicious of the role of an adaptive website and to avoid such websites. Similarly, organisations providing services to users online are sceptical about the true potential of a transformation-based adaptive website. They fear that such websites may do more harm than good by making detrimental changes to their website’s design, resulting in irrecoverable data loss.

Despite these uncertainties, a number of studies have demonstrated the feasibility of transformation-based adaptive websites. These studies, as outlined above, include promotion and demotion (Perkowitz and Etzioni, 1997), linking, otherwise known as shortcutting (Srikant and Yang, 2001; Lee and Shiu, 2004; Brickell et al., 2007), and index page synthesis (Etzioni and Perkowitz, 2000), all of which demonstrated the potential benefit that could be gained from such intelligent website systems. The main advantage of the transformation approach is that it focusses on altering a website’s design for the benefit of everyone rather than an individual.

The motivation for our work is derived from the transformation approach, i.e., modelling the website for all users. Our work aspires to suggest changes to the organisation of the school’s website based on the users’ collective access patterns obtained from the school’s web logs. To achieve this, we chose to address the research problem of finding useful shortcuts that could make navigating the school’s website easier for all users. By a useful shortcut, we mean a new link between two documents that could shorten the navigational paths taken by users to reach their document of interest.
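To make the notion of a useful shortcut concrete, the following minimal Python sketch counts how many clicks a candidate link would save over a set of recorded navigational paths. It illustrates the definition only and is not the method developed in this thesis: the function names `clicks_saved` and `total_clicks_saved`, the document names, and the choice to measure from the first visit to the source document are all assumptions made for illustration.

```python
from typing import Iterable, Sequence


def clicks_saved(path: Sequence[str], source: str, target: str) -> int:
    """Clicks a hypothetical source -> target link would save on one path.

    The path is the ordered list of documents a user visited. The saving is
    measured from the first visit to `source` to the first later visit to
    `target` (an illustrative assumption). Returns 0 if the path never
    reaches `target` after `source`, or if the two are already adjacent.
    """
    if source not in path:
        return 0
    start = path.index(source)  # first visit to the source document
    try:
        end = path.index(target, start + 1)  # first later visit to the target
    except ValueError:
        return 0
    # Without the shortcut the user needs (end - start) clicks; with it, one.
    return (end - start) - 1


def total_clicks_saved(paths: Iterable[Sequence[str]], source: str, target: str) -> int:
    """Sum the per-path savings over all recorded navigational paths."""
    return sum(clicks_saved(p, source, target) for p in paths)


# Hypothetical navigational paths, one list per user visit.
example_paths = [
    ["home", "research", "groups", "ai", "papers"],
    ["home", "teaching", "courses"],
    ["home", "research", "groups", "ai", "papers"],
]

# A candidate shortcut from "home" to "papers" saves 3 clicks on each of the
# two paths that reach "papers", and nothing on the path that never does.
saving = total_clicks_saved(example_paths, "home", "papers")
```

Under these assumptions, a candidate link is more useful the larger its total saving across all users, which matches the aim of improving navigation for everyone rather than for an individual.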

Our Approach To Adaptive Websites Using Wayposts

The objective of this chapter is to introduce our approach to an adaptive website and to provide details of the work carried out in acquiring data from the web logs. In particular, we will highlight a research gap found in the area of adaptive websites and suggest our approach to address it. Following that, the chapter discusses the acquisition of data from web logs, which is divided into three tasks, namely Web Logs Filtration, Crawler Logs Elimination and Sessionisation. The chapter gives an account of the nature of each task and elaborates on the methods employed to carry them out. To ground the discussion, we provide a detailed description of the web logs that will be used consistently for evaluation throughout this and later chapters.

The chapter is organised as follows. Section 3.1 describes the research gap identified and introduces our approach to addressing it. Section 3.2 gives an introduction to the web logs that will be used in our experiments. Section 3.3 provides a brief description of the three tasks of data acquisition, with Sections 3.3.1, 3.3.2 and 3.3.3 giving a detailed account of the processes involved in each of them. Finally, Section 3.4 summarises the chapter.
