
6.3 Web mining agents

6.3.6 WebAce

WebAce monitors user browsing behaviour and suggests new pages that may be of interest to the user [68]. Browsed pages are classified via clustering, and search queries are generated from the resulting clusters. Pages that are similar to pages previously browsed by the user are suggested. PCDP8 and an association rule discovery method are compared to AutoClass and hierarchical agglomerative clustering (HAC).

6Term Frequency - Inverse Document Frequency: An algorithm for the assignment of weights to terms in a document set. The algorithm is biased to assign higher weights to the most discriminating terms.

7Classic decision tree learning algorithm.
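To make the idea of suggesting pages that resemble previously browsed ones more concrete, the following is a minimal sketch that weights terms with TF-IDF and ranks candidate pages by cosine similarity to the browsed pages. It is an illustration under stated assumptions and not WebAce's actual method: the function names (`tf_idf_vectors`, `cosine`, `suggest`), the pre-tokenized input, and the choice of cosine similarity are all hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest(browsed, candidates):
    """Return candidate indices, most similar to any browsed page first."""
    vectors = tf_idf_vectors(browsed + candidates)
    browsed_vecs = vectors[:len(browsed)]
    candidate_vecs = vectors[len(browsed):]
    scores = [
        max((cosine(c, b) for b in browsed_vecs), default=0.0)
        for c in candidate_vecs
    ]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    browsed = [["agent", "web", "mining"], ["web", "search", "agent"]]
    candidates = [["cooking", "recipe", "pasta"],
                  ["web", "mining", "agent", "search"]]
    print(suggest(browsed, candidates))
```

Terms that occur in every document receive a weight of zero under this weighting, which reflects the bias towards discriminating terms described in the TF-IDF footnote.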

6.4 Conclusion

In this chapter the fields of intelligent agents and web mining were introduced. The combination of the two fields, web mining agents, has direct applications in web information retrieval and personalized web browsing.

Intelligent agents are by no means a new technology and have been evolving, in one form or another, over the last 50 years. The key properties that distinguish agent technology from other software paradigms are the ability of agents to function autonomously or semi-autonomously and to form part of a larger community of agents. Intelligent agents may have differing goals and tasks depending on their type. A brief discussion was given of how agents are typed according to their intelligence and agency, as well as of the main categories into which agent systems are classified. The agent paradigm has also been successfully applied to a variety of application domains.

The field of web mining is concerned with the automatic discovery of interesting patterns on the WWW. In this chapter, three categories of web mining were identified, namely web structure mining, web usage mining and web content mining. The categories differ in two respects: the actual data being mined and the motivation for mining it. It was also highlighted that, regardless of the category of mining, web mining involves a number of core subtasks. These subtasks are resource discovery, information selection/extraction, generalization and analysis. Each subtask has differing goals and techniques attached to it. Finally, a very brief review of some current web mining systems was given, summarizing the goals of each system and mentioning the techniques it employs.

The application of new paradigms like intelligent agents and web mining to the domain of the web can be both a fascinating and fruitful field of research. Search engines are progressively losing the battle of indexing resources on the web and new techniques are needed for finding and extracting useful information from it. In the following chapters, the focus of discussion will shift to the development of a prototype web mining system.

8Principal Component Divisive Partitioning: Top-down clustering method that splits a training set until sufficiently small clusters have been formed. A binary tree then holds the clusters.

Chapter 7

Design of a prototype collaborative personalized meta-search agent (COPEMSA) system model for the world wide web

“Things should be made as simple as possible, but not any simpler.”

-Albert Einstein

This chapter presents the prototype design of a collaborative personalized meta-search agent (COPEMSA) system for the world wide web. The discussion opens with the question of why the web environment needs personalization.

The following section concentrates on a natural extension of this idea: the concept of search in a community. The possible benefits of considering communal interests for resource discovery on the web are also discussed.

A subsequent section introduces the idea of meta-searching and its application to the world wide web. Next follow the design goals of the proposed system and an explanation of the two-tier approach to search that the proposed system aims to achieve. The core components of the COPEMSA system architecture are discussed next. The chapter concludes with some final thoughts on the work presented.

7.1 Agent-based personalized autonomous web mining

The astonishing growth of the Internet has had an impact on every aspect of society. One of these aspects is the way in which our society retrieves and interacts with information. The web has contributed much to the accessibility of information to a wide variety of interested users.

However, the models for locating and retrieving this information on the web have not kept pace with users’ personal preferences and browsing behaviours [45].

The current model for searching the web can be described as a “one size fits all” approach where all user queries are treated in the same way regardless of their preferences, interests or browsing behaviours. As was mentioned in a previous chapter1, one of the main challenges facing web information retrieval is determining the context in which a particular user’s query was made.

This could potentially improve the relevance of the returned results to the user and increase the usefulness of the service overall [45].

Another important issue inherent to the web domain is its vastness and scope. Even if users find relevant links through the use of a search engine, the task of visiting each site and deciding whether the information is useful must still be done manually. This could be a highly time-intensive process, and it is quite possible that the effort of locating information becomes greater than that of actually using it for whatever purpose it was intended.

To address these issues, intelligent tools that attempt to automate and personalize the tasks of searching, retrieving and filtering of information on the web are needed. The problem of constructing such personalized autonomous web mining systems remains, to a large extent, an open one [69].

There are a number of factors involved in the design of such a system. These factors are discussed because of their importance in the design of the COPEMSA system model. The remainder of this section discusses some of these key factors.

1See chapter 5, page 67.