7.2.1 What is a meta-search engine?

Web search engines do not offer equal coverage of the world wide web, as was discussed in a previous chapter. In addition, a multitude of search engines is available, each offering a different strategy and a different interface for searching the web. To receive the broadest coverage and get more relevant results for a given query, users could use multiple search engines.

This can be problematic, however, as collating and merging multiple result lists by hand can become a considerable burden for the user. Moreover, with the vast number of search engines available, it can also be difficult for a user to keep track of which search engines are better for which queries [81, 82, 83].

In order to leverage the coverage of multiple search engines and to provide a single interface to a host of search engines, the concept of a meta-search engine was born. A meta-search engine is a computer program that uses multiple search engines (usually in parallel) to process user queries.

The process is briefly summarized in figure 7.1 on page 107.

Figure 7.1: Meta-search engine process, adapted from [81, 82, 83].
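
The process of dispatching one query to several search engines in parallel and merging the returned lists can be illustrated with a minimal Python sketch. It is not the design proposed in this work: the engines are hypothetical callables standing in for real search engine back ends, each is assumed to return every URL at most once, and the reciprocal-rank merge rule is only one simple choice among many.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for a real search engine back end: it takes a
# query string and returns a ranked list of URLs, best result first.
Engine = Callable[[str], List[str]]


def meta_search(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Send the query to every engine in parallel and merge the ranked lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        results = {name: future.result() for name, future in futures.items()}

    # Assumed merge rule: sum of reciprocal ranks, so a URL that is returned
    # by several engines, or ranked highly by one of them, moves up.
    scores: Dict[str, float] = {}
    for ranked in results.values():
        for position, url in enumerate(ranked, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / position

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

A URL that appears in more than one result list accumulates score from each list, which is how the merged ranking reflects agreement between engines.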

The obvious benefits are that the meta-search engine's coverage is improved through the use of different search engines and that the user is presented with a single interface through which he/she can search the web. Another, less obvious, benefit is that a meta-search engine can be programmed to "know" a lot more about a search engine than a casual human searcher does. Many search engines have special features and optimizations that a meta-search engine could exploit, thereby improving the results returned from the various search engines [81, 82, 83].

Meta-searching the web has obvious benefits for a personalized autonomous web content mining agent as well [81, 82, 83]. These benefits will be discussed in the next subsection.

7.2.2 Improving web mining agent coverage through meta-searching

As was discussed in the previous chapter, the first step in the web mining process is information retrieval/resource discovery. Many web mining systems approach this step manually by requiring the user to input start URLs from which resource discovery can commence. Unfortunately, users typically do not know the locations of interesting websites and may have difficulty in supplying these start locations. Furthermore, users want results to be returned to them as quickly as possible. This could have a serious impact on systems that discover resources from manually supplied start locations, as it could take an information agent a considerable amount of time to do so, thereby slowing down the entire search effort.

Using meta-search techniques is therefore extremely desirable for personalized web content mining agents. Using collated results from various search engines as a starting point for further information retrieval enables the user to benefit from the agent from the first use, as it can immediately return relevant or near-relevant results. A personalized agent needs time to build up a profile of its user. As relevant data is collected and the agent learns about its user, the performance of such a personalized meta-search agent will steadily improve as user context is included in the processing of queries. In the worst case, the coverage of the agent and the results returned will be as good as those of other meta-search facilities.
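
The worst-case argument above can be made concrete with a small Python sketch. It assumes, purely for illustration, that the user profile is a mapping from terms to weights and that each collated result carries a text snippet; neither the data layout nor the scoring rule is taken from the cited work. With an empty profile the collated order is returned unchanged, and as the profile grows the results are reordered towards the user's interests.

```python
from typing import Dict, List, Tuple


def personalize(results: List[Tuple[str, str]],
                profile: Dict[str, float]) -> List[str]:
    """Re-rank (url, snippet) pairs by overlap with the user's term weights.

    With an empty profile every score is zero, and because sorted() is
    stable the original meta-search order is returned unchanged.
    """
    def score(snippet: str) -> float:
        # Hypothetical scoring rule: sum the profile weights of snippet terms.
        return sum(profile.get(term, 0.0) for term in snippet.lower().split())

    ranked = sorted(results, key=lambda item: -score(item[1]))
    return [url for url, _ in ranked]
```

For a new user the function behaves exactly like the underlying meta-search, which corresponds to the worst case described in the text.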

7.2.3 Collaborative ranking of multiple search engine results

The primary method of information retrieval on the current web is undeniably web search engines. With this fact in mind, it can be hypothesized that these search facilities are used by many different users, each with different interests and goals. There may, however, be users with overlapping or similar interests. This re-introduces the idea of a community of searchers as was discussed earlier in this chapter.

The current "one-size-fits-all" approach to web searching does not really facilitate collaboration among users. There could be a number of reasons for this. First, search engines may not have the resources to handle the cost of collaboration among users. Second, receiving implicit feedback from users is difficult with the current model, as contact with the user is lost as soon as he/she leaves the search engine's results page. Lastly, a central server-based search engine may want to be usable and effective for a large number of people, so search results that are biased in any way may not be ideal.

This idea of a community of people, some with differing interests and some with similar interests, all using different search engines to satisfy some information need, is one of the key issues that can be addressed by a collaborative meta-search agent system. By using collaborative filtering techniques, the agents could facilitate collaboration between users of multiple search engines.

This collaboration could lead to multiple agent users forming search groups, rating and reviewing different search engine results with respect to their queries. A benefit gained from such a system would be that the results received from multiple search engines could then be ranked according to both user interest and community interest, thereby potentially producing better context-sensitive results.
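
Ranking by both user interest and community interest can be sketched in Python as a weighted blend of two scores. This is a deliberately simple illustration rather than a collaborative filtering algorithm in the full sense: the per-URL user scores, the community ratings, the assumption that both lie in the range 0 to 1, and the mixing weight alpha are all hypothetical.

```python
from typing import Dict, List


def collaborative_rank(urls: List[str],
                       user_scores: Dict[str, float],
                       community_ratings: Dict[str, List[float]],
                       alpha: float = 0.5) -> List[str]:
    """Order URLs by a blend of user interest and mean community rating.

    alpha = 1.0 uses only the user's own scores, alpha = 0.0 only the
    community's ratings. Unrated URLs are treated as scoring 0.0.
    """
    def community(url: str) -> float:
        ratings = community_ratings.get(url, [])
        return sum(ratings) / len(ratings) if ratings else 0.0

    def blended(url: str) -> float:
        return alpha * user_scores.get(url, 0.0) + (1.0 - alpha) * community(url)

    # sorted() is stable, so URLs with equal scores keep their input order.
    return sorted(urls, key=lambda url: -blended(url))
```

Varying alpha shifts the ranking between purely personal and purely community-driven ordering, which is one way the two kinds of interest mentioned above could be traded off.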

From the discussion up to this point, it can be concluded that collaborative filtering, agent-based software and meta-search techniques have the potential to greatly improve the accuracy and usability of web search results. The question now arises as to how a system that incorporates these techniques could be designed. This question is addressed in the following sections.