7.1.3 The agent paradigm

The Internet (and in particular the World Wide Web) is probably one of the most unpredictable, dynamic and complex environments for application development. This makes it a prime environment (if not the most important one) for the application of the agent paradigm [73], as agent systems are well suited to complex or hostile environments [48].

Intelligent agents (and in particular information agents²) are also well suited to learning about their users and about each other. As discussed in the previous chapter, intelligent agents have some unique properties that distinguish them from other software systems. Two properties are of particular importance in this context: agents can act autonomously in a designated environment, and agents form part of a community.

The ability of agents to work autonomously makes them an excellent choice for observing and learning about user behaviour and for adapting to changes in the environment. Many agent systems have been developed with the specific goal of learning user behaviours. The community property of agents also makes them an excellent choice for developing systems that attempt to facilitate collaboration between groups of people (or other agents).

Taking the above into account, agents that are deployed on the Internet³ can be seen as software entities that autonomously interact with the Internet and with other agents on the Internet in order to achieve some goal for their users (a typical goal would be for an agent to retrieve and filter information for its user). In this respect, multiple users could each have an agent to assist them in interacting with the Internet, giving rise to the concept of a multi-agent system.

Multi-agent systems

A multi-agent system can be seen as an ensemble of autonomous agents, acting and working independently from each other, each representing an independent point of control of the system as a whole [73]. In a typical multi-agent system, each agent will act on its own to accomplish its task(s). To accomplish this, the agent will need to interact with other agents in the environment and, of course, with the environment itself. The key benefit of a multi-agent architecture is that the system as a whole can, in many cases, achieve a more complex and wider set of goals than the sum of the individual agents’ goals. This implies a kind of synergy between the individual agents of a multi-agent system.

² See Chapter 6, page 79.

³ Also called Internet agents.

This idea of individual agents interacting and working together to achieve a common goal gives rise to the idea of a society of agents. With this in mind, Zambonelli, Jennings, Omicini and Wooldridge note that each individual agent in a multi-agent system can be viewed from two differing viewpoints [73]:

• Intra-agent viewpoint. Each individual agent in the multi-agent system is viewed as an individual software system with its own purpose/structure/technology.

• Inter-agent viewpoint. Each individual agent in the system is viewed as part of a society and interacts with other individuals, accesses resources in the environment and exploits the social infrastructure imposed by collaboration with other agents.

Another important aspect of a multi-agent system is the organization of the individual agents in the system. The organizational relationships between agents define the specific role an agent plays in the society. This role then motivates and structures the interactions among individual agents [73]. Obviously, for the agents to be able to interact there must be some way for them to contact and communicate with each other.

To address this problem, the concept of an agent communication language (ACL) becomes important. An ACL provides software agents with a means to exchange information and knowledge among themselves [74].

In their paper, Labrou, Finin and Peng discuss two languages that were designed to facilitate the exchange of information and knowledge⁴ between agents, namely the knowledge query and manipulation language (KQML) and the foundation for intelligent physical agents agent communication language (FIPA-ACL) [74]. Another technology that could assist in communication between agents is the simple object access protocol (SOAP), which is discussed below. The common object request broker architecture (CORBA) is also an important technology, and how it could assist in implementing communication between software agents is also discussed below.

⁴ Knowledge representation was discussed in Chapter 4.

The knowledge query and manipulation language (KQML)

The knowledge query and manipulation language (KQML) is a language that is designed to support interactions among intelligent software agents. More specifically, KQML is a language and set of protocols that support computer programs in the identification, connection and exchange of information with other programs [75].

KQML is concerned primarily with pragmatics and secondarily with semantics. Pragmatics among computer processes can include:

• Identification of parties to initiate communication with and how to locate them.

• Knowing how to initiate and maintain an exchange of information between parties.

KQML is designed to support a wide variety of interesting agent architectures. To achieve this, KQML agents provide meta-data in the form of performatives, which describe the agents’ information requirements and capabilities. A special class of agents called communication facilitators is also defined to perform various useful communication services, e.g. maintaining a registry of services, forwarding messages to named services, content-based message routing, grouping of information providers and clients, and providing mediation and translation services [75]. In short, facilitator agents assist other agents in locating appropriate clients and servers for information exchange.
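As a rough illustration of the facilitator role described above, the following Python sketch shows a registry in which agents advertise the topics they can serve and other agents ask for suitable providers. The class and method names (`Facilitator`, `advertise`, `recommend`) and the agent names are hypothetical; they are loosely inspired by the idea of advertising and recommending in KQML and are not taken from any KQML implementation.

```python
from collections import defaultdict


class Facilitator:
    """Toy facilitator: maps advertised topics to the agents that serve them."""

    def __init__(self):
        # topic -> list of agent names, in order of registration
        self._providers = defaultdict(list)

    def advertise(self, agent_name, topic):
        """Record that agent_name can answer queries about topic."""
        if agent_name not in self._providers[topic]:
            self._providers[topic].append(agent_name)

    def recommend(self, topic):
        """Return the names of all agents that advertised topic."""
        return list(self._providers.get(topic, []))


# Hypothetical agents and topics, for illustration only.
facilitator = Facilitator()
facilitator.advertise("stock-server", "stock-prices")
facilitator.advertise("news-agent", "headlines")
facilitator.advertise("backup-server", "stock-prices")

providers = facilitator.recommend("stock-prices")
unknown = facilitator.recommend("weather")
print(providers, unknown)
```

A client agent would then contact one of the returned providers directly; a real facilitator could also forward or broker the request on the client's behalf.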

One of the most powerful features of KQML is that it allows any information server to be treated like a knowledge-based system. This allows for the incorporation of database management systems (DBMS), hypertext systems and server-oriented software (e.g. finger daemons, mail servers, web servers, etc.) [76].

Agent systems based on KQML can be seen as having two virtual knowledge bases: the first representing the agent’s information store (i.e. beliefs) and the second representing the agent’s intentions (i.e. goals).

Software agents using KQML transmit their message content (represented in any language of the agent’s choice) wrapped inside a KQML message [75]. The set of performatives mentioned earlier forms the core of the KQML language. A KQML message conceptually consists of a performative, its associated arguments (which include the real content of the message), and a set of optional arguments that provide meta-data about the message (e.g. properties of the content, information about the sender and receiver, etc.).
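To make the message structure concrete, the sketch below renders a KQML-style message as a parenthesised list: a performative, the content, and optional keyword arguments carrying meta-data. The helper `kqml_message` is a hypothetical name, and the agent names, content expression and ontology are illustrative values in the style commonly used in the KQML literature, not data from this work.

```python
def kqml_message(performative, content, **parameters):
    """Render a KQML-style message as an s-expression string.

    performative: name of the performative, e.g. "ask-one"
    content:      the message content, already written in the content language
    parameters:   optional keyword arguments such as sender or receiver;
                  underscores become hyphens (reply_with -> :reply-with)
    """
    parts = [performative, f":content {content}"]
    for name, value in parameters.items():
        parts.append(f":{name.replace('_', '-')} {value}")
    return "(" + " ".join(parts) + ")"


# Illustrative query: agent "joe" asks "stock-server" for a share price.
message = kqml_message(
    "ask-one",
    "(PRICE IBM ?price)",
    sender="joe",
    receiver="stock-server",
    reply_with="ibm-stock",
    language="LPROLOG",
    ontology="NYSE-TICKS",
)
print(message)
```

Note that the content expression is opaque to the wrapper: the `:language` and `:ontology` arguments tell the receiver how to interpret it, which is what allows the content to be written in any language of the agent's choice.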

There is a predefined set of reserved performatives but it is not required of an agent to implement all of them [75]. It is however required that if a reserved performative is implemented, it is implemented in a standard way. Finin, Fritzson, McKay and McEntire give a discussion of an agent architecture based on KQML [75].

The foundation for intelligent physical agents agent communication language (FIPA-ACL)

The Foundation for Intelligent Physical Agents (FIPA) is a standards organization in the area of software agents. FIPA has a number of technical committees, each of which is responsible for the production, maintenance and updating of specifications. Three of these technical committees make up the backbone of the FIPA specifications [74]:

• The FIPA-ACL committee is charged with the production of a specification for an ACL.

• The Agent management committee is concerned with agent services such as facilitation, registration and agent platforms.

• The Agent/Software interaction committee covers the interaction of agents with legacy applications.

The activities of the FIPA-ACL committee are most relevant in the context of the discussion rendered here, and a brief discussion of FIPA’s agent communication language follows.

The FIPA-ACL, like KQML, is based on the idea that messages are actions or communicative acts that are intended to perform some action when sent. The FIPA-ACL specification consists of a set of message types as well as a description of their pragmatics. The FIPA-ACL is superficially similar to KQML, as its syntax is almost identical to KQML’s with the exception of different names for certain reserved primitives⁵ [74].

⁵ Communication primitives are called performatives in KQML.

FIPA’s ACL also inherits the idea of separating messaging language (i.e. the outer language) from the content description language (i.e. the inner language). The outer language describes the intended meaning of the message and the inner language denotes the expression to which the agent’s beliefs, desires, and intentions (as described by the meaning of the communication primitive) apply [74].

In the FIPA-ACL, communication primitives are called communicative acts (CAs). These CAs are the same kind of entity as KQML performatives and can be treated as equivalent. The FIPA-ACL, like KQML, makes no commitment to any particular content language. This holds in most cases; however, receiving agents need some understanding of the semantic language (SL) in order to process certain FIPA-ACL primitives.

The SL is a formal language used to define FIPA-ACL semantics. The SL is a representation language based on logic and can represent propositions, objects and actions. Each CA in the FIPA-ACL is defined as a set of SL formulae. These formulae describe an action’s feasibility preconditions (FP) and its rational effect (RE) [74].

The feasibility preconditions for a given CA describe the necessary conditions that must be met by the sender. In other words, for an agent to properly perform the CA by sending a particular message, the FP must evaluate to true for the sender. The agent is not obliged to perform the CA if the FP is true, but it can if it so chooses. A CA’s RE represents the effect that an agent can expect to occur as a result of performing the action specified by the CA, and can also specify conditions that should hold true of the recipient. The receiving agent is not obliged by the standard to ensure that the expected effect is achieved. With this in mind, an agent can use its knowledge about the RE to plan which CA to invoke, but it cannot ensure that the RE will necessarily follow.
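The relationship between feasibility preconditions and rational effect can be sketched in Python as follows. This is a deliberately simplified model of an inform-style communicative act, not the formal SL definition: beliefs are plain sets of strings, the precondition is reduced to "the sender believes the fact and does not think the receiver already believes it", and the `receiver_accepts` flag is a hypothetical stand-in for the receiver's freedom to ignore the message.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    beliefs: set = field(default_factory=set)
    # What this agent thinks other agents already believe: name -> set of facts
    beliefs_about_others: dict = field(default_factory=dict)


def inform_is_feasible(sender, receiver, fact):
    """Simplified feasibility precondition (FP) for an inform-style act.

    The sender must believe the fact and must not already believe that the
    receiver believes it.
    """
    known_to_receiver = sender.beliefs_about_others.get(receiver.name, set())
    return fact in sender.beliefs and fact not in known_to_receiver


def inform(sender, receiver, fact, receiver_accepts=True):
    """Attempt an inform act; return True if the rational effect came about."""
    if not inform_is_feasible(sender, receiver, fact):
        raise ValueError("feasibility precondition not met")
    # The rational effect (RE) is that the receiver believes the fact. The
    # sender can plan on it, but the receiver is free not to adopt the belief.
    if receiver_accepts:
        receiver.beliefs.add(fact)
    return fact in receiver.beliefs


alice = Agent("alice", beliefs={"raining"})
bob = Agent("bob")
sceptic = Agent("sceptic")

accepted = inform(alice, bob, "raining")
ignored = inform(alice, sceptic, "raining", receiver_accepts=False)
feasible_for_unknown_fact = inform_is_feasible(alice, bob, "sunny")
print(accepted, ignored, feasible_for_unknown_fact)
```

The second call shows the point made above: the precondition holds and the act is performed, yet the expected effect does not follow because the receiver chose not to adopt the belief.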

From the above, it can be said that KQML and the FIPA-ACL are almost identical with regard to the basic concepts and principles they observe. The languages also share the same syntax: a KQML message and a FIPA-ACL message look syntactically identical (save for different names for the communication primitives). The major differences between them are semantic, which makes a complete and accurate translation between the two languages generally impossible [74].

The simple object access protocol (SOAP)

SOAP is a lightweight protocol for the exchange of information in a decentralized, distributed environment [77]. In short, SOAP is a protocol for packaging information in XML and sending it, typically via HTTP, across the Internet.

SOAP consists of three parts [77]:

• The SOAP envelope construct defines an overall framework for expressing the content of a message, who should deal with it, and whether it is optional or mandatory.

• The SOAP encoding rules define a serialization mechanism that can be used to exchange instances of application-defined data-types.

• The SOAP RPC representation defines a convention that can be used to represent remote procedure calls and responses.

SOAP messages are fundamentally one-way transmissions from a sender to a receiver. It is also possible to combine messages to produce a request/response behaviour. SOAP messages are then routed along a so-called “message path”. This allows for processing of the message at multiple intermediate nodes (if required) in addition to the destination node.

Although SOAP is not specifically designed as an ACL, it would be possible for agents to use it to communicate with each other. Agents could express their needs in an XML format, and these XML-encoded needs could then be exchanged using SOAP to facilitate the sharing of information and knowledge among agents in a domain.

There have been suggestions that ACL messages should be encoded in XML in their entirety [78]; that is, both the messages and their content should be in XML. Using XML as an encoding scheme also makes the ACL much more WWW-friendly [78]. Because SOAP packages information in XML, it would be conceptually possible to place an XML-based ACL message inside a SOAP message and then use SOAP to do the transportation work.
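The idea of carrying an XML-encoded ACL message inside a SOAP message can be sketched with Python's standard `xml.etree.ElementTree` module. The envelope namespace is the SOAP 1.1 one; everything about the payload (the `urn:example:acl` namespace, the element names `message`, `sender`, `receiver`, `content`, and the sample values) is a hypothetical encoding invented for illustration, since no particular XML schema for ACL messages is prescribed here. Sending the result over HTTP is left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ACL_NS = "urn:example:acl"  # hypothetical namespace for the ACL payload

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("acl", ACL_NS)


def wrap_acl_in_soap(performative, sender, receiver, content):
    """Build a SOAP envelope whose body holds one XML-encoded ACL message."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    message = ET.SubElement(
        body, f"{{{ACL_NS}}}message", {"performative": performative}
    )
    ET.SubElement(message, f"{{{ACL_NS}}}sender").text = sender
    ET.SubElement(message, f"{{{ACL_NS}}}receiver").text = receiver
    ET.SubElement(message, f"{{{ACL_NS}}}content").text = content
    return ET.tostring(envelope, encoding="unicode")


xml_text = wrap_acl_in_soap("inform", "stock-server", "joe", "(price ibm 42)")
print(xml_text)

# The receiving agent parses the envelope and recovers the ACL message.
root = ET.fromstring(xml_text)
received = root.find(f"{{{SOAP_ENV}}}Body/{{{ACL_NS}}}message")
print(received.get("performative"), received.find(f"{{{ACL_NS}}}content").text)
```

In this arrangement SOAP only does the transportation work: the performative and content keep whatever meaning the ACL assigns to them.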

The common object request broker architecture (CORBA)

CORBA is a standards specification for the handling of communications between distributed objects [79]. The CORBA standards are defined by the Object Management Group (OMG) and aim to provide an open, vendor-independent architecture and infrastructure. The CORBA object model considers applications to be made up of objects that encapsulate some set of attributes and services. CORBA objects also have an additional interface that defines which of these attributes and operations are available to other applications. This interface is defined in a standard, language-independent interface definition language (IDL).

The model also defines an object request broker (ORB) that knows about the objects requesting services and about their interfaces. The power of CORBA lies in the separation of interface from implementation. CORBA allows objects to communicate through a strictly defined interface (specified in IDL) regardless of each object’s underlying implementation. For more detail on CORBA, interested readers can refer to work done by Sommerville and the OMG [79, 80].

Significance of CORBA for ACLs

The significance of CORBA for multi-agent systems is that it can be used as a transportation mechanism for KQML or the FIPA-ACL, thereby enabling the agents to communicate with each other.

CORBA could provide a distributed uniform space wherein agents could interact [76]. The location of other agents in the environment could be handled by requests to the ORB. This would enable agents to co-operate with other agents connected to their local brokers regardless of their location in the distributed environment. This is especially relevant for multi-agent systems deployed on the Internet [76].
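The role a broker could play as a transport for ACL messages can be illustrated with a small in-process Python sketch. It is a toy stand-in, not the CORBA API: a real deployment would use IDL-defined interfaces and an actual ORB, and the names `ToyBroker`, `EchoAgent`, `register` and `send` are hypothetical. The point is only that the sender addresses an agent by name and the broker resolves where that agent is.

```python
class ToyBroker:
    """Stand-in for an object request broker: resolves names to agent objects."""

    def __init__(self):
        self._registry = {}

    def register(self, name, agent):
        """Make an agent reachable under the given name."""
        self._registry[name] = agent

    def send(self, receiver_name, message):
        """Deliver message to the named agent, wherever it is registered."""
        agent = self._registry.get(receiver_name)
        if agent is None:
            raise LookupError(f"no agent registered as {receiver_name!r}")
        return agent.receive(message)


class EchoAgent:
    """Minimal agent exposing the single operation the broker relies on."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, message):
        self.inbox.append(message)
        return f"(tell :sender {self.name} :content received)"


broker = ToyBroker()
server = EchoAgent("stock-server")
broker.register("stock-server", server)

reply = broker.send(
    "stock-server", "(ask-one :sender joe :content (PRICE IBM ?price))"
)
print(reply)
```

The sender never holds a reference to `server` itself; it only knows the name and the `receive` operation, which mirrors the separation of interface from implementation described above.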

7.1.4 Web content mining

The web mining process was discussed in a previous chapter and can be briefly summarized as having four steps: information retrieval, information extraction, generalization and analysis. Applying these steps could be of great benefit to personalized information agents on the web.
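The four steps can be sketched as a small Python pipeline. This is a minimal illustration under strong simplifying assumptions: retrieval is a lookup in an in-memory dictionary rather than a network fetch, extraction merely strips markup, generalization is reduced to counting terms, and analysis reports the most frequent terms. The function names and the sample page are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def retrieve(pages, url):
    """Step 1, information retrieval: here a lookup in an in-memory dict."""
    return pages[url]


def extract(html):
    """Step 2, information extraction: strip markup and split into words."""
    parser = _TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    return re.findall(r"[a-z]+", text.lower())


def generalise(tokens):
    """Step 3, generalization: summarise the page as term frequencies."""
    return Counter(tokens)


def analyse(frequencies, top_n=1):
    """Step 4, analysis: report the most frequent terms."""
    return [term for term, _ in frequencies.most_common(top_n)]


# Hypothetical page standing in for a retrieved web document.
pages = {
    "http://example.org/agents": (
        "<html><body><h1>Agents</h1>"
        "<p>Mining agents mine web pages. Agents learn.</p></body></html>"
    )
}

tokens = extract(retrieve(pages, "http://example.org/agents"))
frequencies = generalise(tokens)
top_terms = analyse(frequencies)
print(top_terms, frequencies["agents"])
```

A personalized agent would replace the last two steps with something informed by its user's profile, but the division into stages stays the same.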

It is generally too expensive for web search engines to do an extensive analysis of web pages. Search engines therefore only consider the part of a site that is relevant to their indexing task. For a personalized content mining agent, this is obviously not ideal. By using web mining techniques, the agent could discover more detailed information about the website and potentially cover more of the website than a search engine that has to handle millions of daily queries [43, 45].

Another benefit is that the content of the website could, in the case of a personalized agent, be mined for context-specific information related to the profile of the agent’s user. This enables the agent to mine a website specifically for information that is relevant to that user’s context.

An information agent could apply web content mining techniques to perform post-retrieval analysis on web pages to improve the quality of returned results and provide its user with an improved searching experience.
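One possible form of such post-retrieval analysis is to re-rank the returned pages by how closely their text matches the user's profile. The Python sketch below uses cosine similarity over raw term counts; this particular measure is an assumption chosen for illustration, not a technique prescribed above, and the URLs, page texts and profile are made up.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a bag of lower-cased, whitespace-separated terms."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(results, profile_text):
    """Sort (url, page_text) pairs by similarity to the user's profile, best first."""
    profile = term_vector(profile_text)
    scored = [
        (cosine_similarity(term_vector(text), profile), url)
        for url, text in results
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


# Hypothetical search results, in the order a search engine returned them.
results = [
    ("http://example.org/cooking", "recipes for pasta and soup"),
    ("http://example.org/agents", "software agents mining web data"),
    ("http://example.org/mixed", "web recipes and software"),
]
profile_text = "software agents web mining"

ranked = rerank(results, profile_text)
print(ranked)
```

Here the page about software agents moves to the top because it shares the most terms with the profile, while the page with no overlap drops to the bottom.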