Context Sensitive Virtual Sales Agents


K. Kamyab, F. Guerin, P. Goulev, E. Mamdani
Intelligent and Interactive Systems Group, Imperial College, London

k.kamyab, f.guerin, p.goulev, [email protected]

Abstract

Current leading portals still do not provide the user with realistic three-dimensional multi-modal interaction. The next generation of portals will not only remedy this but will also provide an integrated environment with marketplaces, chat rooms and believable synthetic characters. We are concerned with the design of embodied sales assistants to be immersed in MPEG-4 multi-user worlds. Sales assistants that interact with customers and their virtual shop in a believable manner bring many challenges to the designer. In particular, the assistant must have detailed and timely knowledge of its contextual setting. We propose a context sensitive architecture for sales assistants in the SoNG Project, supporting integrated context sensitivity over four distinct domains: the social or communicative domain, the user domain, the product domain and the virtual environment. This architecture has the potential to provide situated, believable sales assistants for 3D environments.


Keywords: Agents, Believability, Context, SoNG, MPEG-4

1 Introduction

We seek to create a three-dimensional portal to the web, which presents the user with a virtual world populated by embodied agents. Embodied agents play the roles of sales assistants in the shops of the virtual world. These agents need to appear believable: at a low level this can be achieved by using human-like gestures, while at a higher level the agents are required to have detailed knowledge of the world in which they are situated. In particular, the agents need to know the products available for sale and the expectations and preferences of the human shoppers. Knowledge of the available products is easily obtained from product databases, but gathering and representing knowledge about human users presents a challenge. Additionally, when human users are presented with a shopping mall scenario familiar to them from the real world, they will naturally have certain expectations about the agents playing the role of shopkeeper. Therefore agents must be aware of their social role in the virtual world and of the behaviour that is appropriate to that role.

Our aim is to employ user profiling techniques, including fuzzy modelling of user preferences, to give the agents knowledge of what individual human users mean by linguistic expressions such as “cheap” and “light”. Additionally, we describe social context by means of an explicit social state, which records social roles and the associated permissions and obligations for actions. In this way the agents are made aware of the social context in which they are situated and can behave in accordance with the expectations of a human user.

In Section 2 we look at the requirements for our virtual sales agents, chief among which is context sensitivity. Section 3 looks in detail at the techniques that can be employed to achieve context sensitive agents, including awareness of social context and user modelling. Section 4 presents the architecture used to implement a context sensitive embodied agent, and finally we draw some conclusions in Section 5.


2 Requirements for Virtual Sales Agents

We are concerned with using embodied agents as sales assistants in a virtual marketplace. Several demonstration applications will be developed within the virtual marketplace of SoNG, including a theatre booking system, a phone shop and a clothes shop. Here we focus on the online phone shop. The sales agent must be able to converse with the customer in natural language, respond to queries for phones satisfying certain criteria and display phones to the customer.

This application requires not only that the agents have the relevant product information, but also that they be able to engage the customer in an interesting conversation. In order to sustain the user’s interest the agent should be:

1. Believable. Believable agents are agents that create the illusion of thinking, making decisions and acting autonomously. The agent must appear believable to the customer by using human-like behaviours, portraying an underlying emotion mechanism and displaying verbal communication skills. At a lower level it will be necessary to synchronise gestures and verbal communication [1].

2. Proactive. The agent must proactively introduce new topics of conversation, for example by describing new products. Through the appropriate use of gestures the agent will appear excited and enthusiastic about products.

3. In character. A sales agent in a phone shop must have behaviour and expertise consistent with the expectations of a human shopper. Although the agent must have a broad (and mostly shallow) [2] set of capabilities in order to satisfy a wide range of user demands, it must also have a core expertise. Given that the agent’s main function is that of a shopkeeper, the agent should aim to satisfy the customer while maximizing sales. If the agent “knows” the user, it should tailor its service to the user’s preferences.

4. Context sensitive. The agent should be aware of its surroundings and be able to communicate its understanding through verbal and non-verbal communication. Such an understanding of context significantly affects the agent’s ability to fulfil the previous three requirements. We identify four domains in which the agent must have contextual understanding: the social or communicative domain, the user domain, the product domain and the virtual environment.

3 Context Sensitive Agents

The following subsections deal with our agent’s mechanisms for understanding context.

3.1 Interpreting Communicative Acts

Through an online interface users may type text in natural language and may select gestures for their avatar to execute in the virtual world. The sales agent must interpret these verbal and non-verbal communicative acts so that it may update its model of the world and respond appropriately. The interpretation of an act does not depend on the act alone: communication can be highly context sensitive, i.e. certain external factors affect the meaning of a speech act. These factors include the domain in which the conversation takes place, the status or authority of the participants and the relationship an act has to the remainder of the discourse. We handle context in two ways:

1. Firstly, we have an explicit representation of the context in the form of a conversation state, which holds information known to all participants in the conversation. The conversation state is a set of propositions (e.g. representing expressed mental attitudes) and variables important to the conversation (e.g. roles occupied by participants). In our communication model we define the meaning of a communicative act as a function from context onto context [3], i.e. the conversation state contributes to the meaning of acts.

2. Secondly, we make use of protocols to specify the roles of participants and to encode information appropriate to the current domain. There are social roles, which are permanently active, and roles within conversation protocols, which last only for the duration of the conversation. The roles agents play in a protocol (or in the society) determine the social obligations of the agent in the conversation and also affect the meanings of acts.
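To make the idea of meaning as “a function from context onto context” concrete, the following minimal Python sketch treats a conversation state as an immutable value and the public meaning of a communicative act as a function returning the next state. It is an illustration under stated assumptions, not the SoNG implementation (which uses JESS rules, see Section 4): the proposition syntax, the `request` act and the role names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class ConversationState:
    """Context shared by all participants (a hypothetical encoding)."""
    # Publicly known propositions, e.g. expressed attitudes and obligations.
    propositions: FrozenSet[str] = frozenset()
    # Role variables as (role, participant) pairs, e.g. ("customer", "frank").
    roles: Tuple[Tuple[str, str], ...] = ()

    def occupant(self, role: str) -> str:
        return dict(self.roles)[role]

    def with_proposition(self, prop: str) -> "ConversationState":
        return ConversationState(self.propositions | {prop}, self.roles)


# The meaning of a communicative act maps one context onto the next.
Meaning = Callable[[ConversationState], ConversationState]


def request(speaker: str, addressee_role: str, action: str) -> Meaning:
    """Hypothetical public meaning of a request addressed to a role.

    The resulting obligation depends on the context: it falls on whichever
    participant currently occupies addressee_role in the conversation state.
    """
    def update(state: ConversationState) -> ConversationState:
        addressee = state.occupant(addressee_role)
        return (state
                .with_proposition(f"requested({speaker}, {action})")
                .with_proposition(f"obliged({addressee}, {action})"))
    return update


before = ConversationState(roles=(("shopkeeper", "sales_agent"), ("customer", "frank")))
after = request("frank", "shopkeeper", "show(phone_42)")(before)
print(sorted(after.propositions))
# ['obliged(sales_agent, show(phone_42))', 'requested(frank, show(phone_42))']
```

Because the obligation is resolved through the role variable, the same request would oblige a different participant in a conversation where someone else occupies the “shopkeeper” role, which is the sense in which context contributes to meaning.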

Fig. 1 Shopkeeper-Customer Protocol

We define semantics for events in the virtual world firstly from a public perspective [4] (the meaning according to the conventions of the society) and secondly from a private perspective (the inferences that an agent itself draws). The conventional (public) meanings of acts are predefined and common to all agents in the world, and are used to build the model of the conversation state. Thus there is a standard meaning for each communicative act, and all participants can update their copy of the social state in the same way. Private inferences are agent specific and update the agent’s own model of the world.
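The split between shared public meanings and agent-specific private inferences can be sketched as below. This is a hypothetical Python illustration: the event dictionary, the single `ask` meaning and the `infer_interest` rule are assumptions made for the example.

```python
from typing import Callable, Dict, List, Set

Event = Dict[str, str]
Rule = Callable[[Event], Set[str]]

# Public (conventional) meanings: predefined and identical for every agent.
# Hypothetical encoding: each maps an event to the propositions it adds.
PUBLIC_MEANINGS: Dict[str, Rule] = {
    "ask": lambda e: {f"desires_to_know({e['sender']}, {e['topic']})"},
}


class Agent:
    def __init__(self, name: str, private_rules: List[Rule]) -> None:
        self.name = name
        self.social_state: Set[str] = set()  # this agent's copy of the public state
        self.beliefs: Set[str] = set()       # private model of the world
        self.private_rules = private_rules

    def observe(self, event: Event) -> None:
        # Every agent applies the same public meaning, so the copies stay in step.
        self.social_state |= PUBLIC_MEANINGS[event["type"]](event)
        # Private inferences differ from agent to agent.
        for rule in self.private_rules:
            self.beliefs |= rule(event)


def infer_interest(event: Event) -> Set[str]:
    """Hypothetical private rule: asking about a topic suggests interest in it."""
    return {f"interested_in({event['sender']}, {event['topic']})"}


shopkeeper = Agent("shopkeeper", [infer_interest])
bystander = Agent("bystander", [])
event = {"type": "ask", "sender": "frank", "topic": "wap_phones"}
for agent in (shopkeeper, bystander):
    agent.observe(event)
```

After the event both agents hold identical social states, while only the shopkeeper has added a private belief about Frank.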

We now look at a simple example of the interpretation of communicative acts to illustrate how public meanings and protocols update the conversation state. In our phone shop scenario, the sales agent permanently occupies the social role of “shopkeeper”. Figure 1 shows a UML-style statechart diagram describing a protocol for an interaction between a sales agent (playing the role of Shopkeeper) and a human user (Customer). A user’s avatar entering the shop is an event that must be interpreted. The public meaning of this event specifies that a new conversation state is created, and asserts two role variables and one proposition within that new state. The role variables assign the agent occupying the social role “shopkeeper” to the conversational role “shopkeeper”, and the user’s avatar to the role “customer”. The new proposition asserts a social obligation for the shopkeeper to greet the customer.

Subsequently the customer asks about the availability of WAP-enabled phones. The public meaning of this query adds two new propositions to the conversation state: the first states that the customer has expressed a desire to know about the availability of WAP-enabled phones, and the second asserts a social obligation for the shopkeeper to tell the customer about WAP-enabled phones. In addition to these public meanings, the sales agent draws private inferences about the world it inhabits, which may concern the user or the 3D world.
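The two steps of this walkthrough can be written out as explicit state updates. The sketch below is a hypothetical Python rendering of the public meanings just described; the function names and proposition strings are illustrative assumptions, and the protocol transitions of Fig. 1 are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ConversationState:
    roles: Dict[str, str] = field(default_factory=dict)   # conversational role -> participant
    propositions: Set[str] = field(default_factory=set)   # publicly asserted propositions


def on_enter_shop(social_roles: Dict[str, str], avatar: str) -> ConversationState:
    """Public meaning of an avatar entering the shop: open a new conversation state."""
    shopkeeper = social_roles["shopkeeper"]  # agent that permanently holds the social role
    state = ConversationState(roles={"shopkeeper": shopkeeper, "customer": avatar})
    state.propositions.add(f"obliged({shopkeeper}, greet({avatar}))")
    return state


def on_query(state: ConversationState, topic: str) -> None:
    """Public meaning of the customer's query: one desire and one obligation."""
    customer, shopkeeper = state.roles["customer"], state.roles["shopkeeper"]
    state.propositions.add(f"desires({customer}, know_about({topic}))")
    state.propositions.add(f"obliged({shopkeeper}, tell({customer}, {topic}))")


state = on_enter_shop({"shopkeeper": "sales_agent"}, "frank_avatar")
on_query(state, "wap_phones")
for prop in sorted(state.propositions):
    print(prop)
```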

3.2 Building a User Model

A user-modelling component is an essential part of an e-commerce system such as the SoNG application. Good sales assistants are able to make accurate assumptions about the user and use the information gathered to tailor their service. Furthermore, regular customers receive an even more personalised service, not only because their tastes are known, but because the sales assistant knows how best to interact with them. For instance, a good restaurant waiter knows which is your favourite table and what your favourite dish is. However, in order not to embarrass the customer, the waiter should also pick up on subtle cues. For example, if the customer does not have much buying power, the waiter should not suggest the most expensive dish. If the customer is of Jewish or Muslim background, they may not wish to eat pork. Moreover, if the customer is visibly distressed, they might appreciate a strong drink. In order to provide this quality of service, information must be gathered regarding the customer’s tastes, buying power, social status, cultural or religious background, personality and current emotional state.

Although the waiter scenario is not the one proposed in the SoNG project, it does provide some insight into good service provision. A balance must be struck between identifying and catering for user requirements and fulfilling the agent’s goal of maximizing sales.

The user model serves three purposes: firstly to support the dialogue manager by modelling users’ beliefs and desires, secondly to model users’ preferences and finally to model users’ affective state. Appropriate use of this information can greatly improve the user’s perception of the agent’s intelligence and their trust in it [5]. During an interaction, the agent will draw inferences about the customer’s preferences and build up a user model for each customer, which it can use to tailor its conversation to the customer’s interests.

User beliefs and desires: Modelling a user’s knowledge or beliefs and desires is a requirement for providing the user with an efficient conversational partner. These assumptions help the system to converse more naturally, supply additional relevant information, avoid redundancy in answers and explanations, and detect the user’s wrong beliefs about the domain [6].

User preferences: In order to tailor the service to the user, it is necessary to build a model of the user’s preferences within the specified domains, in particular the user’s tastes and requirements with regard to the products on sale and the user’s preferences for interacting with the system. A prerequisite for making effective product suggestions is in-depth knowledge of the product, as described in Section 3.3. As the agent’s interaction with the user will be mainly through natural language, we chose a linguistic representation of a user’s preferences with respect to specific product attributes. An ideal candidate for such a representation is a fuzzy variable with four terms: like, interest, indifference and dislike. Updates to a user’s preference profile assign a fuzzy value from this fuzzy variable to specific product attributes. When suggesting a product to the user, the agent will attempt to include as many attributes that fall under the “like” fuzzy set as possible.
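A four-term fuzzy variable of this kind might be sketched as follows. We assume, for illustration only, that each attribute preference is stored as a degree in [0, 1] and that the terms are evenly spaced triangular fuzzy sets; the breakpoints, the default of “indifference” for unknown attributes and the attribute names are hypothetical, and the code is plain Python rather than the FuzzyJ toolkit used in the system.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical triangular fuzzy sets over a preference degree in [0, 1].
TERMS: Dict[str, Tuple[float, float, float]] = {
    "dislike": (0.0, 0.0, 1 / 3),
    "indifference": (0.0, 1 / 3, 2 / 3),
    "interest": (1 / 3, 2 / 3, 1.0),
    "like": (2 / 3, 1.0, 1.0),
}


def triangle(x: float, a: float, b: float, c: float) -> float:
    """Membership rising from a to a peak at b, falling to c (a == b or b == c gives a shoulder)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def memberships(degree: float) -> Dict[str, float]:
    return {term: triangle(degree, *abc) for term, abc in TERMS.items()}


def linguistic_value(degree: float) -> str:
    """The term recorded in the profile: the one with the highest membership."""
    m = memberships(degree)
    return max(m, key=m.get)


def like_score(attributes: Iterable[str], profile: Dict[str, float]) -> float:
    """How strongly a product's attributes fall under 'like' for this user."""
    # Attributes with no recorded preference default to indifference (degree 1/3).
    return sum(memberships(profile.get(a, 1 / 3))["like"] for a in attributes)


profile = {"wap": 0.9, "colour_screen": 1.0, "heavy": 0.1}
products = {"A": {"wap", "colour_screen"}, "B": {"wap", "heavy"}}
best = max(products, key=lambda p: like_score(products[p], profile))
```

Summing “like” memberships rather than counting crisp matches lets a strongly liked attribute outweigh a merely liked one when the agent ranks candidate products.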

In addition to maintaining user preferences with respect to product attributes, the agent must maintain a model of a user’s buying power. Again, a fuzzy variable is used to map linguistic terms such as “cheap” and “expensive” to actual product prices. Each term is associated with a fuzzy set, the size and shape of which may vary from user to user. For example, some people may consider a product costing £200 to be cheap, whereas others may find it expensive.
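The user-dependent mapping from price terms to prices can be illustrated with trapezoidal fuzzy sets. The sketch below is hypothetical: the two user profiles, their breakpoints (in pounds) and the choice of trapezoids are illustrative values picked to reproduce the £200 example.

```python
from typing import Dict, Tuple

INF = float("inf")
Trapezoid = Tuple[float, float, float, float]


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership 1 on [b, c], linear ramps on (a, b) and (c, d), 0 elsewhere."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical per-user price terms in pounds: the shapes differ between users.
BUYING_POWER: Dict[str, Dict[str, Trapezoid]] = {
    "student": {"cheap": (0, 0, 50, 120), "expensive": (80, 150, INF, INF)},
    "executive": {"cheap": (0, 0, 200, 300), "expensive": (250, 400, INF, INF)},
}


def price_term(user: str, price: float) -> str:
    """Linguistic term that best describes a price for this particular user."""
    terms = BUYING_POWER[user]
    return max(terms, key=lambda t: trapezoid(price, *terms[t]))


print(price_term("student", 200), price_term("executive", 200))
```

The same fuzzy sets work in the other direction: when a user asks for something “cheap”, the agent can filter products by their membership in that user’s own “cheap” set.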

User interaction patterns: Another factor we believe needs to be modelled is the user’s interaction pattern: in particular, when the user last visited the shop, whether that was recently or some time ago, and which product, if any, the user bought on that visit. This sort of information will allow the agent to greet the customer with utterances such as: “Hi Frank, I haven’t seen you for a long time. How are you finding the Motorola WAP phone you bought last month?” The user’s response allows the agent to gather feedback about its model of the user’s preferences.
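A greeting of this kind could be generated from a stored interaction history roughly as follows. This is a hypothetical Python sketch: the `Visit` record, the 30-day threshold for “a long time” and the fallback phrases are assumptions, not part of the described system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

LONG_ABSENCE_DAYS = 30  # hypothetical threshold for "a long time"


@dataclass
class Visit:
    day: date
    purchase: Optional[str] = None  # product bought on that visit, if any


def greeting(name: str, last_visit: Optional[Visit], today: date) -> str:
    """Compose a greeting from the user's interaction pattern."""
    if last_visit is None:
        return f"Hello {name}, welcome to the shop."
    parts = [f"Hi {name},"]
    if (today - last_visit.day).days > LONG_ABSENCE_DAYS:
        parts.append("I haven't seen you for a long time.")
    else:
        parts.append("good to see you again.")
    if last_visit.purchase:
        # Asking about the last purchase invites feedback on the preference model.
        parts.append(f"How are you finding the {last_visit.purchase} you bought?")
    return " ".join(parts)


print(greeting("Frank", Visit(date(2001, 1, 10), "Motorola WAP phone"), date(2001, 3, 1)))
```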

User’s affective state: A user’s interaction with the agent will have a significant influence on the user’s affective state. For example, if the product the user is looking for is not available this may cause disappointment and reproach towards the agent [7]. The agent must be aware of such reactions and attempt to remedy them by, for example, suggesting something else the user might like.

In addition, the agent’s own personality and behaviour can influence the user’s perception of it and of the service it provides. Work carried out on the FACTS project [5] indicates that if users like the character they are interacting with, they will consider it to be more intelligent and competent, and as a result will trust it more. It is thus important to model which personalities, and resulting behaviours, the user prefers to interact with. The Media Equation [8] indicates that people prefer to interact with personalities similar to their own, so modelling the user’s personality traits gives a good indication of the kind of personality the agent should exhibit.
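One simple way to act on the Media Equation finding is to choose, from a set of personalities the agent can portray, the one closest to the user’s modelled traits. In the sketch below the trait dimensions, the personality library and the use of Euclidean distance as the similarity measure are all illustrative assumptions.

```python
import math
from typing import Dict

Traits = Dict[str, float]  # trait name -> strength in [0, 1]

# Hypothetical library of personalities the agent can portray.
AGENT_PERSONALITIES: Dict[str, Traits] = {
    "exuberant": {"extraversion": 0.9, "agreeableness": 0.7},
    "reserved": {"extraversion": 0.2, "agreeableness": 0.6},
    "brisk": {"extraversion": 0.5, "agreeableness": 0.3},
}


def closest_personality(user: Traits) -> str:
    """Pick the agent personality most similar to the user's (smallest Euclidean distance)."""
    def distance(agent: Traits) -> float:
        keys = sorted(agent)
        return math.dist([user[k] for k in keys], [agent[k] for k in keys])
    return min(AGENT_PERSONALITIES, key=lambda name: distance(AGENT_PERSONALITIES[name]))


print(closest_personality({"extraversion": 0.3, "agreeableness": 0.6}))
```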


Acquiring data about a user is not a trivial task. Much data can be acquired from a user’s conversation and by observing the user’s behaviour [9]. By contrast, asking users directly for information about themselves can have extremely detrimental effects. Firstly, it may appear to be an invasion of the user’s privacy, and users may be wary of how and by whom the information will be used. Secondly, users are actually poor at describing their own preferences: often, what they say they like and what they really like are not the same thing [10]. Information must therefore be gathered through user interaction (what the user asks for, what the user needs) or from the user’s own explicit statements, for example that he or she likes a specific product attribute.

3.3 Representing the Product

The depth of the description of a product can help provide a more personalised service. However, a detailed machine-understandable description of products is not sufficient on its own; user requirements must also be taken into account. This is an example of how different contextual domains are interrelated, something that must be considered at the design stage of the system.

If we consider the example of mobile telephones we may find that user requirements include the need to know the make, model, technical capabilities and price of the phone. However, users could also want to know the colours that are available, how stylish the product is, if it fits easily in their pocket, what extra features it has and how popular it is. Such considerations are essential for effective service provision and can give the agent a façade of competence.

In our architecture, product information is stored in a product database wrapped by a disembodied database agent. The sales assistant is able to query the database agent for relevant products using ACL messages.
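The exchange between the sales assistant and the database agent might look as follows. The sketch assumes a FIPA-style `query-ref`/`inform` exchange and uses a plain Python class for the message rather than JADE’s API; the `Phone` fields (drawn from the user requirements listed above), the constraint dictionary used as message content, the popularity ordering and the catalogue entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Phone:
    make: str
    model: str
    price: float                  # pounds
    features: FrozenSet[str]      # capabilities and extras, e.g. {"wap"}
    colours: Tuple[str, ...] = ()
    popularity: float = 0.0       # e.g. share of recent sales


@dataclass
class AclMessage:
    performative: str   # FIPA performatives, e.g. "query-ref", "inform"
    sender: str
    receiver: str
    content: Any


class DatabaseAgent:
    """Disembodied agent wrapping the product store (an in-memory list stands in for MySQL)."""

    def __init__(self, name: str, catalogue: List[Phone]) -> None:
        self.name = name
        self.catalogue = catalogue

    def handle(self, msg: AclMessage) -> AclMessage:
        if msg.performative != "query-ref":
            return AclMessage("not-understood", self.name, msg.sender, msg.content)
        feature = msg.content.get("feature")
        max_price = msg.content.get("max_price", float("inf"))
        hits = [p for p in self.catalogue
                if (feature is None or feature in p.features) and p.price <= max_price]
        hits.sort(key=lambda p: p.popularity, reverse=True)  # most popular first
        return AclMessage("inform", self.name, msg.sender, hits)


catalogue = [
    Phone("ExampleCo", "X1", 120, frozenset({"wap"}), ("black",), 0.4),
    Phone("ExampleCo", "X2", 90, frozenset({"wap", "colour_screen"}), ("red", "blue"), 0.7),
    Phone("ExampleCo", "X3", 60, frozenset(), ("silver",), 0.9),
    Phone("ExampleCo", "X4", 250, frozenset({"wap"}), ("black",), 0.8),
]
db = DatabaseAgent("product_db", catalogue)
query = AclMessage("query-ref", "sales_agent", "product_db", {"feature": "wap", "max_price": 150})
reply = db.handle(query)
print([p.model for p in reply.content])  # ['X2', 'X1']
```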

3.4 Inhabiting the Virtual Environment

The virtual environment is one of the most important domains to consider when generating context sensitive believable agents. The agent, the user and the 3D models of products all cohabit the 3D environment representing the virtual shop. Firstly, the agent must be able to generate believable behaviours composed of synchronised facial and body animations as well as speech. To achieve believability the behaviours must be appropriate and timely, taking into consideration all the issues described so far, i.e. social obligations and permissions, the user’s product and affective preferences, the user’s affective state and the nature of the objects being discussed. Behaviour scripts linking facial animation, body animation and speech can be created on the fly, or predefined behaviours stored in a behaviour library can be used.
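A behaviour script that links speech with facial and body animation could be represented as timed actions on separate channels. In this hypothetical sketch, gesture timing is estimated from a fixed speaking rate (an assumption; a real system would take word timings from the speech synthesiser). The `compose` function stands for on-the-fly script creation and `BEHAVIOUR_LIBRARY` for the predefined behaviour library.

```python
from dataclasses import dataclass, field
from typing import Dict, List

WORDS_PER_SECOND = 2.5  # hypothetical speaking rate used to estimate word timing


@dataclass
class Action:
    channel: str   # "speech", "face" or "body"
    name: str      # utterance text, expression or gesture name
    start: float   # seconds from the start of the script


@dataclass
class BehaviourScript:
    actions: List[Action] = field(default_factory=list)

    def track(self, channel: str) -> List[Action]:
        """Actions on one channel, in start-time order."""
        return sorted((a for a in self.actions if a.channel == channel), key=lambda a: a.start)


def compose(utterance: str, expression: str, gesture: str, stressed_word: str) -> BehaviourScript:
    """Build a script on the fly: the expression accompanies the speech and the
    gesture starts at the estimated onset of stressed_word (which must occur in utterance)."""
    onset = utterance.split().index(stressed_word) / WORDS_PER_SECOND
    return BehaviourScript([
        Action("speech", utterance, 0.0),
        Action("face", expression, 0.0),
        Action("body", gesture, onset),
    ])


# Predefined behaviours can be built once and reused.
BEHAVIOUR_LIBRARY: Dict[str, BehaviourScript] = {
    "greet": compose("Hello and welcome", "smile", "wave", "Hello"),
}

script = compose("This phone has WAP", "joy", "point", "WAP")
```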

The agent must be aware of the position of objects and avatars in the shop, and of the activity of human users, in order to navigate freely while avoiding obstacles. Such information can also be used to augment the agent’s conversational capabilities, allowing it to face the user when it is engaged in conversation, move towards a specific product or point to a product to disambiguate the subject of conversation [11]. This is facilitated by three basic parametric behaviours, namely walking, facing and pointing.
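The walking, facing and pointing behaviours are parametric in a target position. The following Python sketch shows one plausible parameterisation on a 2D floor plan; the coordinate convention, the stopping distance and the command format returned by `point_at` are assumptions, and the walking step deliberately omits the obstacle avoidance the text requires.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # position on the shop floor (x, z)


def facing(agent: Point, target: Point) -> float:
    """Heading in degrees (0 = +x axis, counter-clockwise) that turns the agent toward target."""
    return math.degrees(math.atan2(target[1] - agent[1], target[0] - agent[0]))


def walk_step(agent: Point, target: Point, speed: float, dt: float,
              stop_distance: float = 0.5) -> Point:
    """Advance in a straight line toward target, halting stop_distance short of it.
    No obstacle avoidance: a real planner would route around shelves and avatars."""
    dx, dz = target[0] - agent[0], target[1] - agent[1]
    dist = math.hypot(dx, dz)
    remaining = dist - stop_distance
    if remaining <= 0:
        return agent
    step = min(speed * dt, remaining)
    return (agent[0] + dx / dist * step, agent[1] + dz / dist * step)


def point_at(agent: Point, product: Point) -> Dict[str, object]:
    """Pointing: face the product, then aim the arm at it (hypothetical command format)."""
    return {"heading": facing(agent, product), "arm_target": product}
```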

The agent’s affective state is revealed to the user by means of facial expressions and gestures. Facial expressions include portrayals of joy, surprise, sadness, anger, fear, disgust and more. Gestures complement the facial expressions, reinforcing the presentation of affective states. However, emotions and personality can also be conveyed solely through the agent’s body posture, for example by walking proudly.

4 Architecture

The implementation of our agents is based on a BDI architecture and is composed of a set of modular building blocks. At the lowest level they are built using JADE [12], an agent platform that provides, amongst other features, the infrastructure for agent communication. In SoNG we have extended the JADE agent class to define a class hierarchy of the agents needed for our application. To implement internal planning and reasoning we use JESS [13] to build a rule-based action planner for each agent. JESS operates on a Knowledge Base (KB), which represents each agent’s public data (social model) and private data (beliefs about the state of the world, the user and the discourse, as well as desires and intentions). Each agent is able to make inferences about events in the environment and update its beliefs. Furthermore, each agent makes decisions according to the state of its beliefs. Each of our JADE agents has its own jess_processor object. This object is constructed during the agent’s setup phase; it starts a new JESS FuzzyRete engine [14] and loads all the rules appropriate for that agent. Each agent has rules stored as text files for inferences, planning, beliefs and desires.

The value of the agent’s type variable prefixes the filename in each case, for example “phoneinferences”, so that the appropriate files can be loaded. In addition, there are rules common to all agents: the social model, the mental model and the semantics of communication and events. The Database agent is a special case: its jess_processor associates JESS facts with tables in a MySQL database. Figure 2 shows the SoNG agent architecture.
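The file-naming convention can be sketched as follows. Only the type prefix (as in “phoneinferences”) comes from the description above; the names of the common rule files and the `load_rules` helper are hypothetical, and passing the loaded text to the agent’s FuzzyRete engine is not shown.

```python
from pathlib import Path
from typing import Dict, List

AGENT_RULE_KINDS = ("inferences", "planning", "beliefs", "desires")
# Hypothetical file names for the rule sets shared by every agent.
COMMON_RULE_FILES = ("socialmodel", "mentalmodel", "semantics")


def rule_files(agent_type: str) -> List[str]:
    """Common rule files first, then the type-prefixed ones, e.g. 'phoneinferences'."""
    return list(COMMON_RULE_FILES) + [agent_type + kind for kind in AGENT_RULE_KINDS]


def load_rules(rule_dir: Path, agent_type: str) -> Dict[str, str]:
    """Read each rule file as text (handing it to a rule engine is not shown)."""
    return {name: (rule_dir / name).read_text() for name in rule_files(agent_type)}


print(rule_files("phone"))
```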

5 Conclusions

Building believable sales agents in 3D worlds requires careful design. We have identified four domains that must simultaneously form part of the agent’s context. Elements of these domains are heavily interdependent, and their relationships must be considered at the design stage of the system. From this perspective, context sensitivity comes at the expense of a generic architecture for believable agents. We propose an architecture for a believable sales assistant to be used in the SoNG project.


6 References

[1] Bates J., The Role of Emotion in Believable Agents; Communications of the ACM, Special Issue on Agents, (1994)

[2] Bates J., Bryan Loyall A. & Scott Reilly W., Integrating Reactivity, Goals and Emotion in a Broad Agent; 14th Annual Conference of the Cognitive Science Society, (1992)

[3] Gazdar, G., Speech Act Assignment; Appearing in “Elements of Discourse Understanding”. Cambridge University Press, (1981).

[4] Singh, M. Agent Communication Languages: Rethinking the Principles; IEEE Computer. vol.31, no.12; p.40-7. (1998).

[5] Charlton, P., Kamyab, K. & Fehin, P. Evaluating Explicit Models of Affective Interactions; Workshop on Communicative Agents in Intelligent Virtual Environments, Autonomous Agents (2000)

[6] Kobsa, A. User Modelling in Dialog Systems: Potentials and Hazards; AI & Society 4; p214-240, Springer-Verlag, (1990)

[7] Ortony A., Clore G. & Collins A., The Cognitive Structure of Emotions; Cambridge University Press, (1988)

[8] Reeves B. & Nass C., The Media Equation; Cambridge University Press, (1996)

[9] Orwant J., For want of a bit the user was lost: Cheap user modelling; IBM Systems Journal, vol. 35; p398-416, (1996)

[10] Bellifemine F. et al., FACTS Project Deliverable A15D1, AVEB Phase 2 Demonstrator, (2000)

[11] Lester, J., Voerman, J., Towns, S., Callaway, C. Cosmo: A Life-like Animated Pedagogical Agent with Deictic Believability. IJCAI '97 Workshop on Animated Interface Agents: Making Them Intelligent, pp. 61-69, (1997)

[12] JADE, Java Agent DEvelopment Framework, http://sharon.cselt.it/projects/jade/

[13] JESS, the Java Expert System Shell, http://herzberg.ca.sandia.gov/jess/

[14] NRC FuzzyJ Toolkit, National Research Council of Canada,