Intelligent Context-Sensitive Help for Dynamic User and Environment Contexts

ABSTRACT

RAMACHANDRAN, ASHWIN. Intelligent Context-Sensitive Help for Dynamic User and Environment Contexts. (Under the direction of R. Michael Young).

The problem of providing help for complex application interfaces has been a source of interest for a

number of research efforts. As the computational power of computers increases, typical applications not

only increase in functionality but also in the degree of interaction with the computational environment in

which they reside. There are powerful software tools available today used for both specialized and

non-specialized tasks that are often used by novice users who attempt tasks without significant training or

knowledge of the application’s interface. These kinds of applications are diverse and complicated in the

variety of functionality they provide, often interacting with other applications on the user’s system. With

current platforms (Windows, Macintosh, Linux, etc.) providing extensive multi-tasking facilities, interaction

with these applications is sometimes affected by the context of the environment itself (e.g., application

windows being minimized, maximized or obscured by those of other applications). The interdependencies

between applications and their environments increase the difficulty of providing effective context-sensitive

help when building an application’s help documentation. The purpose of this research is to create an

Intelligent Help System, which incorporates these interactions and affecting factors when providing help.

The SmartAidè system, which was developed as part of this effort, works on the premise that the user has a

goal when interacting with the application. This document will provide a detailed overview of the

architecture of the system along with the underlying design decisions. The system was then evaluated

against traditional application help documentation to test its effectiveness. The results and analysis of this

INTELLIGENT CONTEXT-SENSITIVE HELP FOR

DYNAMIC USER AND ENVIRONMENT CONTEXTS

by

ASHWIN RAMACHANDRAN

A thesis submitted to the Graduate Faculty of North Carolina State University

in partial fulfillment of the requirements for the Degree of

Master of Science

COMPUTER SCIENCE

Raleigh

2004

APPROVED BY

BIOGRAPHY

Ashwin Ramachandran was born in Mumbai, India on May 20th, 1980. He completed his studies from L.D.

College of Engineering, Ahmedabad, India in 2001. After finishing his undergraduate studies, he was then

interested in gaining industry experience and worked for Larsen & Toubro until June 2002. Ashwin joined

the MS program in Computer Science in Fall, 2002. His focus during graduate studies has been towards

application of AI techniques for enhancing usability of complex application interfaces. After graduating, he

ACKNOWLEDGEMENTS

I would first like to thank my parents Mr. Balakrishnan Ramachandran and Mrs. Valli Ramachandran, for

their immense support, advice and unconditional love. Without them holding my hand it would have been

impossible to sustain and emerge from the various pressures of graduate school. I am also deeply thankful

to my advisor, Dr. Michael Young, who gave me the opportunity to pursue a topic of my interest and

patiently guided me through my first steps in research. His kind words and encouragement especially

during the periods I could not get any results kept me motivated to explore new ways of approaching the

problem at hand. A sincere thanks also goes to Craig Allen who worked tirelessly and often under heavy

workload in the last couple of months on the implementation of SmartAidè. I would also like to thank my

committee members, Dr. James Lester and Dr. Robert St. Amant for taking time out to meet with me and

give me helpful comments for my work.

There are several people who made being here, away from my family, easier to survive. I would like to

thank Arnav, Ashish, Imran, Kuldip, Madhup, Piyush, Sameer and Sandeep who have always been the best

of friends. They have been there during my best moments and have supported me through the most testing

of times.

Finally, I would like to thank all the people at Blackbaud for having understood and waited patiently as I

LIST OF TABLES

5.1 Table indicating task-times and number of times help was accessed

5.2 Average times for the tasks using non-context sensitive help and SmartAidè

5.3 Average number of times help was sought for each task

5.4 Average times to achieve the sub-tasks that had more than 20% help requests

LIST OF FIGURES

2.1 Intelligent User Interfaces’ two major fields of research

3.1 High-level view of SmartAidè’s help generation process

3.2 Macintosh Environment

3.3 The iTunes application interface

3.4 The Finder application interface

3.5 The Mimesis System

3.6 SmartAidè architecture

3.7 The Mimesis narrative plan

3.8 DPOCL Plan operator

3.9 Action class example

3.10 Execution DAG examples

4.1 SmartAidè startup

4.2 The complete plan for the example scenario

4.3 Step 1

4.4 Step 2

4.5 Step 3

4.6 Step 4

4.7 Step 5

4.8 Step 6

4.9 Step 7

4.10 Step 8

5.1 Pre-evaluation questionnaire

5.2 The tasks defined for the subjects to perform using iTunes and Finder

5.3 Pre-evaluation questionnaire

5.4 Pie-chart breakup of the method of help most used

5.5 Task difficulty rated by the subjects in the post-evaluation questionnaire

5.6 Breakup of percentage of help-requests for each individual sub-tasks that make up the tasks

1. INTRODUCTION

1.1 The need for intelligent help

If we look at the use of computers in the last fifty years we can distinguish two trends. The first trend is the

increasing use of computers for a growing range of purposes. Ever since the introduction of the first

(electronic) computers in the 1940s, the number of computer users has been growing. Before 1970

computers were regarded mainly as scientific tools and only specialized programmers were able to perform

calculations on a computer. With the introduction of the PC in the 1980s this changed drastically. Many

people could now afford to buy a computer. In addition, PCs used the (relatively) easy to use command-line

and graphical interfaces, so it became much easier to learn how to use a computer. With this a new range of

computer applications was developed: word-processors, spread sheets, desktop publishers and computer

games. The rise of the Internet and ICT industry during the 1990s further stimulated computer usage [39].

People with similar interests from all over the world joined in virtual communities and, with the appearance

of laptop computers, people were not limited to computer use at home or work but could work anywhere

they wanted. Nowadays millions of people are using computers in many different locations and situations.

People with a Personal Digital Assistant (PDA) or Internet-capable mobile phone can be connected to

anyone, anywhere, at any time.

The second trend that runs parallel to this is the increasing complexity of computer programs and their

interfaces. With the doubling of computing power every eighteen months predicted by Moore's law,

program developers can afford to put more and more functionality into a computer program. If we look at

the latest version of Microsoft Word for example, no normal computer user makes use of all its functions,

not only because there are so many options and settings, but also because people do not know how to use

them or even do not know they exist. The two developments described above justify the need for good

human-computer interfaces. Many computer users are experiencing problems and most of these problems

are related to the interface. The encountered problems vary from confusing menu choices and

incomprehensible error messages to unnatural (rigid) interaction. Not only beginners, the elderly, or people

with disabilities are having trouble, but experienced computer users often bump into problems as well. We

need computer interfaces that can understand and help people and explain to them how to use the available

functions. We need to make sure that computers and other computerized devices remain accessible for

everyone.

Powerful software tools are available for many specialist expert tasks such as statistical analyses. Much of

this software will be used by non-experts. They will try to perform tasks with the system almost

immediately, even if they do not know how to do the task: this is known as the production paradox [7]. All

information needed at a certain moment and, after that, they have to use this information for their task

execution. These activities are troublesome; moreover, non-experts will hardly use “screen-manuals”

according to the production paradox. It might be more effective to integrate the help into the task execution

of the user. The system should take the initiative to present knowledge the user is lacking. The knowledge

presentation should come at the right moment and only refer to information that is relevant in the current

context.

Intelligent user interfaces are being proposed as a means to make systems individualized or personalized,

thereby increasing the system's flexibility and appeal [16]. However, this is extremely difficult in systems

that serve the needs of large and diverse user populations. Intelligent help has been found to be one of the

significant ways of providing better support to the users.

A "help" system aids a user in performing a specific task [14]. Help is very similar to tutoring, but the main

objective for a help system is to get something done, and not to make the user learn something. Another

difference is that many tutoring systems will lay out specific tasks for the user to do, in order to diagnose

his or her misconceptions. A help system must act upon whatever information it can gather from the user's

own choice of interactions with the system. Intelligent help systems may help users when they are

accidentally involved in erroneous situations or have no clue how to go about performing a task. In

addition, they may inform the users about useful functionality of the system that these users are not aware

of.

The role of context sensitivity is central to the efficacy of the help system. It implies adaptability to the

current user’s individual characteristics, that is, his/her general knowledge and skills, motivations, and

objectives [4]. Adaptable systems attempt to implement access methods or information selection support

tailored to the actual needs of users, to their motivations and/or skills, using 'static' contextual knowledge.

Context sensitivity also includes sensitivity to the actual current goals and needs of the user. This

‘dynamic’ knowledge proves useful for solving many problems encountered by novice users; for instance,

error recognition and correction may be facilitated by comparing the actual state of the system with the

state that should be reached to fulfill the user's current goal [40]. This example brings out the usefulness of

continuous awareness of the current state of the system for ensuring ‘dynamic’ context sensitivity. In

particular, such contextual knowledge is necessary for generating help messages consistent with the system

1.2 Intelligent Help System (IHS) Background

The problem of providing help for complex interfaces has traditionally been addressed by the research on

adaptive help systems (more often called Intelligent Help Systems, IHSs). The goal of IHSs is to provide

personalized help to a user working with a complex interface, by diagnosing errors and suboptimal user

behavior, identifying missing pieces of knowledge about the interface, and providing on-demand help to

extend the user’s knowledge of the interface [4, 40]. The area of IHSs has been well investigated. The first

wave of research on intelligent help was initiated when UNIX systems, with their complicated interface,

were distributed widely in universities and came to the workplaces of many computer-naive users. Due to

this fact, almost all early research on intelligent help systems was focused on UNIX and its utilities [4, 18,

24]. The appearance of ‘friendly’ WIMP interfaces created a pause in IHS research, but in just a few years

even these interfaces had reached the level of complexity where intelligent help is really important. The

current second wave of research on IHSs investigates useful ways of providing intelligent help in modern

application systems [9, 13]. Probably, a third wave will be created by advanced WWW applications.

Traditionally, IHSs are divided into two classes: active and passive help systems. In a passive help system,

it is the user who initiates the next help session by asking for help. An active help system initiates the help

session itself. ‘Passive-active’ and ‘Static-adaptive’ are two different dimensions of classification. A

well-known example of active but non-adaptive help is ‘did you know’ (DYK) help, which offers users random

pieces of knowledge (called hints) during their work. A number of modern applications (like Microsoft

Word) usually suggest DYK help at the beginning of a session. Adaptive passive-help systems support

users by suggesting the next piece of knowledge to be learned by the user when help is requested. The main

problem for these systems is how to decide what to say. The suggested piece of knowledge has to be

relevant to the user’s current goal. To determine what is relevant, IHSs track the user's goals and the user’s

knowledge about the interface. This information about the user is stored in the user model [8,28]. The user

model is often initialized through a short interview with a user and then kept updated through automatic

user modeling. IHS researchers have investigated a number of effective techniques of automatic user

modeling. Most of them are variants of two basic technologies that were tried in the very first IHS projects

[24, 45]:

1) Tracking the user’s actions to understand which commands and concepts are known to the user and

which are not.

2) Using task models to deduce the goal of the user.

The first technology is reasonably simple. The system just records all used commands and parameters,

assuming that ‘used’ means ‘known’. The second technology is much more complicated; it is based on plan

model. To identify missing pieces of knowledge, the system

1) Infers the user's goal from an observed sequence of commands,

2) Tries to find a more efficient sequence of commands to achieve this goal,

3) Identifies knowledge elements required to build this more efficient sequence.
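
The three steps above, combined with the "used means known" assumption from the first technology, can be sketched as follows. This is only an illustrative toy, not the method of any of the cited systems: the task model, the catalogue of observed command sequences, and all command names are hypothetical, and goal inference is reduced to exact sequence matching rather than real plan recognition.

```python
# Hypothetical task model: for each goal, the most efficient command
# sequence the help system knows about. All names are invented.
TASK_MODEL = {
    "delete_directory": ["rm -r"],
    "copy_directory": ["cp -r"],
}

# Hypothetical catalogue of less efficient sequences that users are
# assumed to produce when pursuing each goal.
OBSERVED_SEQUENCES = {
    "delete_directory": [["cd", "rm", "cd ..", "rmdir"]],
    "copy_directory": [["mkdir", "cp"]],
}


def infer_goal(observed):
    """Step 1: infer the goal from an observed sequence of commands."""
    for goal, sequences in OBSERVED_SEQUENCES.items():
        if observed in sequences:
            return goal
    return None


def missing_knowledge(observed, known_commands):
    """Steps 2 and 3: look up a more efficient sequence for the inferred
    goal and report the commands in it that the user has never used."""
    goal = infer_goal(observed)
    if goal is None:
        return None, []
    efficient = TASK_MODEL[goal]
    missing = [cmd for cmd in efficient if cmd not in known_commands]
    return goal, missing


# "Used means known": every command in the history counts as known.
history = ["ls", "cd", "rm", "cd ..", "rmdir"]
known = set(history)
goal, missing = missing_knowledge(["cd", "rm", "cd ..", "rmdir"], known)
print(goal, missing)
```

A real system would replace the exact-match lookup in `infer_goal` with plan recognition over a task model, which is where most of the complexity mentioned above lies.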

Adaptive active help systems make all decisions that adaptive passive help systems make plus one more

decision: when to interrupt the user’s work and suggest help. The problem of when to interrupt is older than

IHSs themselves. This problem is known in the area of Intelligent Tutoring Systems (ITSs) as the problem

of coaching [6, 12], and here IHSs apply the ideas developed by earlier ITSs [3]. The secret of good

coaching is not to interrupt the user's work in each situation when the user makes an error or demonstrates

suboptimal behavior and the system has something to say about it. A good coach will interrupt the user

only if the work situation is relevant for correcting the user or suggesting a new piece of knowledge.

Usually, a set of heuristic rules (mostly domain-dependent) is used to determine whether the situation is

relevant.

1.3 Thesis Goal

Applications today on all platforms (Windows, Macintosh, Linux, etc.) are diverse not only in the variety of

functionality they provide but also in their interactions with other applications in the system. This

cross-application interaction not only increases complexity but also increases the amount of knowledge required

by the user to interact with these applications. These applications are also affected by the objects and

events in the environment in which they reside, such as being minimized or maximized, having variable

window sizes, and possibly being obscured by other windows. Providing help for the range of situations in which users

might find themselves is hard to do with the traditional help documentation that is provided with the

applications.

There are three main goals of this research:

1) Design a knowledge model and representation to encompass the application’s various states, including

modeling its environment and its interactions with another application.

2) Devise a method of providing help through the use of planning techniques that not only relates to the

current task-context of the user but also the current state of the application environment.

3) Evaluate the resulting implementation to analyze the performance of the help system against traditional

help documentation provided.

We have developed a tool called SmartAidè that focuses on the concept of “Show and Tell” for providing

actual execution of the actions to get from the current context to the context of the task the user needs to

perform. It records the effects of user actions on the interface but does not make any assumptions about the

goal the user is trying to achieve. The tool works on the premise that the user has a goal in mind when requesting

help and states that goal through a dialogue box. This eliminates the need for the

system to have a complex task model to determine user goals by tracking their interactions. Through the

use of planning, the system uses the representation of the actions and states to achieve an action sequence

that will take the application from its current state to achieve the goal of the user. A more detailed

description about the architecture is given in Section 3.3.

1.4 Outline

Section 2.1 gives an overview of the areas of research associated with this work. Section 2.2 then

discusses some of the related work in the field of IHSs. Section 3 goes into the details of the role of

SmartAidè, its architecture and its design decisions. Section 4 walks the reader through an example

scenario where SmartAidè provides help. Section 5 discusses an empirical evaluation method to measure

the effectiveness of SmartAidè against application help text, followed by conclusions and directions

2. LITERATURE RESEARCH AND RELATED WORK

2.1. Literature Research

This thesis draws its inspiration from two areas of research – Intelligent Help Systems and Planning.

Intelligent Help Systems forms one of the application areas of Intelligent User Interfaces (IUIs). Section 2.1.1

explains what IUIs are, including examples of the different intelligent techniques used in IUIs and

examples of the possible application areas for IUIs.

In order to be able to provide support to the user for the applications, SmartAidè records the effect of the

user actions on the interface. Using the goal provided by the user, it builds an action sequence from the

current application state to the user goal state using an artificial intelligence technique called Planning.

Section 2.1.2 provides an overview of planning.

2.1.1. Intelligent User Interfaces

Users today have access to more information than ever before. Interfaces through which the users interact

with all this information tend to become very complex. It can also happen that there is a limited amount of

time in which a task needs to be achieved, and it is nearly impossible to achieve tasks quickly with certain

interfaces. Alternatively, a single application may be used by completely different users with different

needs, all of whom nevertheless have to use exactly the same interface. Another possibility is that the

domain in which the application is used changes. All the above situations are examples of when regular

interfaces can become too complex or inflexible. To help users in these kinds of situations intelligent user

interfaces were born.

[Figure: Intelligent Interfaces, Artificial Intelligence, Human-Computer Interaction]

Figure 2.1. Intelligent User Interfaces’ two major fields of research [21]

The research area of intelligent user interfaces is at the boundary of a large number of different research

areas. The two most important of these research areas are Artificial Intelligence (AI) and Human-Computer

do intelligent actions. In HCI computer interfaces are designed that leverage off a human user to aid the

user in the execution of intelligent actions. An IUI should be able to do intelligent actions and to leverage

off the human user’s intelligence. In other words, IUIs are human-machine interfaces that aim to improve

the efficiency, effectiveness, and naturalness of human-machine interaction by representing, reasoning and

acting on models of the user, domain, task, discourse, or media [25]. In the following paragraphs a few of

the “intelligent” techniques that enable an intelligent user interface to improve human-machine interaction

are described.

The most important property of IUIs is that they are designed to improve communication between the user

and machine. Below we give a list of several types of techniques that are being used today in intelligent

user interfaces:

• Intelligent input technology uses innovative techniques to get input from a user. These techniques

include natural language (speech recognition and dialogue systems), gesture tracking and recognition,

facial expression recognition, gaze tracking and lip reading;

• User modeling covers techniques that allow a system to maintain or infer knowledge about a user

based on the received input;

• User adaptivity includes all techniques that allow the human-machine interaction to be adapted to

different users and different usage situations;

• Explanation generation covers all techniques that allow a system to explain its results to a user (e.g.

information visualization, or tactile feedback in a virtual reality environment).

Other important properties of IUIs are personalization and flexibility of use. To achieve personalization,

IUIs often include a representation of a user. These user models log data about the user’s behavior,

knowledge, and abilities. New knowledge about the user can be inferred based on the input and interaction

history of the user with the system. In order to be flexible many IUIs use adaptation or learning. Adaptation

can occur based on the stored knowledge in a user model or by making new inferences using current input.

Learning occurs when stored knowledge is changed to reflect newly encountered situations or new

data. Because of the difficulties involved in creating IUIs and the amount of knowledge engineering that is

needed, most IUIs focus on a specific method of interaction (e.g. speech) or on a particular narrow

application domain.

The main application area for intelligent interfaces is in those situations where knowledge about how to

certain actions itself or suggesting certain actions to the user. Application areas that involve

many situations like the one mentioned above include information filtering, intelligent tutoring and intelligent

help.

• Information filtering aims to find a structure in the available information. This structure can be used to

aid users in finding the information that is useful to them.

• Tutors are programs that teach a user how a certain computer program works or how a certain real-life

task has to be accomplished. Intelligent tutors can infer the user’s understanding of the program from

his performance on specific tasks. Based on this model of the user’s understanding, the tutor can give

the user advice on how to improve his performance.

• An intelligent help system is very similar to an intelligent tutor. However, a help system does not try to

teach the user; it helps the user get a certain task done. A help system lets users plot their own course

through the program.

2.1.2. Planning

Automated plan generation techniques have been widely investigated and used in the field of artificial

intelligence. Planning is a technique for solving problems that can be represented as having an Initial State,

a Goal State and a set of operators that describe valid actions in the world. The planning algorithm

represents the process of searching for a state of the world that satisfies the goal by applying a number of

operators from the initial state. This process can be represented as a graph-search of all the possible

connections between nodes representing the world-state after each operation. The initial and goal state are

collections of predicates stating the information about the world. A plan operator is represented by:

• A precondition list

• An effect list

• A set of constraints on the operator
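
To make the state-space view concrete, the following is a minimal sketch of planning as graph search. It assumes a STRIPS-style encoding in which the effect list is split into added and deleted predicates, and it omits operator constraints entirely. The predicates and operator names (a minimized window that must be restored before an item can be selected) are hypothetical and not taken from SmartAidè.

```python
from collections import deque
from typing import NamedTuple


class Operator(NamedTuple):
    name: str
    preconditions: frozenset   # predicates that must hold before applying
    add_effects: frozenset     # predicates made true by the operator
    delete_effects: frozenset  # predicates made false by the operator


def plan(initial, goal, operators):
    """Breadth-first search over world states.

    Returns a list of operator names leading from `initial` to a state
    that satisfies `goal`, or None if no such state is reachable.
    """
    start = frozenset(initial)
    goal = frozenset(goal)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:                      # every goal predicate holds
            return steps
        for op in operators:
            if op.preconditions <= state:      # operator is applicable
                successor = (state - op.delete_effects) | op.add_effects
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, steps + [op.name]))
    return None


# Hypothetical two-operator domain.
OPERATORS = [
    Operator("restore_window", frozenset({"minimized"}),
             frozenset({"visible"}), frozenset({"minimized"})),
    Operator("select_library", frozenset({"visible"}),
             frozenset({"library_selected"}), frozenset()),
]

print(plan({"minimized"}, {"library_selected"}, OPERATORS))
```

Breadth-first search is used here only because it is short and returns a shortest operator sequence; it is not the partial-order algorithm discussed later in this section.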

Formally, a plan P is a tuple <S, C> where S is a set of steps and C is a constraint tuple of sets of

constraints on S. Minimally, C contains a set of ordering constraints O that define a partial temporal

ordering on the execution order of steps in S. Given a planning problem, the ultimate aim of the planning

algorithm is to come up with a sequence of ground operators such that by application of these from the

A planning problem P (A, D, I, G) is a 4-tuple, where A is the set of operators, D is the finite set of objects,

I is the initial state of the system and G is the goal state of the system. The solution to the planning problem

is a plan: a tuple <S, O, L, B> where S is the set of steps, O is the set of ordering constraints on the elements of S, L is the set of

causal links representing the causal structure of the plan, and B is the set of binding constraints on the variables in

S. Causal links are triples <Si, c, Sj>, where Si and Sj are steps in S and c is an effect of Si and also a

precondition of Sj. Typically, the ordering constraints induce only a partial ordering, so the set of solutions consists of

all linearizations of S consistent with O.
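
As a small illustration of what it means for a linearization of S to be consistent with O, the sketch below enumerates every total order of a set of steps that respects a set of ordering constraints. The step names are hypothetical, and the brute-force enumeration over permutations is chosen for brevity; it is only practical for a handful of steps.

```python
from itertools import permutations


def linearizations(steps, orderings):
    """Return every total order of `steps` consistent with `orderings`.

    `orderings` is a set of (before, after) pairs, playing the role of
    the ordering constraints O on the steps S.
    """
    result = []
    for candidate in permutations(steps):
        position = {step: index for index, step in enumerate(candidate)}
        if all(position[a] < position[b] for a, b in orderings):
            result.append(list(candidate))
    return result


# Hypothetical plan: both applications must be open before the drag,
# but the two "open" steps are unordered with respect to each other.
steps = ["open_finder", "open_itunes", "drag_file"]
orderings = {("open_finder", "drag_file"), ("open_itunes", "drag_file")}

for order in linearizations(steps, orderings):
    print(order)
```

Here the partial order admits two linearizations, which differ only in which application is opened first.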

Given both the set of objects from the planning instance and the operators from the planning domain, the

complete set of actions that are possible in this planning problem can be derived by performing all possible

substitutions on all operators. A complete plan is defined as a set of ordered actions $(a_1; a_2; a_3; \dots; a_n)$. When

these actions are applied in order $a_n(a_{n-1}(\dots a_3(a_2(a_1(I)))\dots))$, the following conditions hold. Before the

application of each action $a_i$, $a_i$'s preconditions are true in the current state, and after the application of the

final action, the goal state G is true.
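
The two conditions above can be checked mechanically. The sketch below is an assumption-laden illustration: each ground action is represented as a hypothetical tuple of name, preconditions, added predicates and deleted predicates, and states are plain sets of predicate strings.

```python
def is_complete_plan(initial, goal, actions):
    """Check the two conditions on a complete plan.

    Each action is a tuple (name, preconditions, add_effects,
    delete_effects) whose last three members are sets of predicates.
    """
    state = set(initial)
    for name, preconditions, add_effects, delete_effects in actions:
        # Condition 1: the preconditions hold just before the action.
        if not preconditions <= state:
            return False
        state = (state - delete_effects) | add_effects
    # Condition 2: the goal holds after the final action.
    return goal <= state


# Hypothetical ground actions.
restore = ("restore_window", {"minimized"}, {"visible"}, {"minimized"})
select = ("select_library", {"visible"}, {"library_selected"}, set())

print(is_complete_plan({"minimized"}, {"library_selected"}, [restore, select]))
print(is_complete_plan({"minimized"}, {"library_selected"}, [select, restore]))
```

The second call fails the first condition, because `select_library` requires a visible window that has not yet been restored.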

The Mimesis narrative planner uses DPOCL [44] as the planning algorithm for generating the help

sequence. A DPOCL plan contains elements composed from five central types. First, they contain steps

representing the plan's actions. Ordering constraints define a partial temporal order over the steps in a

DPOCL plan, indicating the order the steps must be executed in. Hierarchical structure in a DPOCL plan is

represented by decomposition links: a decomposition link connects an abstract step to each of the steps in

its immediate sub-plan. Finally, DPOCL plans contain causal links between pairs of steps. A causal link

connects one step to another just when the first step has an effect that is used in the plan to establish a

precondition of the second step.
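
One possible in-memory representation of the element types described in this paragraph is sketched below. It is not the DPOCL or Mimesis data model: the class and field names are hypothetical, and the choice to add an ordering constraint whenever a causal link is added is an assumption borrowed from common partial-order planning practice.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    name: str
    preconditions: frozenset = frozenset()
    effects: frozenset = frozenset()
    abstract: bool = False  # abstract steps are refined by a sub-plan


@dataclass
class Plan:
    steps: set = field(default_factory=set)
    orderings: set = field(default_factory=set)       # (before, after)
    decompositions: set = field(default_factory=set)  # (parent, child)
    causal_links: set = field(default_factory=set)    # (producer, condition, consumer)

    def add_decomposition(self, parent, child):
        """Link an abstract step to a step in its immediate sub-plan."""
        if not parent.abstract:
            raise ValueError("only abstract steps can be decomposed")
        self.steps.update({parent, child})
        self.decompositions.add((parent, child))

    def add_causal_link(self, producer, condition, consumer):
        """Record that an effect of `producer` establishes a
        precondition of `consumer`."""
        if condition not in producer.effects:
            raise ValueError("condition is not an effect of the producer")
        if condition not in consumer.preconditions:
            raise ValueError("condition is not a precondition of the consumer")
        self.steps.update({producer, consumer})
        self.causal_links.add((producer, condition, consumer))
        self.orderings.add((producer, consumer))  # assumed: producer runs first


# Hypothetical steps.
show_library = Step("show_library", abstract=True)
restore = Step("restore_window", frozenset({"minimized"}), frozenset({"visible"}))
select = Step("select_library", frozenset({"visible"}), frozenset({"library_selected"}))

example = Plan()
example.add_decomposition(show_library, restore)
example.add_decomposition(show_library, select)
example.add_causal_link(restore, "visible", select)
```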

The hierarchical structure supports action decompositions for abstract action specifications. A hierarchical

approach has a number of potential benefits. First, it may lead to improved performance due to the

reduction in the amount of search needed; second, it is easier to encode domain knowledge as a set of abstract

and primitive actions, which allows for re-use of primitive actions; and finally, it supports interleaved

planning and execution.

2.2. Related Work

Research that explores generation of context-sensitive help can be roughly categorized into three groups

based on the method used for help generation: Interface Design Models, Task Models and Planning.

The method of generating help through the association of help text with the modules of the application

method involves utilizing explicit knowledge representations about application actions and the data models

developed during the interface design time. H3’s [26] method of help-generation involves determining the

user-context and using standard templates with pre- and post-conditions of the selected interface object to

generate help texts. The emphasis of this method is to substantially reduce the effort of developers in

writing help for applications by generating help text using the interface design model, which could be later

edited. This is useful in providing help related to the user-context, with links to related topics as determined

through dependencies in the modules of the model. However, this method does not provide task-oriented

help, i.e., it does not show a user what tasks to perform to achieve his/her desired goal. CARTOONIST [35] provides

context-sensitive animated task-oriented help by building a knowledge base that contains information about

the system-specific actions and the application-domain specific actions, during design time. The system

then draws a mapping between the two based on pre and post-conditions. Generating help involves

mapping the user task to a specific action within the application domain and using the mapping to create

the tasks required for performing the action. Today, however, most application systems offer several

alternative approaches to performing a task. The knowledge model of [26] did not take alternative scenarios

into consideration if the pre-conditions of one set of tasks were not valid for execution. The general problem

with exploiting knowledge at design time to provide support is that, up until now, only very few

real-world applications have been developed on the basis of such user-interface development systems.

Integrating these approaches into present day applications is not possible.

Help generation using Task Models involves representing the system as a set of hierarchical system tasks

as perceived by the user. This knowledge model enables generation of task-oriented help. Interactive

Systems Workbench [31] is a system in which designers can build this hierarchy along with a set of

temporal constraints that define the order in which tasks occur in a system. The hierarchy consists of

abstract tasks (tasks that consist of a set of sub-tasks) at the top, which are broken down

until the granularity is a set of basic tasks at the bottom of the hierarchy. The pre- and post-conditions of

the tasks are represented in this hierarchy. Whenever help for performing a task is requested by the user,

the hierarchy is traversed to find a way from the root to the task to be performed, and the help text is

generated by concatenating pre-written text with the parameters of all the intermediate tasks to be

performed. HyPlan [15] is a system where the knowledge base consists of pre-compiled action plans

describing various complex tasks, and associations with their respective help topics. User context is gauged

by recording the current user activities and then matching them to the action plans in the knowledge base.

The help topics related to the matching action plans are then displayed to the user. Using task models

involves modeling the interface design space, which even for simple interfaces is large and dependent on

many variables. In addition, present-day applications have interfaces to other applications and their environment.

Hence task-modeling might become complicated in these scenarios. Determining user-context and their


Planning techniques have been explored by researchers for advisory systems and help systems. Fischer [14]

developed an active help system, ACTIVIST, which recognizes what the user is doing and evaluates how

the goal is achieved. These tasks form the heart of the active help system, and are accomplished by “plan

specialists”. A plan specialist in ACTIVIST is an automaton, which recognizes a particular plan, and an

expert that knows the optimal plan for that goal. Finin [13] also provided active help in his WIZARD

system. WIZARD employed a “bad plan” catalog, similar to Burton’s bug library [6], which reduced the

problem to matching user behavior to items in the catalog. User modeling research, especially related to

adaptive user interfaces, came to be actively integrated with planning techniques for intelligent help

generation. [10] describes a system where action graphs are constructed with the nodes in the action graph

storing both the action information and the expertise-level of the user to perform that action. The nodes also

contain help-text associated with the action. Help is provided by inferring the user’s expertise from the

actions performed, determining the current task context and matching it to the action graph

commensurate with that expertise. Planning and user modeling techniques have been employed in research

into Intelligent Tutorial Systems (ITSs). COMET [12] is a system where plans are generated from the

knowledge base to derive explanations for how to perform equipment maintenance and repair. The planner

uses the knowledge source to determine what content to display depending upon the task being performed

by the user. STEVE [33] is an animated agent that inhabits a virtual world with a user and provides training

to the student by deriving action sequences from partial-order plans, to perform domain tasks. These plans

are flexible enough to adjust to the state of the interface and allow the agent to adapt if an unexpected

change occurs. MIMESIS [42] supports the dynamic generation of interactive narrative sequences by

using partial-order plans to achieve a specific set of goals within the virtual environment and by

maintaining the coherence of these plans as they execute in the face of unanticipated user activity.

Among the current commercially available applications with integrated context-sensitive help, the Office

Assistant and the Macintosh Balloon Help have been widely documented. The Office Assistant is a

derivative of the Lumière project [17]. The Lumière project centered on harnessing probability and utility

to provide assistance to computer software users. Bayesian user models were employed to infer the user’s

need by considering their background, actions and queries. But because it was an active system that intervened

during the user’s task, it was found to hinder users as they gained more experience with the

application and required less assistance. The Macintosh Balloon Help system allows the user to get help

about various objects in the screen and their purpose. The user can point the mouse at a particular object on

the screen and a balloon pops up providing information about that object. However this kind of help is


3. ABOUT SmartAidè

3.1. The Role of SmartAidè

Current GUI environments (Windows, Macintosh etc.) provide the user with a rich arena in which he has

access to a wide variety of application software. Such a system provides the ideal setting for building a help

environment. Because of their uniform design and the constraints they impose, these environments let

the user predict the effects of actions performed on elements of the interface.

The SmartAidè system takes advantage of the Macintosh environment to provide help to two interacting

applications – the iTunes music player application and the Finder application. The iTunes application,

described in section 3.2.2, is the digital music player application that allows the user to play music encoded

in various digital formats. These music files are stored in the file system of the Macintosh, accessible through the

Finder application, described in section 3.2.3. To look for music files or to import or export song lists,

iTunes interacts with the Finder to allow users to decide where they would like to store this information.

SmartAidè embodies a methodology where the application environment context influences the help

generation as much as the intended user goal. The system works on the principle that the sequence of user

interactions translates into some task they are trying to achieve. Hence the system uses planning for

building and driving its action sequences. Along with the user goal, the system builds a representation of

the current context of the application by recording the state of all objects affected in the application world

based on the interactions of the user. The current context and the user goal form the planning problem

specification. Our efforts were focused on using the Mimesis Narrative planner to generate the action

sequences. However, due to constraints of time, for the empirical evaluation, we employed a version of

SmartAidè that used a plan database, which consisted of pre-compiled action plans derived from planning

problem specifications in different contexts using Longbow. These plans were used to drive the animation

[20] and display of step-by-step instructions within the application environment. Currently, work continues

on connecting SmartAidè to Mimesis as mentioned in section 6.2.


Figure 3.1. High-level view of SmartAidè help generation process (diagram components: USER, APPLICATION, SMARTAIDÈ, MIMESIS CLIENT; labeled flows: Interactions, Actions, Help Request, Action Execution, Help Text)

Whenever the user requests help, the help dialog is displayed allowing the user to specify his/her intended

goal. This help request is associated with a goal state of the user. Both the current context and the goal

states are stored in a form recognizable to the planner. These states are then sent to the planner, which has a

representation of all actions with their associated pre- and post-conditions. Using the current context, the

planner then performs a search over the domain of actions and builds an action sequence that will enable

the state of the application to proceed from the current scenario to the goal scenario. The generation of help

is a combination of describing the help associated with the action and actually performing the action by

moving the mouse and interacting with the various objects in the application. The help text is displayed in

the same help dialog through which the users specified their help request. This method is analogous to

social interactions between humans where one person provides information to another without

drawing inferences about the underlying intentions of the person asking, for example when asked for

directions. SmartAidè makes no assumptions about the reason or the intention of the user during his

sequence of interactions or while providing help. The system also does not explain

why actions are executed the way they are. The system does not attempt to

tutor the user but only gets the task done, allowing the user to infer the underlying causal relationships

between the actions.
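The search from the current context to the goal state can be illustrated with a minimal sketch. It is not the Mimesis planner, which builds partial-order plans; it is a plain breadth-first forward search over world states, and the three actions and all fact names in it are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    preconds: frozenset  # facts that must hold before the action runs
    adds: frozenset      # facts the action makes true
    deletes: frozenset   # facts the action makes false


def plan(current, goal, actions):
    """Breadth-first search from the current context to a state satisfying the goal."""
    start, goal = frozenset(current), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for action in actions:
            if action.preconds <= state:
                successor = (state - action.deletes) | action.adds
                if successor not in seen:
                    seen.add(successor)
                    frontier.append((successor, steps + [action.name]))
    return None  # no action sequence reaches the goal


# Hypothetical domain, loosely modeled on opening a dialog from a menu.
ACTIONS = [
    Action("move_mouse_to_edit_button",
           frozenset({"mouse_at_title_bar"}),
           frozenset({"mouse_at_edit_button"}),
           frozenset({"mouse_at_title_bar"})),
    Action("open_edit_menu",
           frozenset({"mouse_at_edit_button"}),
           frozenset({"edit_menu_visible"}),
           frozenset()),
    Action("show_view_options",
           frozenset({"edit_menu_visible"}),
           frozenset({"view_options_visible"}),
           frozenset()),
]

steps = plan({"mouse_at_title_bar"}, {"view_options_visible"}, ACTIONS)
print(steps)
```

Because the search starts from the recorded context, the same goal yields a different action sequence when the application starts in a different state, which is the behaviour the help generation relies on.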

The SmartAidè tool was built in such a way that it places no restrictions on the way a particular user goal

needs to be achieved. This way the users can discover the different ways of accomplishing their task

through their various experiences with using the help system. This helps build user preferences

and helps determine the knowledge the user has of the application. This information could be encoded into

a user model, something that has not been incorporated into the current architecture. Modeling the user

would enable the tool to provide help tuned to both the preferred interaction sequence of the user and the


3.2. Systems and Applications being used

This section provides a brief overview of the Macintosh system, the iTunes application, the Finder

application and the Mimesis system.

3.2.1. The Macintosh System

Figure 3.2. The Macintosh environment

The Macintosh computer uses a graphics-based operating system with a user-friendly interface with easy

access to files. All files and directories, or folders, appear as icons on the Desktop. The Desktop is the

screen that appears after the Macintosh system loads up. Some permanent fixtures on the Desktop are the

Macintosh HD and Trash. Floppy disks, zip disks, CDs, etc. that you have inserted also appear on the

Desktop. Icons are small pictures representing a file, folder or application, and are usually associated with

the item's name. There are also drop-down menus located at the top of the screen that provide commands

associated with the application or the system. Examples of such menus are 'File' and 'Edit'. The mouse is an integral part

of the user interface, although many operations can also be performed with the keyboard.

Navigation on a Macintosh computer can be achieved using the keyboard or the mouse. On the screen, the

user can see a pointer or cursor, usually in the shape of an arrow. Moving the mouse (up, down, left, right)

will cause the pointer on the screen to move in the same direction on the screen. The Macintosh mouse is


Applications and files are used through windows, which appear on the desktop. Multiple applications are

opened in multiple windows but the user can only use one window at a time, called the 'active' window.

While you can have many application programs open at once, only one of them can be active at a time.

Window sizes are variable and can be changed by the user. A window can cover the whole screen when it

is maximized, or hidden when it is minimized. The Macintosh has a taskbar (the Dock) that provides links to

applications and also provides links to the currently open windows.

3.2.2. iTunes

Figure 3.3. The iTunes application interface

iTunes is the digital music player application offered with all Macintosh systems. It offers a flexible,

personalized interface, with which one can listen to a specific artist, genre or era of music. iTunes also

provides facilities for creating, adjusting, storing, and naming the digital music files. The Music Library

helps the user organize the music files into categories called ‘playlists’. Sorting and finding artists, albums,

genres, etc. are all important functions of the Music Library. The application also allows the user to burn

CDs and transfer playlists to external devices such as MP3 players.

Figure 3.3 above shows the iTunes application window. This window is divided into three panes that

can be made visible or invisible depending on the user’s preference.

• The Sources/Playlist: This pane is primarily made up of the users’ playlists, but also lists Radio,

and the iTunes Music Store.

• The Browser: For any list of songs the user happens to be viewing, the Browser breaks the view


• The Songlist: This is the list of songs contained in the source (folder/hard-drive, network

folder/drive) the user has selected, and the items selected in the browser.

Playlists are lists of songs (or other audio files) that the user can create to organize his/her library or burn a

CD. Playlists can include songs or spoken word files from the library, or radio stations. Adding a song to a

playlist does not remove it from the library; it places a pointer (or reference) to the file in the playlist. Any

radio stations in the playlist will tune in when the user is connected to the Internet and the user selects the

playlist. Playlists are of two types: standard playlists and smart playlists. Standard playlists are created

manually and are not automatically updated as the library changes. Smart Playlists are created based on

certain criteria chosen by the user, and can be automatically updated as the library changes.

3.2.3. Finder

Figure 3.4. The Finder application interface

The Finder application helps in navigating and organizing files and folders on the system. Figure 3.4 shows

a typical view of the Finder. The organization is visible on the sidebar, in the left-hand pane of every open

window. This sidebar lists hard drives, network volumes, optical media and other accessible volumes at the

top. Clicking on any of the items listed will display the files and folders available in the particular media.


DVDs, FireWire and USB hard drives and flash memory cards. The sidebar can be customized with the

user’s favorite folders by dragging them to the lower section of the sidebar. These favorite folders provide

convenient shortcuts to get to current projects and a quick way to copy files. The Finder is integrated with

almost all applications when the “Open File” or “Save File” commands are activated within the application.

The Finder also allows the user to open a file/folder on a network server.

When searching for files and folders, users see the results update as they type and refine their search

terms. They can also specify search locations through a popup menu, which lets them choose

“Everywhere,” “Local Disks,” “Home”, or “Selection.”

The Action menu gives the user access to contextual Finder commands based on the current selection, such

as labeling files or moving items to the Trash. The Action menu also gives the user a handy way to get more

information about a file. If the users have a two-button mouse, they can also right-click on an item to bring

up the menu, or hold down the control key while clicking on an item with a one-button mouse.

3.2.4. The MIMESIS System

Figure 3.5. The Mimesis system

The Mimesis system integrates a suite of intelligent control tools with a number of virtual world

environments ranging from Java-based games running on PDAs and cell phones to commercial 3D game

engines like Unreal Tournament 2003 (UT). While these virtual worlds are well-suited as engines for

building conventional interactive environments, the representations of the worlds that they model do not

typically match well with those used by AI researchers. Most virtual worlds' internal representations are

procedural - they do not utilize any formal or declarative model of the characters, setting or the actions of

the stories that take place within them. Consequently, direct integration of intelligent software components

is difficult. To facilitate this integration, Mimesis defines a multi-agent architecture in which default


modified) so that low-level control of the virtual environment is performed by the environment's engine and

high-level reasoning about narrative structure and user interaction is performed collectively by a suite of

intelligent agents (called Mimesis Components).

Mimesis acts as a story generator, determining the narrative elements of the user’s experience within the

virtual world. Collectively the agents operating with Mimesis are responsible for both the generation of a

story (in the form of a plan characterizing all physical and communicative actions that are to be performed

within the environment) and the maintenance of a coherent narrative experience in the face of unanticipated

user activity. The process of constructing a narrative experience involves a number of specialized

functions, including reasoning about the actions of individual characters, generating any dialog used by the

characters or narration to be provided by the system, creating cinematic camera control directives to convey

the action that will unfold in the story, etc. To facilitate the integration of corresponding special-purpose

reasoning components, the Mimesis architecture is highly modularized. Individual components within the

MC run as distinct processes distributed across a network; the processes communicate with one another via

a well-defined TCP-based message-passing protocol. Developers extending Mimesis to provide new

functionality wrap their code within a message-passing shell that requires only a minimal amount of

customization.

While the MC architecture contains a number of intelligent components, one of these is central to the

creation of SmartAidè’s action sequence. The narrative planner is responsible for determining all actions

that need to occur within the application environment to achieve the intended goal of the user. Additional

components, such as a text generation system (used to create custom dialog and narration), a user model

and an HTTP server are also included in the Mimesis system. A more detailed description of the narrative

planner is provided in Section 3.3.1.

3.3. SmartAidè Architecture

Mimesis uses a client/server architecture where high-level reasoning about the plan structure and the user

interaction is performed by the intelligent control elements on the server side while the client side performs

the low-level control of the application/game using the plan structures generated. Communication is

achieved through the use of XML encoded messages between the communication modules of the client and

Mimesis server. This enables developers and researchers to use Mimesis as a testbed for applications on

different platforms using its various intelligent components. Figure 3.6 shows the various components on

the client side and the components being used in Mimesis. Section 3.2.4 provides a brief introduction to


Figure 3.6. SmartAidè architecture (client-side components: Application, Help Dialog, Goal Diagnoser, Context Recorder, Action Library, Execution Manager and Comm. Module; Mimesis components: Comm. Module, Narrative Planner and Plan Operators; labeled flows: user enters intended task, user text transferred to Goal Diagnoser, planner-recognizable goal state, current user/application context variables, XML-encoded plan request, XML-encoded action sequence, action execution)
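The XML-encoded plan request exchanged between the two communication modules might look like the following sketch. The Mimesis message schema is not given in this document, so every element and attribute name here (`plan-request`, `current-state`, `goal-state`, `condition`, `arg`, `action-library`) and the library name `itunes-actions` are assumptions made for illustration. The three parts carried by the request follow the description of a plan request in Section 3.3.1: the current state, the name of an action library and the goals.

```python
import xml.etree.ElementTree as ET


def _add_conditions(parent, tag, conditions):
    """Append a <tag> element holding one <condition> per (predicate, args) pair."""
    section = ET.SubElement(parent, tag)
    for predicate, args in conditions:
        cond = ET.SubElement(section, "condition", {"predicate": predicate})
        for arg in args:
            ET.SubElement(cond, "arg").text = arg


def encode_plan_request(current_state, goal_state, action_library):
    """Serialize a plan request to an XML string (assumed schema, not the Mimesis one)."""
    root = ET.Element("plan-request", {"action-library": action_library})
    _add_conditions(root, "current-state", current_state)
    _add_conditions(root, "goal-state", goal_state)
    return ET.tostring(root, encoding="unicode")


def decode_conditions(xml_text, section):
    """Read the (predicate, args) pairs back out of one section of a request."""
    root = ET.fromstring(xml_text)
    return [
        (cond.get("predicate"), tuple(arg.text for arg in cond.findall("arg")))
        for cond in root.find(section).findall("condition")
    ]


request = encode_plan_request(
    current_state=[("MouseAt", ("titleBar",)), ("Enabled", ("editMenu",))],
    goal_state=[("Visible", ("optionsWindow",))],
    action_library="itunes-actions",  # hypothetical library name
)
print(request)
```

Keeping the state as predicate and argument pairs on both sides means the client never needs to know how the planner represents conditions internally.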

3.3.1. Mimesis components

Mimesis Narrative Planner

The Mimesis Narrative planner is responsible for handling plan requests initiated by the game engine. A

plan request contains three elements. First, it contains an encoding of all relevant aspects of the current

application state. Second, it names one of a set of pre-defined libraries of actions that can be used by the

story planner to compose action sequences. Finally, it contains a set of goals for the plan, which is a listing

of conditions in the application environment that must be true at the time that the plan ends its execution.

The story planner responds to the plan request by creating a story world plan, a data structure that specifies

the actions of the characters in the game and the system-controlled objects that will execute over time to


Steps (in left-to-right temporal order):

0. Current Application State

1. MouseMove (titleBar, editButton)

2. OpenEditMenu (titleBar, editButton)

3. MouseMove (editMenu, optionsButton)

4. ShowViewOptions (editMenu, optionsButton)

5. Goal State

Causal link labels: MouseAt (titleBar); At (mouse, editButton); Visible (editMenu); Enabled (editMenu); Visible (optionsButton); At (mouse, optionsButton); Visible (optionsWindow)

Figure 3.7. A Mimesis narrative plan. Gray rectangles represent actions and are labeled with an integer reference number, the actions’ names and a specification of the actions’ arguments. Arrows indicate causal links connecting two steps when an effect of one step establishes a condition in the application environment needed by one of the preconditions of a subsequent step. Each causal link is labeled with the relevant world state condition. Temporal ordering is roughly indicated by left-to-right spatial ordering. The white box in the upper left indicates the game’s current state description, and the box in the upper right indicates the current planning problem’s goal state.

This plan involves displaying the view options dialog for changing the number of visible columns. Step 1) Move the mouse from its current position at the titlebar to the edit menu button. Step 2) Click the edit-menu button to display the edit-menu. Step 3) Move the mouse from the edit-menu button to the View Options button in the menu. Step 4) Click the View Options to open the View Options window.

Mimesis’s DPOCL planning algorithm uses refinement search [19] as a model for its plan reasoning process.

Refinement search is a general characterization of the planning process as search through a space of plans.

A refinement-planning algorithm represents the space of plans that it searches using a directed graph; each

node in the graph is a (possibly partial) plan. An arc from one node to the next indicates that the second

node is a refinement of the first (that is, the plan associated with the second node is constructed by

repairing some flaw present in the plan associated with the first node). In typical refinement search

algorithms, the root node of the plan space graph is the empty plan containing just the initial state

description and the list of goals that together specify the planning problem. Nodes in the interior of the

graph correspond to partial plans and leaf nodes in the graph are identified with completed plans (solutions

to the planning problem) or plans that cannot be further refined due, for instance, to inconsistencies within

the plans that the algorithm cannot resolve. In Mimesis, the initial planning problem for DPOCL is created

using the specifications of the current and goal states taken from the plan request from the client. The

approach to plan generation as search facilitates the creation of plans tailored not just to the particular state

of the application environment at planning time, but to preferences for certain types of action structure.

Search control rules can be defined that direct search towards (or away from) plans that use certain objects,


Plan Operator Library

The plan operator library contains declarative representations of actions that can be performed within the

application. This representation models under what circumstances an action can be executed and how the

action alters the world state – without explicitly stating how the action performs its tasks.

Operator ShowViewOptions (?ACTIVE-WINDOW ?EDIT-MENU ?VIEW-OPTIONS-MENU)

Preconditions:

(is-iTunes ?ACTIVE-WINDOW)
(is-iTunes ?FULL-PLAYER)
(is-visible ?EDIT-MENU)
(is-visible ?VIEW-OPTIONS-BUTTON)
(is-enabled ?VIEW-OPTIONS-BUTTON)

Effects:

(is-ViewOptions ?ACTIVE-WINDOW)

Figure 3.8. A DPOCL plan operator for displaying the View Options dialog used in the plan in Figure 3.7. In this operator, the Preconditions ensure that a) the current active window is the iTunes window b) iTunes is in full-player mode c) the Edit menu is visible, d) the View Options menu item is visible, e) the View Options menu item is enabled. The Effects of the action specify that the View options dialog box is open and is the currently active window.

The narrative planner in Mimesis uses a declarative representation in which an action is represented using

two main elements: its preconditions and its effects. An action’s preconditions are a set of predicates

describing those conditions of the application state that must hold in order for the action to execute

correctly. An action’s effects are a set of predicates capturing all changes to the world state made by the

action once it successfully executes. Figure 3.8 shows an example plan operator. A plan consists of a

sequence of actions drawn from the plan operator library specified in the plan request.
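A declarative operator of this kind can be sketched as a small data structure. The class below is an illustration, not the Mimesis representation. It treats conditions as ground (predicate, object) pairs, binds the variables of Figure 3.8 to hypothetical object names such as `activeWindow` and `viewOptionsButton`, and models effects as additions only because the figure lists no retracted conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset  # conditions that must hold for the action to execute
    effects: frozenset        # conditions made true once the action executes

    def applicable(self, state):
        """True when every precondition is present in the given state."""
        return self.preconditions <= state

    def apply(self, state):
        """Return the successor state, or raise if a precondition is unmet."""
        if not self.applicable(state):
            missing = sorted(self.preconditions - state)
            raise ValueError(f"{self.name}: unmet preconditions {missing}")
        return frozenset(state) | self.effects


# Ground instance of the operator in Figure 3.8 (object names are assumed).
SHOW_VIEW_OPTIONS = Operator(
    name="ShowViewOptions",
    preconditions=frozenset({
        ("is-iTunes", "activeWindow"),
        ("is-iTunes", "fullPlayer"),
        ("is-visible", "editMenu"),
        ("is-visible", "viewOptionsButton"),
        ("is-enabled", "viewOptionsButton"),
    }),
    effects=frozenset({("is-ViewOptions", "activeWindow")}),
)

state = set(SHOW_VIEW_OPTIONS.preconditions)
print(sorted(SHOW_VIEW_OPTIONS.apply(state) - state))
```

The operator says nothing about how the dialog is opened; that procedural knowledge lives in the client-side action class described in Section 3.3.2.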

3.3.2. Client side components

Goal Diagnoser

SmartAidè makes no assumptions about the goals of the users based on their interactions with the

application. Whenever the user requests help, the Help Dialog is displayed, where they can type in their

goal or the task they intend to achieve. The Goal Diagnoser translates the user input into a

planner-recognizable goal state. The Goal Diagnoser focuses on the global context of the user help request and

associates it with a planner-recognizable goal state using the definitions in the Action Library. In the current

version of the implementation, the Help Dialog displays a menu representing the set of actions modeled in


planner recognizable goal state. Future versions of the system will look at incorporating a more natural way

of interaction, when the user seeks help.
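In its menu-driven form, the Goal Diagnoser reduces to a lookup from a menu entry to a set of goal conditions. The sketch below assumes such a table; the menu labels, the predicates and the object names in it are hypothetical and do not come from the SmartAidè implementation.

```python
# Hypothetical table from Help Dialog menu entries to goal conditions.
GOAL_TABLE = {
    "Show the View Options dialog": [("is-ViewOptions", "activeWindow")],
    "Show the Edit menu": [("is-visible", "editMenu")],
}


def diagnose_goal(menu_choice):
    """Translate a Help Dialog menu selection into a planner-recognizable goal state."""
    if menu_choice not in GOAL_TABLE:
        raise KeyError(f"no goal state modeled for {menu_choice!r}")
    return frozenset(GOAL_TABLE[menu_choice])


goal_state = diagnose_goal("Show the View Options dialog")
print(sorted(goal_state))
```

A free-text front end would only need to replace the dictionary lookup; the goal state it hands to the planner would keep the same form.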

Context Recorder

Knowing what a user is doing within the application is a crucial part of the help-giving process. The

help system needs a way to identify and represent the effect of the user's actions on the interface, so that the

planner can get an accurate description of the current application environment state. In the Macintosh

environment, there are a finite number of 'objects', to which a user has access. When the application is

loaded up, the states of all the elements in the interface are recorded. As the user interacts with these

objects, the effects of these interactions are recorded and the state of the object itself is updated. In addition,

SmartAidè assumes that, for the application that it supports, there always exists an equivalent set of menu

actions for each hot-key sequence initiated by the user. Thus, the Context Recorder also translates

each hot-key sequence into the corresponding elements within the interface. The representation of the current state is always

stored in a format recognizable to the planner. The state of the application with respect to the environment

is recorded using the internal Macintosh representation, which stores information about active windows,

window sizes and their states (minimized or maximized).
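The recording scheme described above can be sketched as follows. This is an assumed, simplified model rather than the SmartAidè code: object states are plain dictionaries of boolean properties, a hot-key is translated through a table into the state changes of its equivalent menu action, and the current state is emitted as predicate and object pairs. The object names and the `cmd+J` binding are hypothetical.

```python
class ContextRecorder:
    """Tracks interface object states and emits them as planner predicates."""

    def __init__(self, initial_states, hotkey_map):
        # Snapshot of every interface object taken when the application loads.
        self._states = {obj: dict(props) for obj, props in initial_states.items()}
        # Hot-key -> list of (object, property, value) changes of the equivalent menu action.
        self._hotkeys = dict(hotkey_map)

    def record(self, obj, prop, value):
        """Update one property of one interface object after a user interaction."""
        self._states.setdefault(obj, {})[prop] = value

    def record_hotkey(self, keys):
        """Apply the state changes of the menu action equivalent to a hot-key."""
        for obj, prop, value in self._hotkeys[keys]:
            self.record(obj, prop, value)

    def current_state(self):
        """Return the true properties as sorted (predicate, object) pairs."""
        return sorted(
            (f"is-{prop}", obj)
            for obj, props in self._states.items()
            for prop, value in props.items()
            if value
        )


recorder = ContextRecorder(
    initial_states={"editMenu": {"visible": False, "enabled": True}},
    hotkey_map={"cmd+J": [("viewOptionsWindow", "visible", True)]},  # hypothetical binding
)
recorder.record("editMenu", "visible", True)
recorder.record_hotkey("cmd+J")
print(recorder.current_state())
```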

Action Library

The Action Library stores information about the primitive actions that can be performed within the

application environment. The action library stores a pointer to the action class definition of the action and

the help text associated with the action. In order to ensure that every step in a plan can be executed within

the application, the developer must create one action class for every action operator in the plan library. The

implementation of each action class is responsible for preserving the semantics of the action operator

defined in the planner’s action library. To this end, the Mimesis client API’s abstract “Action” class defines


class ShowViewOptions extends Action;

var Window activeWindow;
var int precondResult;
var int effectsResult;

// Returns the index of the first failed precondition, or -1 if all hold.
function int CheckPreconds() {
    // 0: the active window must be the iTunes window
    if (activeWindow.window != "iTunes")
        return 0;
    // 1: the menu at index 1 must be enabled
    if (activeWindow.title.menuOption[1].enabled == 0)
        return 1;
    // 2: option 9 of that menu (the item clicked in Body) must be enabled
    if (activeWindow.title.menuOption[1].options[9].enabled == 0)
        return 2;
    return -1;
}

// Returns the index of the first failed effect, or -1 if all hold.
function int CheckEffects() {
    if (activeWindow.window != "View Options")
        return 0;
    return -1;
}

// Clicks the menu item and records View Options as the active window.
function void Body() {
    activeWindow.title.menuOption[1].options[9].mouseClick();
    activeWindow.window = "View Options";
}

state Executing {
Begin:
    precondResult = CheckPreconds();
    if (precondResult != -1) {
        reportPrecondFailure(precondResult);
        gotoState('Idle');
    }
    Body();
    effectsResult = CheckEffects();
    if (effectsResult != -1) {
        reportEffectFailure(effectsResult);
        gotoState('Idle');
    }
    reportActionSuccess();
}

The Executing State is identical across all action classes, and so is typically defined in the top level Action class. It is included here for reference. The Executing code first checks the action’s preconditions. If all preconditions are met, then it runs the action’s body. Next, it checks the action’s effects. If all effects hold, then the action sends a message to the execution manager indicating that it has completed successfully. If an error is encountered along the way, the action sends an error report to the execution manager, facilitating re-planning.

Body() implements the operator’s behavior, changing the application state according to the intended meaning of the operator. CheckEffects() runs after the body of the action completes. It verifies each of the operator’s effects. When an effect does not hold in the current state, an integer identifying the failed effect is returned. If all effects hold, the function returns -1. CheckPreconds() checks each precondition from the corresponding operator in the order in which they are defined there. When a precondition does not hold in the current state, an integer identifying the failed precondition is returned. If all preconditions hold, the function returns -1.

Figure 3.9. An example action class for the ShowViewOptions operator shown in Figure 3.8

An action’s CheckPreconds() function is responsible for verifying that the conditions described in the


is responsible for changing the state of the world in accordance with the meaning of the action operator.

The CheckEffects() function verifies that the conditions described in the operator’s effects have actually

been obtained in the world immediately after its execution.

The Executing() function is defined only in the parent action class. This function,

shared by all action classes, first calls the action’s CheckPreconds(). If one of the action’s preconditions is

not met, then the Executing() function stops execution and sends a failure message to the Execution

Manager. Otherwise, the function calls the action’s Body() and then calls the action’s CheckEffects()

function. If one of the action’s effects does not hold, the function halts execution and reports this condition

to the Execution Manager. Otherwise, if no problems were encountered, the Executing() function reports

that the action has completed successfully. An example of an action class is shown in Figure 3.9.

The Execution Manager initiates the execution of each action and extracts the associated help text for

display as the action is being executed.
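The check, execute, check cycle shared by all action classes follows the template-method pattern, which the sketch below restates in Python. It is a paraphrase of Figure 3.9 under simplifying assumptions: the world is a plain dictionary with hypothetical keys, and reporting to the Execution Manager is replaced by a returned (status, index) pair.

```python
from abc import ABC, abstractmethod

OK = -1  # same "all conditions hold" sentinel as in Figure 3.9


class Action(ABC):
    @abstractmethod
    def check_preconds(self, world):
        """Return the index of the first failed precondition, or OK."""

    @abstractmethod
    def body(self, world):
        """Change the world according to the operator's meaning."""

    @abstractmethod
    def check_effects(self, world):
        """Return the index of the first failed effect, or OK."""

    def execute(self, world):
        """Shared lifecycle: verify preconditions, run the body, verify effects."""
        failed = self.check_preconds(world)
        if failed != OK:
            return ("precond-failure", failed)
        self.body(world)
        failed = self.check_effects(world)
        if failed != OK:
            return ("effect-failure", failed)
        return ("success", OK)


class ShowViewOptions(Action):
    def check_preconds(self, world):
        if world.get("active_window") != "iTunes":
            return 0
        if not world.get("edit_menu_enabled"):
            return 1
        if not world.get("view_options_enabled"):
            return 2
        return OK

    def body(self, world):
        world["active_window"] = "View Options"

    def check_effects(self, world):
        return OK if world.get("active_window") == "View Options" else 0


world = {"active_window": "iTunes", "edit_menu_enabled": True, "view_options_enabled": True}
result = ShowViewOptions().execute(world)
print(result)
```

Returning the index of the failed condition, rather than a bare failure flag, gives the planner enough information to know which part of the state no longer matches its model when it re-plans.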

Execution Manager

In order to control and monitor the order of execution for the actions within the narrative plan, the

Execution Manager builds a Directed Acyclic Graph (DAG) that represents the order in which the actions

need to be executed to achieve the user goal. The nodes in the DAG represent primitive actions and arcs

between two nodes indicate temporal dependencies between the two actions’ execution. An example

execution DAG for the plan from Figure 3.7 prior to any plan execution is shown in Figure 3.10 (a).

[Figure 3.10: execution DAG, panels (a) and (b), each with action nodes 1, 2, 3 and 4]


To execute the plan, the Execution Manager removes the minimal elements of the DAG (those actions that

have no temporal dependencies upon other actions) and sends an XML description of each action and its

parameters to the client on the user’s machine. Using a pre-defined translation scheme, the client translates

the action description into a function call with pointers to the appropriate system objects that make up the

action’s arguments.

This function is responsible for the correct implementation of the action characterized by the declarative

representation used by the Mimesis planner. As each client-side action executes, it first checks the

relevant system state to verify that its preconditions hold. Next, it performs its main task, changing

the world according to its intended semantics. Finally, it checks its post-conditions, validating

that its execution has correctly changed the system state. If an error is encountered in this process (e.g.,

the user has altered the system state so that one of the action’s pre-conditions no longer holds at the time

the action is executed), an error message is sent by the function back to the Execution Manager. Should the

function execute correctly, a termination message is sent back to the Execution Manager, indicating that the

temporal dependencies in the execution DAG can be updated and new actions initiated for execution.
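The execution loop described above can be sketched as repeated removal of the minimal elements of the dependency graph. The function below is an illustrative simplification, not the Mimesis Execution Manager: it runs ready actions one at a time in the same process instead of sending XML messages to a client, and the `run_action` callback stands in for the client-side function call.

```python
def execute_plan(dependencies, run_action):
    """Run actions in an order consistent with their temporal dependencies.

    dependencies maps each action id to the set of ids that must finish first.
    run_action(action_id) returns True on success and False on failure.
    Returns (completed_ids, failed_id), where failed_id is None on full success.
    """
    remaining = {action: set(deps) for action, deps in dependencies.items()}
    completed = []
    while remaining:
        # Minimal elements: actions with no outstanding temporal dependencies.
        ready = sorted(action for action, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependencies contain a cycle or an unknown action")
        for action in ready:
            del remaining[action]
            if not run_action(action):
                return completed, action  # the caller can request re-planning
            completed.append(action)
            # A termination message arrived: release the actions waiting on this one.
            for deps in remaining.values():
                deps.discard(action)
    return completed, None


# The four steps of the plan in Figure 3.7, each depending on the previous one.
chain = {1: set(), 2: {1}, 3: {2}, 4: {3}}
print(execute_plan(chain, lambda action: True))
```

For a totally ordered plan such as this one only a single action is ready at a time; a plan with independent branches would expose several minimal elements at once.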

3.4. Design Decisions

3.4.1. Integrated, Separated or Divorced

Researchers have further classified IHS as being one of three types, namely, divorced, separated, or

integrated, with regard to the application for which help is intended. A divorced system knows nothing

about the application for which it is giving help. Conventional help systems, such as those seen in today's

Windows applications can be considered as divorced [38]. The systems give help when asked for, but the

help is generally large chunks of pre-stored text. No user adaptation of the system and modification of help

take place. Separated help systems are similar to integrated help systems, but are not built into the

application. Communication between application and help system can occur and the system can adapt to a

user, because it has knowledge about user's actions [38]. With this adaptation comes modification of help

text, and some form of intelligence can be associated with its actions.

Integrated help systems provide perhaps the best answer for building a help system [38]. The help system is

part of the application and can therefore have total access to all the knowledge it needs to adapt to a user

and provide the best help. In this project the goal is to be able to make the architecture either separated or

integrated based on the decision of the designer. SmartAidè’s dual action representation model allows for