THE LEARNING SETTING - Predictive Menu on a Mobile Phone

Predictive Menu on a Mobile Phone

3.3. THE LEARNING SETTING

Table 3.1: Notation needed for describing the menu prediction learning setting

Symbol Definition/Explanation

X the set of examples, where each example is represented by m attributes.

Y the set of n labels, {1, ..., n}, where each label represents the position or index of a menu item in a menu.

A the set of labelled examples, where A =X × Y.

F the set of functions of the form f :X → Y.

H the set of hypotheses.

C the set of concepts.

T the time index that parameterises X,Y, and F

be predictive of the menu selection. The values that these attributes take on represent an example to the learner.

On every menu selection made by the user, the learner receives an example, xt ∈ X, and the menu item selected by the user, yt ∈ Y. Note,

the subscript t is used as the index for time parameterisation, whereas,

i and j are used to denote non-time indexes. Together, the pair (xt, yt)

form a labelled example at time t. After t+ 1 menu selections the learner will have observed a sequence of labelled examples called a sample, i.e.

samplet= ((x0, y0),(x1, y1), ...,(xt, yt)).

The objective of a learning system is to predict which menu item to highlight yt ∈ Y when given some example xt ∈ X and a sample up until

time t, samplet−1. This is achieved by discovering a function ft : X → Y

based on the samplet−1 that is then used to predict the menu selection at

time t. If the prediction is successful, then yt =ft(xt).

It is also assumed that the attributes describing an example would be discretely valued and that the values each attribute could take would be rel- atively small and knowna priori, ensuring that the setX is finite. Given that both the setX andY are finite, the set of functionsF is also finite. However, in the menu prediction setting not all functions are considered equally likely to occur. Moreover, some functions inF represent highly unlikely mappings. To take advantage of this prior knowledge within the problem domain, only those functions in F that were considered likely to occur were provided to

the learner. These likely functions represent a hypothesis space H (where

H ⊂ F) which is provided to the learner.

Two characteristics dictated the representational scheme used to specify hypotheses. Firstly, hypotheses needed to be amenable to efficient computa- tional evaluation, and secondly, it was desirable that they be comprehensible to a user. Representing hypotheses as decision lists accommodates both requirements. The form of decision list chosen is shown below.

“ifl1 theny1, else ifl2 then y2, ..., else yn” where each li is a CNF expression

and eachyi ∈ Y.

The decision list representation also ensured that each hypothesis was a function.

If a hypothesisht∈ Hcorrectly predictsyt∈ Y when evaluated onxt∈ X

then ht is said to agree with the labelled example (xt, yt). If a hypothesis

ht ∈ H predicts incorrectly, then ht is said to disagree with the labelled

example (xt, yt). A hypothesis is considered consistent if it agrees with all

labelled examples in a sample.

3.3.1 Loss Model

When the learner selects a hypothesis that disagrees with an example, the learner has in effect highlighted a menu item the user does not want to select. The user must then scroll to the correct menu item. The cost associated with predicting the wrong menu item can be measured in a number of ways. One way is to determine the number of scrolling key presses a wrongly highlighted menu item would cause the user. If each yt ∈ Y is an index into a list of

menu items, then an absolute loss function captures this measurement, i.e.

L(ht(xt), yt) =|ht(xt)−yt|. (3.1)

The absolute loss function allows the notion that a wrongly highlighted menu item located further away from the intended menu item results in a larger cost to the user. However, if the user is only concerned as to whether the correct menu item was highlighted, then the discrete loss function is suitable, i.e:

3.3. THE LEARNING SETTING 37

L(ht(xt), yt) =

(

0, if ht(xt) =yt

1, otherwise. (3.2)

A discrete loss function also allows the menu-item prediction problem to be viewed as a binary-concept learning problem. The labelled example (xt, yt) can be viewed as a possible “course-of-action” for the learner. From

this perspective, the discrete loss function determines whether a “course- of-action” is appropriate. Given some hypothesis ht and some “course-of-

action” (xt, yt), the discrete loss function defines a boolean-valued concept,

i.e. “(xt, yt) is an appropriate course-of-action givenht”. More formally, the

discrete loss function is a characteristic function on each ht ∈ H. We let a

concept ct be the extension of a hypothesis ht, i.e ct = ext(ht) and let C be

the set of possible concepts.

An assumption was made that any observed “course-of-action” (xt, yt)

agreed with at least one hypothesis. In other words, for any (xt, yt) observed,

there existed a hypothesis such thatht(xt) = yt. Therefore, the set of possible

concepts was restricted such that C ⊆ H.

In the setting just described, the learning problem becomes one in which the learner must determine which “course-of-action” is appropriate (appropriate being defined as a “course-of-action” that produces a loss of 0). For the learner to determine which “course-of-action” is appropriate, it needs to know which h ∈ H is being used to assess “appropriateness”. In other words, the learner needs to find the hypothesis which captures the concept of “appropriateness”. This concept is called the target concept and is denoted asc∗_.

3.3.2 On-line Model

For menu-item prediction to be useful it needs to operate every time a user opens a menu in an interface. As such, the learning approach for menu-item prediction must operate over a series of trials. In this regard, the learning approach for menu-item prediction operates in an on-line and supervised

setting. A general model of this type of on-line and supervised prediction setting is provided by Littlestone [72].

The objective of the learner in this on-line and supervised prediction setting is to reduce overall loss by improving its predictive accuracy after each trial. Since accuracy relies on finding the target concept, the learner must find the hypothesis that captures the target concept after as few trials as possible.

3.3.3 Concept-drift Model

If a hypothesis agrees with every example in a sample (a consistent hypothesis), then the hypothesis provides an accurate mapping between some set of attribute values and the menu item a user will select in the sample. In this case, the hypothesis captures the target concept in the sample. In the menu prediction setting, the target concept can be considered a model of how a user interacts with a menu. Finding a hypothesis that captures the target concept is similar to determining how a user interacts with a menu.

It is conceivable that the model by which a user interacts with a menu may change over time. There are a number of reason why such a model may change: a different person may become the user, a user may become more adept at using the interface, the user may operate the device in a different way, or the environment may not provide enough information for which to build an accurate model in the first place. The result of a changing model of user interaction is a changing target concept. For instance, at some trialt

the user selects menu itemytgiven the attribute valuesxt. However, at some

later trial t+d, the user selects a different menu item yt+d 6=yt when given

the exact same attribute values xt+d = xt (the target concept thus changed

at some point from trial t to trialt+d).

Although there may not be a single global target concept, it may be that a set of local concepts can be composed to form a global target concept, where each local concept is only relevant for a given period of time.

In the menu prediction setting it was assumed that the context was defined by the task the user was performing. For instance, while the user was performing one task their interactions could be captured by the hypothesis

hi. When the user changed tasks their interactions could then be captured

3.3. THE LEARNING SETTING 39

would change whenever the user changed tasks.

Figure 3.2(a) shows two local concepts, each represented by a different hypothesis. Each concept becomes the target concept only in a given context. It was assumed that the context was determined by some attribute not present in the example space. Figure 3.2(b) shows concept drift in an on-line setting. In this case, the context is inherently time dependent.

(a) (b) Examples Space Examples (hypothesis h_j) (hypothesis h_i) (hypothesis h_i) (hypothesis h _j)

Figure 3.2: (a) Two local concepts, hypothesis hi is the target concept in one context

and hypothesishj the target concept in another context. (b) Concept drift in an on-line

setting, in which the context is time dependent.

We define a simple model to describe how the target concept changes. We assume that at any trial t there is target concept ct. We assume that

with probability λ the target concept changes from one trial to the next, whereas with probability 1−λwe assume the target concept does not change. When the target concept does change, the current target concept cannot be reselected, although it may be reselected in latter trials. Finally, it is assumed that λ remains stationary over all trials.

If concept drift occurs in an on-line learning model, the learner has to find the hypothesis that captures the target concept on the current trial. The learner has available to it only previously observed examples.

In document Adaptive User Interfaces for Mobile Computing Devices (Page 35-40)