Investigating five key predictive text entry with combined distance and keystroke modelling

(1)

Investigating five key predictive text entry

with combined distance and keystroke

modelling

Mark D. Dunlop and Michelle Montgomery Masters

Computer and Information Sciences University of Strathclyde,

Richmond St, Glasgow G1 1XH, Scotland

Abstract This paper investigates text entry on mobile devices using only five-key. Primarily to support text entry on smaller devices than mobile phones, this method can also be used to maximise screen space on mobile phones. Reported combined Fitt's law and keystroke modelling predicts similar performance with bigram prediction using a five-key keypad as currently achieved on standard mobile phones using unigram prediction. User studies reported here show similar user performance on five-key pads as found elsewhere for novice nine-key pad users.

Keywords: predictive text entry user modelling

Introduction

Text entry is vitally important to many applications that are becoming common on

mobile devices, for example text messaging, instant messaging and email. The

industry has reacted to this requirement by providing devices based around either

full or half-qwerty keyboards to improve text entry (see figure 1). However, these

devices either have to compromise the overall size of the device or the screen size

in order to make space for the larger keypads. In parallel, increased data services

and the movement of traditional PDA functionality onto phones, puts pressure on

devices to have larger screens at a time when retail markets still show strong

pressure for smaller overall device sizes. This paper aims to address these

(2)

Figure 1: Palm QWERTY and Blackberry half-QWERTY keypads

Text entry methods for mobile phones

Most mobile phones still adhere to the ISO standard 9-key layout for core text

entry (see figure 2) with letters spread over eight keys plus a space key. This

results in an ambiguity problem: for example if the user types 2, the phone does

not know if she wishes an A, B, or C. Traditionally the user manually

disambiguated each keystroke using the multitap standard, users pressed keys

multiple times to achieve the letter they wished (e.g. pressing 2 would give A, 22

B, 222 C, with a timeout or escape key used to separate subsequent letters on the

same key). This is a slow and error prone form of disambiguation (e.g. [1, 2]),

often due to the delay around subsequent letters on the same key or missed triple

clicks on poor quality keypads. Predictive text entry methods attempt to provide

faster text entry by reducing the disambiguation to a word-by-word basis: the user

only presses one key per letter then selects from possible matching words after

each word. To reduce the effort of disambiguation further, words are presented in

decreasing likelihood of being the correct word. There are many techniques for

estimating these likelihoods with the most common being to present words that

match the ambiguous key sequence in decreasing order of frequency of use in the

expected language. This frequency-based approach, also known as unigram

approach, is implemented in most standard format mobile phones using

technology such as Tegic's T9 [3, 4].

Figure 2: Nokia 5110 ISO standard 9-key keypad

Primarily to support text entry on smaller devices than mobile phones, some

work has been conducted on text entry for smaller keypads using approaches such

as the date-stamp method (where users use either three-way or five-way joystick

to enter text on a letter by letter basis). Bellman and MacKenzie [5] investigated

dynamic rearrangements of the letters to reduce the distance from the start point

based on probabilistic modelling of the most likely next character, unfortunately

(3)

average entry speed of about 10wpm, rising to about 15wpm at best. Dunlop [6]

reported some initial work on text entry on a five-key pad using a watch-like

interface. This work used a variation of predictive text entry using only four soft

keys for letters plus a combined space/next key. This paper builds on that work to

investigate 5-key text entry on small physical keyboards that can be used to

optimize screen-space on a phone format device.

An alternative is the growing move away from physical keypads to

touch-screens, which are more common on PDAs. MacKenzie et al. [7] compared the

three most common input techniques for stylus based text entry on touch screen

devices: small on-touch-screen keyboards either in QWERTY or alphabetic layout

and hand-printing of letters. Their results showed that a standard QWERTY

layout can achieve around 23wpm while hand-printing achieved only 17wpm (and

alphabetic soft-keyboard only 13wpm). While the on-screen QWERTY pad

achieves decent text entry rates, it has been shown that users can achieve similar

rates using standard 9-key pads [2] while not requiring users to use a stylus to

operate the device.

Evaluation of mobile text entry

The evaluation of new text entry devices is complex: users take considerable time

to reach expert user performance and the nature of the devices makes prototyping

difficult. One solution is to use model-based evaluation techniques early in the

design to predict performance and complement successful approaches with

usability studies later in the process. There are two main schools of interaction

modelling for text entry on mobile devices: movement modelling based around

Fitt's law [8] for time to press buttons and keystroke modelling of the user action

sequence [9]. Both have their advantages and a combined model is used here to

investigate five key text entry.

Structure of this paper

The paper first discusses our proposed design for text entry using five-keys. We

then present performance analysis of how well a text-entry engine can

disambiguate five-key input compared to nine-key input. These disambiguation

(4)

predicted user performance between five-key and nine-key ambiguous text entry.

We introduce a combined model of performance estimation based on keystroke

modelling with key timing information from Fitt's law derived movement models

and use this model to revise our predictions. Finally, we report on a set of user

trials of a prototype five-key system to investigate novice user performance.

Design

The two main motivations behind the development of a five-key text entry method

for mobile devices are to (a) reduce the space used on devices by the keypad, both

on current phones to increase screen space and on other smaller devices to permit

text entry, and (b) to potentially increase speed by reducing finger movement. Our

design is based around a five-key ambiguous keypad with four alphabetic keys

and a combined space/next key, and is similar to that developed for watch-top

touch screens [6]. Users press one key per letter to enter a word followed by space

– the first press of the space key after a word inserts a space (highlighted as _ on

the interface) with subsequent presses of the space key rotating round matching

suggestions (in a similar fashion to the next key on many T9 phones). To aid users

a small area of the screen is used to give a guide to which letters are on which

keys, using visual clues to guide text entry [10] this area dims letters that are not

possible given the letters already entered (thus helping users to spot letters they

are looking for).

Figure 3 shows a paper prototype of the ideal implementation of our

interface on a phone size device: five entry keys above two soft-keys and arranged

around a 5-way joystick at the top of a phone with a large screen (using top

section of screen for key-guide while entering text). This shows how the new

interface can be used to maximise screen space, giving a screen area of

(5)

[image:5.595.179.283.31.177.2]

Figure 3: paper prototype of ideal 5-key text phone

Figure 4 shows the current J2ME prototype implementation as used in user

experiments discussed later. This prototype uses only the top part of a nine-key

traditional phone pad with quarter of the alphabet allocated to each of buttons 1,

4, 3 and 6 with 2 acting as combined space/next key. Figure 4 shows the prototype

interface for a user entering the phrase "hello how are you" – note the lower-right

(6) button on the key-guide showing that only u is valid as the next letter on that

key.

[image:5.595.116.345.350.496.2]

Figure 4: current implementation

While the experiments reported here focus only on alphabetic text entry, we

envisage the on-screen display being used to support modal entry of punctuation

and numbers (chords could be used for common punctuation, e.g. bottom two

keys for a period, while a joystick controlled keyboard could be used for

out-of-dictionary word entry and for numeric entry). One of our design aims was to

maintain the joystick and soft-keys for editing and application control as normal

but these could be used in conjunction with text entry keys as modifiers.

With ambiguous keypads the layout of keys affects the performance of the

(6)

common words such as on, in and an would all have the same key sequence.

Attempts at optimised keypads (e.g. [6, 11]) have, however, shown only a

marginal increase in theoretical performance over alphabetic arrangement of keys

and much slower user input due to time spent searching for letters on, to a user’s

viewpoint, randomly located keys. Thus, the experiments reported here have all

been conducted with an alphabetic arrangement of keys shown in figure 4. This is

actually slightly sub-optimal compared to the best alphabetically constrained key

arrangement, which is collection dependent [11]. We estimate that the results

reported below would be marginally affected by changes to optimal alphabetic

keyboards and improved by around 2% for fully unconstrained optimised

keyboard layout.

Performance analysis

Standard predictive text entry models (including Tegic's T9 and Dunlop &

Crossan's [1]) are based on unigram models of prediction: words are suggested

based purely on frequency of occurrence information for words that match the

number sequence entered. With predictive text on a standard 9-key pad, on

entering 4663 a user is presented with words that can be composed from GHI as

the first letter followed by MNO and MNO then DEF – these are typically

presented in descending order of frequency in English usage as expected on a

mobile phone. For example a standard T9 enabled phone will suggest good, home,

gone, hood, hoof etc. More advanced models can adjust predictions to make them

more likely to be correct, in particular bigram models [12, 13] bias predictions

based on the previous word – so that ranking is based on frequency of occurrences

of word wn following word wm, for example home would be more likely than good

after the word at. To assess the likely impact of reducing the number of keys from

nine to five, we calculated best case unigram and bigram predictions using two

corpora: a sub-set of The Herald collection used in Dunlop & Crossan [1] and the

Singapore SMS corpus [14]. To simplify analysis and focus on core text-entry

speeds, both collections were pre-processed to leave only alphabetic characters

separated by spaces and newlines.

In both cases the average ranked-list position (ARP) was calculated by

(7)

averaging the position of the expected word in suggested ranked list given the

ambiguous key-coding of the expected word. An ARP value of 1.0 would indicate

that the required word was always in the first position in the ranked list of

suggestions, a value of 2.0 that on average the required word was second in the

ranked list. This approach to averaging naturally biases the averaging process so

that words are taken into account proportionally to their occurrence in the text

collection. For unigram prediction all words were ranked in decreasing frequency

for each key combination, for bigram the same ranking was used but based solely

on frequency of the word occurring after the previous word in the sentence. As an

example, given the phrase "are you home" in the text collection the word home is

converted to its numeric equivalent (4663). A ranked list is calculated to predict

suggestions for 4663 with the position of home in that list taken as the ranked-list

position for that word (position 2 in the case of T9’s standard offer list). The

suggestions for 4663 are based on the frequency of all words matching 4663 for

the unigram model while the bigram model bases the suggestions on frequencies

of words occurring after you. This approach gives a good approximation to the

performance expected in-use once a prediction engine is tuned to the language.

However, this approach to modelling avoids two common problems with

predictive text entry: out of dictionary words for both approaches and sparse

frequency information for bigram models. Predictive text-entry models can only

predict known words and alternative input techniques are required for

out-of-dictionary words, which are normally stored in the user out-of-dictionary – so need only

be entered once per device. Bigram models rely on knowing considerable

statistical information about words – for rare words some combinations may

simply not have been seen before, even though the individual words have. There

has been considerable work (e.g. [15, 16]) on adjusting bigram models to,

essentially, degrade gently to unigram models when either there is no bigram

statistics for the word-code combination or the confidence of those statistics is

low. Experiments reported here use a simple bigram model, as the training method

ensures usable statistics.

The literature commonly uses three different methods for reporting the

performance of text entry methods. Above we have used average ranked list

(8)

per character (KSPC). Disambiguation accuracy (e.g. [11]) reports the percentage

of times the first word suggested by the disambiguation engine is the word the

user intended – a DA value of 100% implies the disambiguation engine always

give the correct word first, while 50% indicates that it only manages to give the

correct word first half of the time. This is an intuitive and very direct measure but

does not take into account the performance of words that do not come first in the

list. KSPC (e.g. [6]) reports the average number of keystrokes required to enter a

character, for example home on a standard T9 mobile phone requires 5 keystrokes

– 4663*, where * is the next suggestion key, giving a KSPC for that word of

5/4=1.25. A KSPC value of 1.0 indicates perfect disambiguation as the user never

needs to type any additional letters, while a higher figure reflects the proportional

need for the next key in disambiguation. KSPC does take into account ranked list

position for all words and compares well with non-predictive models, however it

is a rather abstract measure being based on letters for inherently word-based

methods. Given a standard word length of 5 letters per word (including space),

ARP = KSPCx5-4. All methods are normally averaged over a large corpus.

Based on the first million lines of The Herald collection (9 141 467 words

with average of 4.73 letters per word excluding space), table 1 shows the ARP,

DA and KSPC values for unigram and bigram prediction on both five-key and

nine-key keypads. This shows that (a) nine-key text disambiguation is

considerably more accurate that five-key; (b) that disambiguation is considerably

more accurate with bigram modelling and (c) that bigram modelling improves

five-key entry proportionally more than nine-key entry.

Herald Collection

5-key 9-key

unigram 1.554 ARP

80% DA

1.097 KSPC

1.058 ARP

96% DA

1.010 KSPC

bigram 1.148 ARP

92% DA

1.026 KSPC

1.017 ARP

99% DA

[image:8.595.118.340.516.686.2]

1.003 KSPC

(9)

Table 2 shows the same experiment and results pattern for the Singapore

SMS corpus [14] (121 126 words with an average 3.49 letters per word ex-space).

SMS Collection

5-key 9-key

unigram

1.832 ARP

66% DA

1.185 KSPC

1.135 ARP

90% DA

1.030 KSPC

bigram

1.199 ARP

87% DA

1.044 KSPC

1.028 ARP

98% DA

[image:9.595.118.340.92.261.2]

1.006 KSPC

Table 2: average ranked list position of required word in Singapore SMS collection For comparison: full-sized non-ambiguous keyboards achieve KSPC=1.00,

standard date-stamp method for entering text on 3 keys achieves KSPC=6.45,

date-stamp like interaction on 5 keys achieves KSPC=3.13 and multitap on a

standard 9-key mobile phone achieves KSPC=2.03 [17]. Gong and Tarasewich

[11] reported DA for 5-key and 9-key at 85% and 97% respectively for written

English corpus and 69% and 92% for SMS messages. Hasselgren et al. [12] used

bigram modelling with word completion, where words were suggested before the

user had finished entering them (leading to sub-1.0 KSPC figures [17]). Their

results report KSPC of 1.01 and 1.08 for T9 using Swedish news and SMS

corpora respectively, improving to 1.01 and 0.88 respectively for their bigram

model with word completion suggestions. As a comparison for Hallegran et al.'s

work, unigram word completion has been estimated to reduce KSPC by around

25% for a 9-key keypad (but to lead to potential cognitive load problems) [1].

Keystroke level modelling (KLM)

To gain an insight into potential expert user behaviour with different keyboards,

different approaches have been taken to modelling interaction in order to predict

expert (trained, error-free) performance. Dunlop and Crossan [1] proposed a

model based on Card, Moran and Newall's keystroke modelling [9]. The model is

based on predicting the time T(P) taken by an expert user to enter a given phrase

(10)

restrictions are clearly severe limitations on this modelling approach. However,

while more complex modelling approaches have been researched to support

novices, model more complete interaction and model error behaviour (e.g. [18,

19]), we believe that this relatively simple performance figure complements user

studies and is a reasonably accurate and worthwhile estimate of expected peak

performance of a text entry system.

The keystroke model is based on building an equation to represent the user

activity by summing a set of small time measurements, in the case of text entry

the appropriate times are: the homing time for the user to settle on the keyboard

Th; the time it takes a user to press a key Tk; and the time it takes the user to

mentally respond to a system action Tm. Dunlop and Crossan [1] modelled

predictive text entry on a sentence where disambiguation occurs as extra

characters representing moving down the ranked list of suggestions, their overall

time equation is as follows:

T(P) = Th + w (kwTk + l(Tm + Tk))

Equation 1: Dunlop and Crossan's KLM model

In equation 1, w represents the number of words in the phrase, kw is the

number of letters per word (renamed here from their kp) and l the ARP measure

(average position in the suggested word list of the correct word). In that paper the

calculation led to a predicted speed of 17.7 words per minute (wpm). Pavlovych

& Stuerzlinger [18] later modified the calculation by correcting double counting

of the first space key after a word by changing kw to 4.98, leading to a predicted

speed of 19.3 wpm1. Table 3 shows details of the calculation for both sets of data.

1

The figure of 19.3 wpm represents our revision of Dunlop and Crossan using Pavlovych and Stuerzlinger's

(11)

Dunlop &

Crossan

As revised by

Pavlovych &

Stuerzlinger

Th 0.4 0.4

w 10 10

kw 5.98 4.98

Tk 0.28 0.28

l 1.03 1.03

Tm 1.35 1.35

T10 33.9s 31.2s

[image:11.595.129.334.30.271.2]

speed 17.7wpm 19.3wpm

Table 3: nine-key unigram prediction time and speed

James and Reischel [2] carried out focused user experiments on both novice

and experienced users of T9. They reported a measured T9 performance averaging

at 20.4 wpm for experts, within 10% of the keystroke level modelling prediction

from table 3. For comparison handwriting with ink and paper achieves around 20

to 30 words per minute while desktop typists can achieve in excess of 150 wpm,

though this drops considerably for non-secretary users and for composition rather

than transcription (e.g. Karat et al [20] found speeds of 33wpm transcription

dropping to 19wpm for composition).

Simple unigram and bigram modelling

Equation 1 has constant values for Th, w, Tk and Tm with other values being

dependent on the disambiguation engine and the text collection in use. Based on

the performance analysis reported above, table 4 shows the predicted

words-per-minute for unigram and bigram predictions using the two test collections, while

table 5 gives more details on the parameters of equation 1. This shows that

moving to bigram prediction improves performance by 2-6% for a nine-key

keypad and 20-34% for the five-key keypad. Furthermore, the improvement in

moving to bigram prediction brings the five-key keypad from around 26-40%

(12)

9-key

wpm

5-key

wpm

herald unigram 19.42 15.39

herald bigram 19.85 18.54

sms unigram 20.93 14.99

[image:12.595.147.314.31.156.2]

sms bigram 22.28 20.19

Table 4: Summary KLM model predicted results

kw l T10 (s) wpm

unigram 4.73 1.058 30.89 19.42

9-key

Herald bigram 4.73 1.017 30.22 19.85

unigram 3.49 1.135 28.67 20.93

9-key

SMS bigram 3.49 1.028 26.93 22.28

unigram 4.73 1.554 38.97 15.39

5-key

Herald bigram 4.73 1.148 32.36 18.54

unigram 3.49 1.832 40.03 14.99

5-key

SMS bigram 3.49 1.199 29.72 20.19

Table 5: Variable KLM model parameters and predicted results

Adjusting model for variable keystroke times

Dunlop and Crossan modelled keystroke speed at 0.28s based on a fixed figure

from Card et al.'s figure of equivalent to "an average non-secretary typist" on a

full QWERTY keypad [9]. As supported by James and Reischel [2], equation 1

works well, however it cannot take into account one of the main motivations for

the five-key keypad: reduced finger movement decreasing the time taken to press

each key. Mackenzie's group have conducted considerable work on using Fitt's

law [8] for calculating the limit of performance given distance between keys and a

language model for the movement required between keys to enter text with a

given text entry scheme (e.g. [21]). In the basic form their modelling predicts

40.6 wpm for thumb-based T9 input assuming no next key operations, with 5

characters per word this equates to an average keying time of 0.30s (without

[image:12.595.29.433.202.386.2]

(13)

Based on the same phone model used in their paper (a Nokia 5110, figure

4), we have examined the total time for entering a large block of text according to

their model using both nine-key and five-key keypads. Our results give a

weighted average time per key of 0.26s for the nine-key pad and 0.22s for the

five-key pad (the slight difference being due to corpus effects – the movement

models are derived from and conditional upon analysis of a large corpus of text).

Feeding these figures directly into equation 1 (replacing Tk as appropriate in table

5) gives the words per minute predictions in table 6.

9-key

wpm

5-key

wpm

herald unigram 20.18 17.04

herald bigram 20.64 20.81

sms unigram 21.62 16.29

[image:13.595.145.317.191.325.2]

sms bigram 23.05 22.30

Table 6:revised KLM models adjusting for distance calculations

Table 6 shows that the five-key keypad is predicted to perform roughly

equivalently to a nine-key keypad when bi-gram prediction is used and reduced

keying time is taken into account (1% better on The Herald collection, 3% slower

on the SMS collection). Furthermore, five-key performance with bigram

prediction is predicted to marginally out perform standard unigram prediction on

nine-key keypads for both test collections.

Improving these models

The distance model above, while taking into account some aspects of physical

keyboard layout, has two inaccuracies that can noticeably affect predictions:

repeat keys and parallel finger movements. Repeat keys, where subsequent letters

are on the same key, are not modelled correctly with Fitt's law, which tends to

underestimate zero distance movements. Soukoreff and MacKenzie [22]

conducted user experiments on a modified Fitt's law model that includes a

separate estimated time for repeat keys. The standard Fitt's law model also does

not take into account one of the main speed gains of touch typing: being able to

move fingers on one hand in preparation while still typing with the other hand.

(14)

entry where one thumb can be moving to the appropriate key in parallel with the

key press on the other thumb [23].

We recalculated our predictions based on modelling both repeat-key timing

and two-thumbed parallel movements. Again we simplified the models from the

distance papers by using them simply to estimate the average key-stroking times

for our KLM modelling. Assuming users are using two-thumbs to enter text, our

initial modelling gave key times of 0.27s for The Herald collection with a 9-key

keypad and 0.25s for the SMS collection, reducing to 0.23s and 0.22s respectively

on the 5-key keypad. Table 7 shows the key stroke time for the different

modelling and collection combinations for 9-key pad and the resulting predicted

text entry speed for both unigram and bigram prediction. Table 8 shows the same

data for 5-key pad.

Tk unigram bigram

simple 0.27 19.9 20.4

repeat-key 0.25 20.9 21.4

two-thumb 0.18 23.8 24.4

Herald

both 0.17 24.1 24.7

simple 0.25 22.0 23.4

repeat-key 0.24 22.9 24.5

two-thumb 0.18 25.0 26.9

SMS

[image:14.595.91.373.293.495.2]

both 0.17 25.2 27.1

(15)

Tk unigram bigram

simple 0.23 16.7 20.4

repeat-key 0.22 17.3 21.1

two-thumb 0.16 19.1 23.7

Herald

complex 0.15 19.0 23.7

simple 0.22 16.3 22.3

repeat-key 0.21 16.7 23.0

two-thumb 0.15 18.0 25.2

SMS

[image:15.595.94.369.30.231.2]

complex 0.15 17.8 25.0

Table 8: Predicted words-per-minute for 5-key pad using different prediction models These tables show that, averaged over the two collections, our initial simple

predictions show a drop of around 21% in words per minute when moving from a

9-key to a 5-key keypad using simple unigram word prediction. This difference

drops to only 3% when using bigram prediction, with simple Fitt's law modelling

of keystroke times. When we take into account increased times for repeat-keys

and reduced times when swapping thumbs the predicted performance drop for the

5-key keypad increases to 6% when comparing bigram prediction on 5-key and

9-key pads. However, bigram prediction on 5-9-key pads is still only 2% slower

overall than unigram prediction on 9-key pads, showing predicted 5-key

performance roughly in-line with the prevalent current mobile phone text entry

approach.

User trials

Keystroke level modelling is useful for predicting expert user performance but is

focussed on expert error-free performance and gives little feedback on how new

users find a technology to use. To address this we conducted a controlled usability

experiment with users entering text phrases in a controlled setting.

Experimental Equipment and Users

A set of sentences were collected from members of the department who regularly

send text messages. They were asked to write some messages that were in the

(16)

any SMS shorthand (e.g. ur converted to your), remove punctuation marks,

convert numbers into their written form (e.g. replacing 8 with eight) and remove

capitalisation. Although not natural these edits were consistent between test

interfaces, more closely match the earlier technical studies and can be justified as

gaining an insight into peak text entry rates that are not affected by modal

changes. The final 20 sentences were randomly shuffled to give the trial test

phrases.

Twenty staff and students of the Computer and Information Sciences

Department at University of Strathclyde took part in the experiment (ages ranged

from 24-45, 6 female/14 male, 2 dominent-left-handed/18 right). Fifteen of these

users considered themselves to be regular senders of text messages, four rarely

sent messages and one user had never sent a text message. Fourteen of the

subjects regularly used T9 prediction when messaging.

Experiments were all conducted on a Sony Ericsson K300i handset running

a dedicated J2ME editor that was identical for both 5-key and 9-key text entry bar

the provision of an on-screen guide to key-mapping for 5-key entry (as shown in

Figure 4 and 5). Although not the ideal format for 5-key entry, we felt this was the

best experimental compromise between appropriateness and consistency. Both

versions of the text engine used the same small dictionary with a unigram

prediction model.

Experimental Process

The participants were asked to complete a short pre-experiment questionnaire on

demographic information and text message usage. The participants were then

given a demonstration of our application and our approach to nine-key text entry.

To recap, our approach differed from standard T9 in use of the space key to

double as next key. It also varied from some handsets in that only 0 and # were

supported as space keys (unfortunately ruling out the normal 1 key for users of

certain brands of handsets).

After the demonstration participants were asked to enter 10 sentences using

the nine-key layout on the mobile phone handset.

The participants were then given a demonstration of the five-key layout and

(17)

Finally, the participants were then asked to complete a feedback form to

indicate their preference between the keypads and give any comments on either

keypad layout.

Using a GPRS data connection, key-strokes were recorded each time the

space key was pressed. These records were time stamped in tenths of a second and

[image:17.595.136.325.157.302.2]

recorded the display status at that point.

Figure 5: five-key and nine-key prototypes as used in user study

Results

The results shown in figure 6 summarise the average text entry rate for users over

the last eight phrases (the first two are excluded to allow users to settle in with the

device and technique).

0 5 10 15 20 25 30

1 2 3 4 5 6 7 8

Task Order

WP

M

T9 MM5

Figure 6: user-averaged text entry speed over last eight phrases

The results show that, averaged over all users and tasks, nine-key entry was

approximately 21wpm while five-key is around 12wpm. James and Reischel [2]

[image:17.595.120.340.451.619.2]

(18)

depending on experience – in line with our studies, given our profile of users and

slight differences in prior experience. Furthermore our result for novice five-key

users is in line with their novice nine-key user speed of around 11 wpm. The

results also show a clear positive trend for 5-key usage compared to a steady

performance for 9-key entry.

In discussion with the users several participants stated that they could see

the benefit of having fewer keys to allow devices to be made smaller. Many found

that the key-guide for 5-key entry acted as an effective spelling guide since letters

were greyed out that would not create a possible word in the dictionary, although

some mentioned that they felt locked in by this spelling assistance. Furthermore,

several participants found that this display made it was easier to resume text entry

after a distraction – an aspect of the outcomes we are investigating further. Many

reported that they felt any difficulties in using 5-key entry would reduce over time

since they were trying to break pre-programmed 9-key and T9 habits and that they

felt they were performing better towards the end of the experiment. Interestingly,

despite requiring more keystrokes in 5-key entry, several participants mentioned

that they felt they had to make fewer "clicks" when using 5-key entry. We can

only conclude that this impression was due to less key-searching and/or finger

movement. Finally, some subjects mentioned that they did not like looking at the

screen. Continuous focus on the screen is an intended benefit of 5-key entry and

we feel the likely cause of this reaction was the use of the 9-key pad for

experiments, where users are used to checking the labels on keys.

Future work

Although often used while stationary, designing for mobile users puts new

challenges on the design on interfaces. [24] and [25] have studied the effect of

different sized on-screen buttons on walking speed and data-entry errors. The

five-key pad proposed here lends itself to reduced vision load compared to soft

keypads or printing, with physical keys rather than screen areas and expert users

only needing to check each word instead of each letter. However, this checking

needs to be done more carefully than with a 9-key pad with the same prediction

engine. Studies are planned to investigate 5-key text entry for walkers to assess

(19)

We also plan to conduct a longitudinal study of five-key text entry using a

fuller prototype system that will support full text entry (including numbers,

punctuation and out of dictionary words).

Conclusions

This paper reported our investigation into predictive text entry using five keys.

This work was motivated by three desires: to reduce the space taken up by

keyboards on mobile phones, to extend predictive text entry to other formats of

mobile devices and to reduce finger movement distances in the hope of improving

text entry speed.

An investigation into purely technical performance of both unigram and

bigram predictive text entry showed that five-key text entry was considerably

poorer than nine-key for unigram modelling for both written English and SMS

corpora. However, the quality of predictions was considerably increased and the

difference between keypads reduced when bigram modelling was used.

These figures were then used as the basis for keystroke level modelling of

users entering text. Simple keystroke modelling with fixed keying times, showed

a speed reduction of around 21% for five-key text entry. These models were then

refined to take into account the reduced finger movement times for five-key text

entry using methods derived from Fitt's law, this resulted in new predictions that

five-key text entry would be within approximately 6% of the performance of

nine-key when both are using bigram modelling. Furthermore, five-nine-key text entry using

bi-gram modelling was predicted to perform very close to the level of standard

nine-key unigram modelling.

Finally, users trials were conducted to gain an impression of non-experts

usage of five-key text entry. Using a prototype system on a mobile phone handset

with unigram prediction, users achieved approximately 12 words-per-minute

using five-key entry and 21 words-per-minute for nine-key. While much slower

on five-keys, these figures are in line with, respectively, novice and expert speeds

reported elsewhere for 9-key pads.

Through our combined keystroke and Fitt's law modelling we have

predicted that five-key text entry using bigram word prediction will perform

(20)

user trials have shown similar user performance using five-key to novice nine-key

users in other trials. Combined, while not achieving one of our aims of faster text

entry, we believe that these results show that five-key keypads can be used as a

replacement for current nine-key text entry without noticeable loss of text entry

speed.

Acknowledgements: Once again we extend our gratitude to our users in our user trials.

References

1. Dunlop, M.D. an d A. Crossan, Predictiv e text en try m ethods for m obile phones. Person al Techn ologies, 20 0 0 . 4_(2).

2. J am es, C.L. an d K.M. Reischel, Text in put for m obile dev ices: com parin g m odel

prediction to actual perform an ce, in SIGCH I conferen ce on H um an factors in com putin g

sy stem s. 20 0 1, ACM Press: Seattle, Washin gton , Un ited States.

3. Grover, D.L., M.T. Kin g, an d C.A. Kushler, R educed key board disam biguatin g com puter I. Tegic Com m un ication s, Editor. 1998 : USA.

4. Kushler, C., AAC Usin g a R educed Key board, in echnology and Persons w ith Disabilities

Con feren ce. 1998 : USA.

5. Bellm an , T. an d I.S. MacKen zie, A probabilistic character lay out strategy for m obile text

entry, in Graphics In terface '98. 1998 , Can adian In form ation Processin g Society: Toron to,

Can ada. p. 168 -176.

6. Dun lop, M.D., W atch-Top Text-En try : Can Phone-Sty le Predictiv e Text-En try W ork W ith

On ly 5 Button s?, in 6th International Conference on H um an Com puter In teraction w ith

M obile Dev ices an d Serv ices: M obileH CI 0 4, S.A. Brewster an d M.D. Dun lop, Editors.

20 0 4, Sprin ger Lecture Notes in Com puter Scien ce: Glasgow.

7. MacKen zie, I.S., et al., A com parison of three m ethods of character en try on pen -based

com puters, in Factors an d Ergon om ics Society 38 th An n ual M eetin g. 1994: San ta

Mon ica, USA. p. 330 -334.

8 . Fitts, P.M., The in form ation capacity of the hum an m otor sy stem in controlling the

am plitude of m ov em en t. J ourn al of Experim en tal Psychology, 1954. 4 7_{(6): p. 38 1-391.}

9. Card, S.K., T.P. Moran , an d A. Newell, The key stroke-lev el m odel for user

perform an cetim e w ith in teractiv e sy stem s. Com m un ication s of the ACM 198 0 . 2 3_{(7): p.}

396-410 .

10 . Magn ien , L., J .L. Bouraoui, an d N. Vigouroux, M obile Text In put w ith Soft Key boards:

Optim ization by M eans of Visual Clues, in 6th In tern ation al Sy m posium on M obile

H um an -Com puter In teraction – M obileH CI 20 0 4 20 0 4, Sprin ger LNCS Volum e 3160 /

20 0 4 Glasgow, Scotlan d.

11. Gon g, J . an d P. Tarasewich, Alphabetically con strain ed key pad design s for text en try on

m obile dev ices, in CH I '0 5: SIGCH I Conference on H um an Factors in Com putin g Sy stem s

20 0 5: Portlan d, USA.

12. H asselgren , J ., et al., H M S: A Predictiv e Text En try M ethod Usin g Bigram s, in W orkshop on Lan guage M odelin g for Text En try M ethods, 10 th Con feren ce of the European

Chapter of the Association of Com putation al Lin guistics. 20 0 3: Budapest, H un gary. p.

43-49.

13. Klarlun d, N. an d M. Riley, W ord n -gram s for cluster key boards, in W orkshop on

Lan guage M odellin g for Text En try M ethods, 11th Con feren ce of the European Chapter of

the Association for Com putational Linguistics 20 0 3, 51-58 .

14. H ow, Y. an d M.-Y. Kan , Optim izin g predictiv e text en try for short m essage serv ice on

m obile phon es, in H um an Com puter In terfaces In tern ation al (H CII 0 5). 20 0 5: Las Vegas,

USA.

15. Witten , I.H . an d T.C. Bell, The zero-frequen cy problem : Estim ating the probabilities of

n ov el ev en ts in adaptiv e text com pression . IEEE Tran saction s on In form ation Theory,

1991. 3 7_{: p. 10 8 5-10 94.}

16. Katz, S.M., Estim ation of probabilities from sparse data for the lan guage m odel

com pon en t of a speech recogn izer. IEEE Tran saction s on Acoustics, Speech an d Sign al

(21)

17. MacKen zie, I.S., KSPC (key strokes per character) as a characteristic of text en try

techn iques, in M obileH CI 0 2: Fourth In tern ation al Sy m posium on H um an -Com puter

In teraction w ith M obile Dev ices. 20 0 2, Sprin ger Berlin , Lecture Notes in Com puter

Scien ce: Pisa, Italy. p. 195-210 .

18 . Pavlovych, A. an d W. Stuerzlin ger, M odel for n on -expert text en try speed on 12-button

phone key pads, in CH I '0 4: SIGCH I Con feren ce on H um an Factors in Com putin g

Sy stem s. 20 0 4: Vien n a, Austria. p. 351-358 .

19. San dn es, F.E., Ev aluatin g m obile text en try strategies w ith fin ite state autom ata, in

M obileH CI 0 5: 7th in tern ation al Con feren ce on H um an Com puter in teraction w ith

M obile Dev ices an d Serv ices. 20 0 5: Salzburg, Austria.

20 . Karat, C., et al., Pattern s of En try an d Correction in Large Vocabulary Con tin uous Speech

R ecognition Sy stem s, in ACM CH I99: H um an Factors in Com putin g Sy stem s. 1999, 568

-575.

21. Silfverberg, M., I.S. MacKen zie, an d P. Korhon en , Predictin g text en try speed on m obile

phones, in SIGCH I Con feren ce on H um an Factors in Com puting Sy stem s (CH I'0 0 ). 20 0 0 ,

ACM Press: The H ague, The Netherlan ds.

22. Soukoreff, R.W. an d I.S. MacKen zie. Usin g Fitts' law to m odel key repeat tim e in text

en try m odels. in Graphics In terface 20 0 2. 20 0 2.

23. MacKen zie, I.S. an d R.W. Soukoreff, A m odel of tw o-thum b text en try, in Graphics

In terface 20 0 2. 20 0 2.

24. Brewster, S.A., Ov ercom in g the Lack of Screen Space on M obile Com puters. Person al an d Ubiquitous Com putin g, 20 0 2. 6_{(3): p. 18 8 -20 5.}

25. Mizobuchi, S., M. Chign ell, an d D. Newton , M obile text en try : relation ship betw een

w alkin g speed an d text in put task difficulty, in M obileH CI 0 5: 7th in tern ation al

Con feren ce on H um an Com puter in teraction w ith M obile Dev ices and Serv ices 20 0 5: