QUANTIFYING TEXT ENTRY PERFORMANCE
ROBERT WILLIAM SOUKOREFF
A Dissertation Submitted to the Faculty of Graduate Studies in Partial Fulfilment of the Requirements
for the Degree of
Doctor of Philosophy
Graduate Program in Computer Science
York University
Toronto, Ontario, Canada
QUANTIFYING TEXT ENTRY PERFORMANCE
by Robert William Soukoreff
a dissertation submitted to the Faculty of Graduate Studies of York University in partial fulfilment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
© 2010
Permission has been granted to: a) YORK UNIVERSITY LIBRARIES to lend or sell copies of this dissertation in paper, microform or electronic formats, and b) LIBRARY AND ARCHIVES CANADA to reproduce, lend, distribute, or sell copies of this dissertation anywhere in the world in microform, paper or electronic formats and to authorize or procure the reproduction, loan, distribution or sale of copies of this dissertation anywhere in the world in microform, paper or electronic formats.
The author reserves other publication rights, and neither the dissertation nor extensive extracts from it may be printed or otherwise reproduced without the author’s written permission.
Abstract
AN INFORMATIC MODEL OF HUMAN-TO-COMPUTER COMMUNICATION
Robert William Soukoreff
York University, April 2010
Advisor: Professor I. S. MacKenzie
In this dissertation a methodology pertaining to empirical text entry studies is developed. An approach to measuring errors in text entry tasks (a generalisation of typing errors), founded upon the Minimum String Distance (MSD) string comparison algorithm, is proposed. The benefit of this approach is that it allows researchers to empirically examine a much more representative form of text entry that includes the error correcting processes.
The MSD-based methodology is extended in several ways. A character-level error analysis scheme based upon the MSD is developed. A set of short English text phrases is provided, with the intention that standardising the set of presented text phrases will control a significant source of variability in text entry studies. The importance of the input stream (consisting of the set of action primitives employed by a person performing text entry) is acknowledged. The MSD-based error analysis methodology is further enhanced to include an analysis of the input stream, affording the researcher a deeper perspective of the text entry task under study.
The idea of examining text entry outside of the laboratory is considered, and a study of freeform text entry is performed.
The idea of using the information theoretic quantity throughput as a metric of human performance is considered. Estimated throughput figures calculated from data collected in two studies support the feasibility of employing throughput as a metric of human performance in text entry studies.
Finally, a rationale for the speed-accuracy trade-off is developed. It is argued that the speed-accuracy trade-off arises as a consequence of Shannon’s Fundamental Theorem for a Channel with Noise, when coupled with two additional postulates: that people are imperfect information processors, and that motivation is a necessary condition of the speed-accuracy trade-off.
Acknowledgements
There’s no way this thesis could ever have been completed without the inspiration, encouragement, criticism, and patience of my supervisor, Professor Scott MacKenzie. We have co-authored a number of publications together, and this was a direct result of his insatiable curiosity and infectious enthusiasm. Scott is an unending source of ideas to consider and problems to solve, and I sincerely hope that we will continue to find more mysteries to unravel together now that this is all over. Thank you Scott!
Thank you also to my supervisory committee members, Professors Robert Allison and Wolfgang Stuerzlinger, who gave me helpful advice and criticism along the way.
I also appreciate the time and effort taken by my defence committee members in participating in my oral exam. Thank you, Professors Ravin Balakrishnan, Melanie Baljko, and Anne Moore.
Lastly I would like to acknowledge the support I received from my family. My lovely wife Lena has unfailingly been my number one cheerleader. She has read and commented upon everything I have written, and her encouragement, support, and sanity really helped during the dark days (every Ph.D. has some of those). Thanks go to my parents Fred and Maureen, and my sisters Catharine and Alex, for their support over the many long years. I am especially grateful to my mother for proofreading and commenting upon drafts of this dissertation. I should also mention my little ones, Ilya and Vianne, who have all at the same time been the biggest impediment to my finishing, and strongest imperative to finish. I love you both.
Dissemination of this Thesis
The following chapters or parts thereof have been published as peer reviewed papers.
Chapter 3: Levenshtein’s Minimum String Distance – Automated Analysis of Text Entry Errors
Soukoreff R. W. & MacKenzie I. S. (2001). Measuring errors in text entry tasks: An application of the Levenshtein string distance statistic. Companion Proceedings of the ACM Conference on Human Factors in Computing Systems - CHI 2001. New York: ACM. 319-320.
Chapter 4: Character-level Error Analyses
MacKenzie I. S. & Soukoreff R. W. (2002b). A character-level error analysis technique for evaluating text entry methods. Proceedings of the Second Nordic Conference on Human-Computer Interaction - NordiCHI 2002. New York: ACM. 241-244.
Chapter 5: Phrase Set for the Evaluation of Text Entry Technologies
MacKenzie I. S. & Soukoreff R. W. (2003). Phrase sets for evaluating text entry techniques. Extended Abstracts of the ACM Conference on Human Factors in Computing Systems - CHI 2003. New York: ACM. 754-755.
Chapter 6: An Evaluation of MSD and KSPC, and a New Unified Error Metric
Soukoreff R. W. & MacKenzie I. S. (2003). Metrics for text entry research: An evaluation of MSD and KSPC, and a new unified error metric. Proceedings of the ACM Conference on Human Factors in Computing Systems - CHI 2003. New York: ACM. 113-120.
Chapter 7: Further Developments in Text Entry Research
Soukoreff R. W. & MacKenzie I. S. (2004). Recent developments in text-entry error rate measurement. Extended Abstracts of the ACM Conference on Human Factors in Computing Systems - CHI 2004. New York: ACM. 1425-1428.
Chapter 8: Text Entry Behaviour Outside of the Laboratory
Soukoreff R. W. & MacKenzie I. S. (2003a). Input-based language modelling in the design of high performance text input techniques. Proceedings of Graphics Interface 2003. Toronto: Canadian Information Processing Society. 89-96.
Chapter 11: An Informatic Rationale for the Speed-Accuracy Trade-Off
Soukoreff R. W. & MacKenzie I. S. (2009). An informatic rationale for the speed-accuracy trade-off. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics – SMC 2009. New York: IEEE. 2969-2975.
Table of Contents
Abstract
Acknowledgements
Dissemination of this Thesis
Table of Contents
List of Tables
List of Figures
List of Pseudo-Code Listings
List of Equations
Chapter 1 Introduction
1.1 Organisation of this Thesis
1.2 The Big Open Problem in Text Entry
Chapter 2 A Review of Accuracy in Text Entry Studies
2.1 The Early Period – The Invention of Touch Typing
2.2 The Middle Period – Taxonomies of Errors
2.3 Text Entry Research 1980 – 2000
2.3.1 Forced Synchronisation
2.3.2 Manual Error Tabulation
2.3.3 Word-Level Errors
2.3.4 Ad Hoc Treatment of Errors
2.3.5 Avoiding Error Analysis
2.4 A Brief Note Regarding Speed
2.5 Conclusions
Chapter 3 Levenshtein’s Minimum String Distances - Automated Analysis of Text Entry Errors
3.1 Introduction
3.2 The Minimum String Distance
3.2.1 Mathematical Properties of the MSD
3.3 Calculating the Minimum String Distance
3.4 The Minimum String Distance Error Rate
3.6 Extending the Minimum String Distance
3.6.1 Transposition – Example of a New Editing Primitive
3.6.2 Primitive Costs
3.7 Conclusions
Chapter 4 Character-level Error Analyses
4.1 Introduction
4.2 Character-level Error Rates
4.3 Error Rate Formula
4.4 Finding Alignments in the D Matrix
4.5 Empirical Example – Three-key Text Entry
4.6 Confusion Matrix
4.7 Conclusions
Chapter 5 Phrase Set for the Evaluation of Text Entry Technologies
5.1 Text Entry Evaluations
5.2 The Phrase Set
5.3 Readability of the Phrase Set
5.4 Punctuation and Other Characters
5.5 Final Remarks
Chapter 6 An Evaluation of MSD and KSPC, and a New Unified Error Metric
6.1 Introduction
6.2 Measuring Error Rate
6.2.1 Key Strokes per Character (KSPC)
6.3 Motivating Thoughts: A Combined Error Rate
6.4 Constituents of the Input Stream
6.5 An Example
6.6 New Metrics arising from the Taxonomy
6.7 Total Error Rate
6.8 Empirical Data – An Experiment
6.8.1 Participants
6.8.2 Apparatus and Software
6.8.3 Procedure
6.8.4 Design
6.9 Results
6.9.1 The Error Rates
6.9.2 Text Entry Speeds
6.10 Discussion
Chapter 7 Further Developments in Text Entry Research
7.1 Accuracy in Text Entry
7.2 New Statistics
7.3 Constructive Text Entry Methods
7.4 Concluding Remarks
Chapter 8 Text Entry Behaviour Outside of the Laboratory
8.1 Introduction
8.2 Language Modelling in Text Entry Research
8.2.1 Movement-Minimising Input Techniques
8.2.2 Predictive Input Techniques
8.2.3 Hybrid Input Techniques
8.3 Caveats of Language Modelling
8.3.1 Input-based Language Modelling
8.3.2 Corpus Text is Not Representative of User Language
8.3.3 Corpus Text Ignores the Editing Process
8.3.4 Corpus Text Does not Capture Input Modalities
8.3.5 Currently, There Is No Text Input Corpus
8.4 Empirical Study of a Typical Text Input Stream
8.4.1 Materials and Method
8.4.1.1 Software
8.4.1.2 Participants
8.4.2 Results and Discussion
8.4.2.1 Applications Used
8.4.3 Keystroke Frequency Data
8.4.4 Unrepresented Keystrokes
8.5 Conclusions
Chapter 9 A Review of Information Theory and the Fundamental Theorem of Communication
9.1 A Brief Tutorial on Information Theory
9.2 The Fundamental Theorem for a Channel with Noise
9.2.1 Equivocation
9.2.2 Throughput
9.2.3 Channel Capacity
9.3 The Fundamental Theorem and Throughput
9.3.1 Throughput as a function of Source Entropy and Equivocation
9.4 A Brief Annotated Bibliography of Information Theory
Chapter 10 A Review of Information Theory in Psychology
10.1 The Early Psychology Literature: Reaction Time Studies
10.2 Information Theory and Psychology
Chapter 11 An Informatic Rationale for the Speed-Accuracy Trade-Off
11.1 The Speed-Accuracy Trade-Off
11.2 Information Theory – A Summary
11.3 Empirical Data from the Literature
11.4 Hypothesis Concerning Human Performance
11.4.1 Claim 1: The Informatic Basis of Action
11.4.2 Claim 2: Shannon’s Fundamental Theorem Applies to Human Information Transmission
11.4.3 The Fundamental Theorem of Human Performance
11.4.4 Claim 3: Freedom of Choice Applies to Performance
11.4.5 Claim 4: Efficiency is Preferred
11.4.6 The Theory of Motivated Performance
11.5 Mathematical Implications
11.6 Motivation and the Speed-Accuracy Trade-off
11.6.1 The Shape of Eh and the Role of α in the Speed-Accuracy Trade-off
11.7 Concluding Remarks
Chapter 12 Measuring Throughput in a Text Entry Task
12.1 Experiment 1 – Fitts 1966 Redux
12.1.1 Participants
12.1.2 Apparatus and Software
12.1.3 Procedure
12.1.4 Design
12.2 Results
12.3 Experiment 2 – An Informatic Analysis of Text Entry
12.3.1 Participants
12.3.2 Apparatus and Software
12.3.3 Procedure
12.3.4 Design
12.4 Estimating the Entropy of English Text
12.5 Results
12.6 Discussion
12.6.1 Speed and Error Rate versus Throughput
Chapter 13 Conclusions and Future Work
Bibliography
Appendix A Phrase Sets for Use as Presented Text in Text Entry Studies
List of Tables
Table 1 - Examples of Minimum String Distance and Error Rate
Table 2 - Results of character-based analysis of MSD matrix
Table 3 - Minimum String Distance versus Character-Level Error Rates
Table 4 - A Character-Level Error Analysis of Empirical Data
Table 5 - Frequency of Letters in the Phrase Set
Table 6 - The Five Most Frequent Words in the Phrase Set
Table 7 - Readability of the Phrase Set
Table 8 - Comparison of Corrected, Uncorrected, and Total Error Rates
Table 9 - Comparison of Error Statistics
Table 10 - Results for Disappearing Presented Text condition
Table 11 - Results for the Remaining Presented Text condition
Table 12 - Application Usage by Percent Received Keystrokes
Table 13 - Frequencies of the Fifteen Most Common Keystrokes
Table 14 - Frequency of Unrepresented Characters
Table 15 - English Language Models Used to Analyse Empirical Data from Experiment 2
List of Figures
Figure 1 - Speed and accuracy as distinct performance measures
Figure 2 - Example of synchronisation in a form-based text input experiment
Figure 3 - The matrix D after initialisation
Figure 4 - The completed D array
Figure 5 - Several examples of the matrix D
Figure 6 - Completed D array
Figure 7 - Substitution and Deletion Error Rates by Character
Figure 8 - Confusion matrix for Character-Level Error Analysis
Figure 9 - Constituents of the input stream
Figure 10 - Classifying the keystrokes in an example
Figure 11 - The modified Sharp EL-6053 showing the circuit board and protective cover underneath the keyboard
Figure 12 - A screen-print of the experimental software
Figure 13 - Information transmission paradigm
Figure 14 - Depiction of Shannon’s fundamental theorem for a discrete channel with noise
Figure 15 - The relationship between throughput and source entropy
Figure 16 - Depiction of throughput as a function of source entropy and equivocation
Figure 17 - Depiction of Shannon’s fundamental theorem for a discrete channel with noise
Figure 18 - Choice Reaction Time Apparatus from Fitts (1966)
Figure 20 - The Fundamental Theorem of Human Performance
Figure 21 - Effect of a speed or accuracy emphasis on performance
Figure 22 - Throughput as a function of equivocation
Figure 23 - A Laptop was Used for the Fitts (1966) Redux Experiment
Figure 24 - A screen-print of the Fitts (1966) Redux Experiment software
Figure 25 - Results Data from the Fitts (1966) Redux Experiment
Figure 26 - A Closer Look at the Results from the First Session of the Fitts (1966) Redux Experiment
Figure 27 - Data from Experiment 1 Plotted as Throughput versus equivocation for comparison with Fitts’ (1966) Data
Figure 28 - A screen-print of the Informatic Text Entry Experiment software
Figure 29 - Throughput for All Participants across all three speed-accuracy conditions
Figure 30 - Throughput for Participant #1, for all model orders
Figure 31 - Equivocation versus Source Entropy for All Participants, with the Zero-Order Model
Figure 32 - Equivocation versus Source Entropy for Participant #1, for all model orders
Figure 33 - Error Rate versus Speed
Figure 34 - Data from Experiment 2 Order 0 Model Applied to All Subjects’ Data
Figure 35 - Data from Experiment 2 Order 1 Model Applied to All Subjects’ Data
Figure 36 - Data from Experiment 2 Order 2 Model Applied to All Subjects’ Data
Figure 37 - Data from Experiment 2 Order 3 Model Applied to All Subjects’ Data
Figure 38 - Data from Experiment 2 Order 4 Model Applied to All Subjects’ Data
Figure 39 - Data from Experiment 2 Participant #1 Data, Analysed via All Model Orders
Figure 40 - Data from Experiment 2 Participant #2 Data, Analysed via All Model Orders
Figure 41 - Data from Experiment 2 Participant #3 Data, Analysed via All Model Orders
Figure 42 - Data from Experiment 2 Participant #4 Data, Analysed via All Model Orders
Figure 43 - Data from Experiment 2 Participant #5 Data, Analysed via All Model Orders
Figure 44 - Data from Experiment 2 Participant #6 Data, Analysed via All Model Orders
Figure 45 - Data from Experiment 2 Participant #7 Data, Analysed via All Model Orders
Figure 46 - Data from Experiment 2 Participant #8 Data, Analysed via All Model Orders
Figure 47 - Data from Experiment 2 Participant #9 Data, Analysed via All Model Orders
Figure 48 - Data from Experiment 2 Participant #10 Data, Analysed via All Model Orders
Figure 49 - Throughput Data from Experiment 2 Participant #1 Data, Analysed via All Model Orders
Figure 50 - Throughput Data from Experiment 2 Participant #2 Data, Analysed via All Model Orders
Figure 51 - Throughput Data from Experiment 2 Participant #3 Data, Analysed via All Model Orders
Figure 52 - Throughput Data from Experiment 2 Participant #4 Data, Analysed via All Model Orders
Figure 53 - Throughput Data from Experiment 2 Participant #5 Data, Analysed via All Model Orders
Figure 54 - Throughput Data from Experiment 2 Participant #6 Data, Analysed via All Model Orders
Figure 55 - Throughput Data from Experiment 2 Participant #7 Data, Analysed via All Model Orders
Figure 56 - Throughput Data from Experiment 2 Participant #8 Data, Analysed via All Model Orders
Figure 57 - Throughput Data from Experiment 2 Participant #9 Data, Analysed via All Model Orders
Figure 58 - Throughput Data from Experiment 2 Participant #10 Data, Analysed via All Model Orders
List of Pseudo-Code Listings
Listing 1 - An Implementation of the MSD function
Listing 2 - The transposition editing primitive code function
Listing 3 - The MSD algorithm enhanced to support the transposition editing primitive
Listing 4 - Generating Alignments from the MSD D Matrix
List of Equations
(1) NCW, expressed in terms of TCh, ChWW, and EC
(2) $\mathrm{WPM} = \dfrac{\text{Characters} - 1}{\text{Elapsed Time}} \times \dfrac{60}{5}$
(3) $\text{Keystroke Speed} = \dfrac{\text{Keystrokes} - 1}{\text{Elapsed Time}}$
(4) $\text{Percent Errors} = \dfrac{\mathrm{MSD}(A, B)}{\max(|A|, |B|)} \times 100\%$
(5) $\mathrm{KSPC} = \dfrac{\#\text{ of Keystrokes}}{|\text{Transcribed Text}|}$
(6) $\text{MSD Error Rate} = \dfrac{3}{8} \times 100\% = 37.5\%$
(7) $\text{Alignment Adjusted MSD} = \dfrac{\mathrm{MSD}(A, B)}{\bar{S}_A} \times 100\%$
(8) $\text{Old MSD Error Rate} = \dfrac{\mathrm{MSD}(P, T)}{\max(|P|, |T|)} \times 100\%$
(9) $\text{New MSD Error Rate} = \dfrac{\mathrm{MSD}(P, T)}{\bar{S}_A} \times 100\%$
(10) $\mathrm{KSPC} = \dfrac{|\text{Input Stream}|}{|\text{Transcribed Text}|}$
(11) $\text{MSD Error Rate} = \dfrac{INF}{C + INF} \times 100\%$
(12) $\mathrm{KSPC} \approx \dfrac{C + INF + IF + F}{C + INF}$
(13) $\text{Correction Efficiency} = \dfrac{IF}{F}$
(14) $\text{Participant Conscientiousness} = \dfrac{IF}{IF + INF}$
(15) $\text{Utilised Bandwidth} = \dfrac{C}{C + INF + IF + F}$
(16) $\text{Wasted Bandwidth} = \dfrac{INF + IF + F}{C + INF + IF + F}$
(17) $\text{Total Error Rate} = \dfrac{INF + IF}{C + INF + IF} \times 100\%$
(18) $\text{Not Corrected Error Rate} = \dfrac{INF}{C + INF + IF} \times 100\%$
(19) $\text{Corrected Error Rate} = \dfrac{IF}{C + INF + IF} \times 100\%$
(20) $\text{Corrected but Right Error Rate} = \dfrac{IF_c}{C + INF + IF} \times 100\%$
(21) $\text{Corrected and Wrong Error Rate} = \dfrac{IF_e}{C + INF + IF} \times 100\%$
(22) $H(p) = -\log_2(p)$
$H(\text{'heads'}) = -\log_2(0.75) = 0.415 \text{ bits}, \quad H(\text{'tails'}) = -\log_2(0.25) = 2 \text{ bits}$
(23) $H(\{p_i\}) = \sum_{i=1}^{N} p_i \cdot H(p_i) = -\sum_{i} p_i \cdot \log_2(p_i)$
$H(\text{'fair coin toss'}) = -\sum_i p_i \cdot \log_2(p_i) = -0.5 \cdot \log_2(0.5) - 0.5 \cdot \log_2(0.5) = 1 \text{ bit per toss}$
$H(\text{'biased coin toss'}) = -\sum_i p_i \cdot \log_2(p_i) = -0.25 \cdot \log_2(0.25) - 0.75 \cdot \log_2(0.75) = 0.811 \text{ bits per toss}$
(24)
\[
\begin{aligned}
H(A + B) &= \sum_{i,j} a_i \cdot b_j \cdot H(a_i \cdot b_j) \\
&= -\sum_{i,j} a_i \cdot b_j \cdot \log_2(a_i \cdot b_j) \\
&= -\sum_{i,j} a_i \cdot b_j \cdot \left[ \log_2 a_i + \log_2 b_j \right] \\
&= -\sum_{i,j} a_i \cdot b_j \cdot \log_2 a_i - \sum_{i,j} a_i \cdot b_j \cdot \log_2 b_j \\
&= -\Bigl( \sum_j b_j \Bigr) \cdot \sum_i a_i \cdot \log_2 a_i - \Bigl( \sum_i a_i \Bigr) \cdot \sum_j b_j \cdot \log_2 b_j \\
&= -\sum_i a_i \cdot \log_2 a_i - \sum_j b_j \cdot \log_2 b_j \\
&= H(\{a_i\}) + H(\{b_j\}) \\
&= H(A) + H(B)
\end{aligned}
\]
$H(\text{'two coin tosses'}) = H(\text{'fair coin toss'}) + H(\text{'biased coin toss'}) = 1 \text{ bit per toss} + 0.811 \text{ bits per toss} = 1.811 \text{ bits per toss}$
\[
\begin{aligned}
H_{P_A}(A, B) &= -\sum_i a_i \cdot P_A \cdot \log_2(a_i \cdot P_A) - \sum_j b_j \cdot (1 - P_A) \cdot \log_2\bigl( b_j \cdot (1 - P_A) \bigr) \\
&= -\sum_i a_i \cdot P_A \cdot \left[ \log_2 a_i + \log_2 P_A \right] - \sum_j b_j \cdot (1 - P_A) \cdot \left[ \log_2 b_j + \log_2(1 - P_A) \right] \\
&= -P_A \cdot \Bigl( \sum_i a_i \Bigr) \cdot \log_2 P_A - P_A \cdot \sum_i a_i \cdot \log_2 a_i \\
&\quad - (1 - P_A) \cdot \Bigl( \sum_j b_j \Bigr) \cdot \log_2(1 - P_A) - (1 - P_A) \cdot \sum_j b_j \cdot \log_2 b_j \\
&= -P_A \cdot \log_2 P_A - (1 - P_A) \cdot \log_2(1 - P_A) - P_A \cdot \sum_i a_i \cdot \log_2 a_i - (1 - P_A) \cdot \sum_j b_j \cdot \log_2 b_j \\
&= H(P_A, 1 - P_A) + P_A \cdot H(\{a_i\}) + (1 - P_A) \cdot H(\{b_j\}) \\
&= H(P_A, 1 - P_A) + P_A \cdot H(A) + (1 - P_A) \cdot H(B)
\end{aligned}
\]
(25) $H_{P_A}(A, B) = H(P_A, 1 - P_A) + P_A \cdot H(A) + (1 - P_A) \cdot H(B)$, where the first term is the entropy of choosing between A and B, the second is the entropy of A weighted by probability $P_A$, and the third is the entropy of B weighted by probability $(1 - P_A)$
$H(\cdot) = H(\text{'biased coin toss'}) + 75\% \times H(\text{'die throw'}) + 25\% \times H(\text{'fair coin toss'}) = 0.811 + 0.75 \times 2.585 + 0.25 \times 1 \approx 3 \text{ bits}$
(26) $R = I - E$
(27) $R = \begin{cases} I - E & \text{if } I - E > 0 \\ 0 & \text{otherwise} \end{cases}$
(28) $C = \mathrm{Max}(R) = \mathrm{Max}(I - E)$
(29) $R \le C$
(30) $E_{min} = I - C$
$I = C + S$
$E_{min} = I - C = (C + S) - C = S$
(31) $R = I - E_{min} = (C + S) - S = C$
(32) $R = I - E$
(33) $C = \mathrm{Max}(R) = \mathrm{Max}(I - E)$
(34) $R \le C$
(35) $E_{min} = I - C$
(36) $R = -0.696 \times E + 8.563$
(37) $E_h = \alpha \cdot E_{min}$
(38) $E_h = \alpha \cdot E_{min} = \alpha \cdot (I - C)$
(39) $R = \left( \dfrac{1 - \alpha}{\alpha} \right) \cdot E_h + C$
(40) $E = H(S \mid R) = H(S, R) - H(R) = -\sum_{i=1}^{N} \sum_{j=1}^{N} p(s_i, r_j) \cdot \log_2\left[ p(s_i, r_j) \right] + \sum_{j=1}^{N} p(r_j) \cdot \log_2\left[ p(r_j) \right]$
(41) $p(c_i \mid b_n) = \dfrac{p(c_i, b_n)}{p(b_n)} = \dfrac{p(c_i, b_n)}{\sum_{\forall i} p(c_i, b_n)}$
(42) $I = \dfrac{\sum_{\forall c_i \in \text{Presented Text}} H(c_i)}{\text{time}}$
(43) $R = \dfrac{\sum_{\forall c_i \in \text{Transcribed Text} \,\mid\, c_i \text{ is Correct}} H(c_i)}{\text{time}}$
Chapter 1
Introduction
In 1992 three researchers, Matias, MacKenzie, and Buxton (1993, 1994, 1996), planned to evaluate a new text entry technique.1 The idea was to support single-handed text entry using only half of a standard Qwerty keyboard. The group had a prototype and they hired a programmer to write software for the experiment they were planning. The software had to measure both speed (characters typed per second), and accuracy (number of errors) as the participants in their study entered text. Speed was easy to measure but counting the errors proved to be difficult, prompting the programmer to ask, “what constitutes an error”? A simple character-by-character comparison of the presented text (what the participants are asked to enter) and the transcribed text (what participants actually enter) would be easy to implement, but would it be correct? For example, suppose a participant was asked to enter the phrase “the quick brown fox”.
Presented Text:   the quick brown fox
Transcribed Text: the quixck brwn fox
                         ^^^^^^
A character-by-character comparison indicates that the participant made six errors (as indicated), although it seems more likely that the participant really made only two errors – an extra “x” was typed, and an “o” was omitted. Although character-by-character comparisons are easy to implement in software, a more representative way to count errors was needed.
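The gap between the two counts can be checked with a short sketch. It is illustrative only: the function names are hypothetical, and the second function is a textbook Levenshtein distance (the basis of the Minimum String Distance named in the Abstract), not the thesis's own listing.

```python
def positional_errors(presented: str, transcribed: str) -> int:
    """Count mismatches found by a naive character-by-character comparison."""
    mismatches = sum(p != t for p, t in zip(presented, transcribed))
    # Characters beyond the end of the shorter string have no partner;
    # this sketch assumes each of them counts as one error.
    return mismatches + abs(len(presented) - len(transcribed))


def minimum_string_distance(a: str, b: str) -> int:
    """Fewest single-character insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


presented = "the quick brown fox"
transcribed = "the quixck brwn fox"
print(positional_errors(presented, transcribed))        # 6
print(minimum_string_distance(presented, transcribed))  # 2
```

The positional count penalises every character that was shifted out of place by the extra “x”, whereas the edit distance charges only for the insertion and the omission.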
An algorithm could be designed that allowed for single character insertions and omissions. But then suppose the participant inserted two or three extra characters, or omitted two or three characters, or simply typed some altogether wrong characters. Could an algorithm be designed that can reliably detect and count multiple errors, considering all combinations and permutations of errors that could be made? What began as a simple character-by-character comparison had grown in complexity.
In the end no satisfactory error-tabulating algorithm could be found, so Matias, MacKenzie, and Buxton chose another option – they changed the methodology of their experiment. Participants were required to keep synchronised with the presented text (forced synchronisation). Whenever the participant entered a character that didn’t match the expected character an error beep sounded, and the participant had to resynchronise with the presented text, continuing by typing the next correct character. This methodology bears a similarity to the character-by-character comparison, except that instead of the character-wise comparison being performed a posteriori, the participants are given the opportunity to fix the alignment of the characters as they enter the text. This methodology has the great benefit that errors become easy to count – because of the forced synchronicity, a character-wise comparison can be used. The problem with this approach is that the artificiality of the text entry process reduces the generalisability of the results. In the real world no interface behaves in this way, interrupting the typist after every error, so the methodology reduces the realism of the text entry task. In terms of the empirical results, the forced synchronisation had two obvious negative effects:
1. Interrupting participants so they could realign themselves after every error had a negative impact on the participants’ speed, and,
2. Forced synchronicity also increased the error rate. Participants tended to make error “chunks”, beginning with a legitimate error, and continuing for a few (possibly correct) characters until resynchronisation was accomplished.
This second point deserves some elaboration. Due to error chunking the error rates observed with the forced synchronicity methodology provide only a coarse perspective of the errors. For example, some correct but out-of-sync characters would be counted as errors and these would be indistinguishable from multiple errors occurring together. But worse, information pertaining to the participants’ natural error correction strategies is squandered because error correction is not possible under this restrictive methodology. The input process is really the editing process. By forbidding participants to correct their mistakes researchers are missing an extremely important part of the text entry process.2 So the text entry speed and error rates observed during this forced synchronisation cannot be said to generalise to normal unconstrained text entry. However, it is precisely this generalisation of experimental observations that is desired in empirical studies such as this.
2 Card et al. (1980) report that up to one fourth of an expert’s time can be spent correcting errors. In Chapter 8 we will describe a study in which we have found that the backspace key is the second most common keystroke (following the space bar, but more common than the letter “e”) in typical desktop computer keyboard text entry.
1.1 Organisation of this Thesis
A large portion of this thesis concerns finding and then evaluating a solution to the problem of measuring error rates in text entry studies.3 The development of the methodology described here was not instantaneous. Rather it evolved and was refined over time. The presentation of the text entry error rate methodology in this thesis mirrors chronologically the development of the methodology. A review of the literature is presented in Chapter 2. Solutions to the problem of measuring errors in text entry studies are presented in Chapters 3 to 7, tracing the series of publications through which the methodology was developed. These chapters are primarily focused on the evaluation of key-based text entry methods using predefined standardised text phrases.
Chapter 8 presents a study that explores a different approach to observing text entry behaviour. In the preceding chapters text entry performance is measured by observing participants performing “text copy” tasks under experimentally controlled conditions. The weakness inherent in this approach is that it does not generalise well to typical real-life text entry behaviour. Outside of the lab, most text entry occurs as a part of text generation instead of copying tasks, and the generation of text is a different process to which the analytical approach described in Chapters 2 through 7 cannot be applied. The work in these early chapters does not take into account issues related to the text creation process, which are relevant but beyond the scope of the thesis. Chapter 8 presents a preliminary study of real-life text entry that, although not conclusive, is as interesting for the methodology used as for the results garnered.
3 Because the primary focus of this dissertation is the measurement of error rate, the reader may be tempted to presume falsely that our position is that the typical user’s goal is the production of error-free text. This is not so. Imperfect text is excusable in many casual circumstances, and even desirable by the artistic and/or rebellious. And, due to the inherent redundancy of human “natural languages”, errors do not necessarily inhibit the comprehensibility of inaccurate text. Our objective is the development of a means to determine the error rate, as this is an important factor in the evaluation of text entry technologies.
The early chapters of this thesis reflect significant effort and progress made toward building a sound experimental methodology for text entry research. However, as with all large research problems, every answer brings with it new questions. And so it is with our work here, that a new problem has become apparent, which we have termed the “Big Open Problem in Text Entry” (described in a moment). Chapters 9 and 10 present reviews of the literature pertaining to this problem. Chapter 11 constitutes a significant contribution to the literature by providing a rationale for the speed-accuracy trade-off of human performance. And finally, Chapter 12 presents empirical data that validates the rationale of the speed-accuracy trade-off.
1.2 The Big Open Problem in Text Entry
There is a large open problem that inhibits the ability of researchers to perform human performance comparisons, even with a robust treatment of errors. (Presume, for the moment, that we have a rigorous definition of error rate that can be used in a text entry study.) Consider the comparison of the performance of the following three fictitious people: Medium Marvin types English text at a rate of 60 words per minute (wpm) with a 6% error rate; Careful Clarence also types at 60 wpm, but with only 3% errors; and Speedy Eddie types at 75 wpm with a 6% error rate (see Figure 1).
Speedy Eddie is faster than Medium Marvin, and with an identical error rate, so clearly, Eddie outperforms Marvin. Careful Clarence matches Marvin’s speed but with fewer errors, thus Clarence also outperforms Marvin. But what can be said regarding Eddie versus Clarence? Because Eddie and Clarence differ in both their speeds and accuracies, it is not obvious which of the two has superior overall performance. The Eddie versus Clarence performance question reveals two deficiencies in the analytic techniques currently available. The lack of an analytic model describing the speed-accuracy trade-off 4 means that even simple performance comparisons are out of our reach. And, although we possess an intuitive understanding of what the term “overall performance” means, there is neither a specific definition of this term, nor a suitable metric that incorporates both the speed and accuracy of performance into a single measurable and comparable statistic. Furthermore, because empirical performance evaluation is such a widely used analytical tool, the lack of a solution to the performance comparison problem affects many areas of human interface design, well beyond the realm of text entry.
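The three pairwise comparisons can be restated as a simple dominance check. The sketch below is only an illustration of the argument: the `Typist` class and the `outperforms` and `compare` functions are hypothetical names, and the rule assumed is that one typist outperforms another only when no worse on both measures and strictly better on at least one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Typist:
    name: str
    wpm: float         # entry speed in words per minute
    error_rate: float  # errors as a percentage


def outperforms(a: Typist, b: Typist) -> bool:
    """True when a is no worse than b on both measures and strictly better on one."""
    no_worse = a.wpm >= b.wpm and a.error_rate <= b.error_rate
    strictly_better = a.wpm > b.wpm or a.error_rate < b.error_rate
    return no_worse and strictly_better


def compare(a: Typist, b: Typist) -> str:
    """Name the better performer, or report that speed and accuracy disagree."""
    if outperforms(a, b):
        return a.name
    if outperforms(b, a):
        return b.name
    return "undecided"


marvin = Typist("Medium Marvin", 60, 6)
clarence = Typist("Careful Clarence", 60, 3)
eddie = Typist("Speedy Eddie", 75, 6)

print(compare(eddie, marvin))     # Speedy Eddie
print(compare(clarence, marvin))  # Careful Clarence
print(compare(eddie, clarence))   # undecided
```

The "undecided" outcome is the gap described above: without a single statistic combining speed and accuracy, the two measures give no ordering for Eddie and Clarence.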
[Figure: plot of speed (words per minute) against accuracy (defined as: 100 – ‘% errors’). Medium Marvin: 60 wpm, 94% accuracy (6% errors). Speedy Eddie: 75 wpm, 94% accuracy (6% errors). Careful Clarence: 60 wpm, 97% accuracy (3% errors).]
Figure 1 - Speed and accuracy as distinct performance measures
4 The essence of the Speed-Accuracy Trade-Off is that there is an inverse relationship between speed and accuracy (as speed increases, accuracy decreases, and vice versa), and that people have some control over the place in the speed-accuracy continuum at which they perform.
So, how can this problem be solved? One could, for example, ignore errors, and then incorrectly conclude that Marvin and Clarence performed identically to one another, although this is clearly not so. Alternatively, one could presume that speed statistics can be meaningfully compared so long as the corresponding error rates are similar to one another. But, in addition to not being objective (because the researcher’s subjective definition of the word “similar” influences the performance comparison) this approach does not allow comparison when the error rates are relatively far apart, such as in the preceding example (Marvin has twice the error rate of Clarence).
A contemporary “state of the art” approach to the Eddie versus Clarence performance question would require the execution of another experiment to obtain new “comparable” data. One of either speed or accuracy could be controlled,5 while the other is compared as a dependent performance measure. For example, Eddie and Clarence could be made to perform with identical speeds,6 so their accuracies could be compared, mitigating the effects of the speed-accuracy trade-off. There are two problems with this approach:
1 Coercing Eddie and Clarence to perform at a comparable speed or accuracy level diminishes both the external and internal validity of any conclusions drawn, because requiring participants to work at a speed or accuracy level other than that at which they would naturally perform changes the nature of the task. Ideally participants would be observed in as representative an environment as possible, including allowing the participants to occupy their preferred place along the speed-accuracy continuum.
5 A controlled variable refers to a factor that might otherwise affect the outcome of an experiment, that is deliberately made to be identical across all conditions, so as to ensure that it has the same effect upon all conditions.
6 There are protocols available to experimenters to influence the performance of subjects toward either speed or accuracy. For example, subjects can be paid based on one of either speed or accuracy, a pace signal such as an auditory or visible metronome can be used, or, subjects can simply be instructed to favour either speed or accuracy.
2 Artificially influencing speed or accuracy obscures interesting and potentially important aspects of the participants’ performance. Allowing participants the freedom to choose their own place in the speed-accuracy continuum makes it possible to observe separate effects on speed and accuracy simultaneously, which would be impossible if speed or accuracy were manipulated.
In the latter portion of this thesis (beginning with Chapter 9) the application of information theory to the problem of human performance measurement is explored. Information theory has a useful property: it allows one to combine the concepts of information transmission speed and error rate into a single quantity, throughput, that is an intuitive and useful summary of transmission performance. Throughput represents the number of usable bits of transmitted information that a communication channel is capable of imparting, accommodating both speed and accuracy, making the comparison of the performance of different communications channels possible. A statistic with this property does not, at the moment, exist in text entry research; consequently, determining which of two touch-typists is “better”, considering both their speeds and accuracies, is currently not possible.
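To illustrate how a single information-theoretic quantity can fold speed and error rate together, the following sketch computes the information rate of a textbook symmetric channel. It is not the model developed in Chapter 11. The function names are hypothetical, and the code assumes a 27-symbol alphabet, equiprobable characters, errors spread evenly over the wrong symbols, and five characters per word.

```python
import math


def bits_per_character(error_rate, alphabet_size=27):
    """Information per character (bits) over a symmetric channel.

    Assumes equiprobable input symbols and 0 <= error_rate < 1.
    """
    e, q = error_rate, alphabet_size
    if e == 0:
        return math.log2(q)
    # Uncertainty introduced by the channel: whether an error occurred,
    # plus which of the (q - 1) wrong symbols was produced.
    noise = -e * math.log2(e) - (1 - e) * math.log2(1 - e) + e * math.log2(q - 1)
    return math.log2(q) - noise


def throughput_bits_per_second(wpm, error_rate, chars_per_word=5, alphabet_size=27):
    """Combine speed and error rate into one number (bits per second)."""
    chars_per_second = wpm * chars_per_word / 60
    return chars_per_second * bits_per_character(error_rate, alphabet_size)


for name, wpm, err in [("Marvin", 60, 0.06), ("Clarence", 60, 0.03), ("Eddie", 75, 0.06)]:
    print(name, round(throughput_bits_per_second(wpm, err), 2))
```

Under these assumptions each person receives one comparable figure, but the resulting ranking depends entirely on the assumed channel model and should not be read as a finding of this thesis.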
The idea of applying information theory to the measurement of human performance is not new. The emergence of information theory (Shannon 1948) had a profound impact upon the field of psychology in the 1950s and 1960s, which in hindsight some described as fad-like (see Luce 2003). The result was a flurry of experiments and papers by psychologists attempting to determine and interpret the informatic properties of the human cognitive, nervous and neuromuscular systems. There were some successes, for example, the Hick-Hyman law (Hick 1952, Hyman 1953), and Fitts’ law (Fitts 1954, MacKenzie 1992), but in general the application of information theory to human performance was disappointing to the researchers of that time. As a result, the prevailing opinion of contemporary psychologists is that human behaviour and performance are too complicated to be explained by information theory. And yet, as the following quotation from the noted psychologist Welford reveals, others have also considered using throughput as a performance measure:
“In so far as the information measure [throughput] is adequate, it provides a valuable means of combining speed and accuracy into a single score, and emphasises the important fact that times for different tasks are comparable only if errors are held constant, and conversely that error rates can be compared only if times are held constant.”
- Welford (1968, page 67)
We have pursued the idea of using throughput as a human performance metric in text entry studies, and have uncovered an explanation of the speed-accuracy trade-off that arises due to the application of information theory to human performance (presented in Chapter 11). This has provided a model of the speed-accuracy trade-off, and has demonstrated that throughput may be used for text-entry performance comparison (and perhaps for the performance comparison of tasks other than text entry as well).
Chapter 2
A Review of Accuracy in Text Entry Studies
This chapter reviews the history of the measurement of accuracy in industrial and academic text entry studies. Chronologically, this review covers the period from the late 1800s, beginning with the creation of the mechanical typewriter and the emergence of touch typing, to the state of the art as it was in 2001. (In 2001 a novel approach to error rate analysis in text entry studies appeared, which is the focus of Chapter 3.)
Several thorough reviews already exist of the typing literature (Cooper 1983b, Noyes 1983, Potosnak 1988, Yamada 1980) and the text entry literature (MacKenzie & Soukoreff 2002a, Silfverberg 2007, Kroemer 2001). These general purpose reviews describe the inventions and improvements made to text entry technologies over the years. The review presented here is focused upon the metrics used to evaluate these text entry technologies, and how the metrics evolved over the years. Because text entry speed is relatively easy to measure, the issue of interest will be how researchers measured the accuracy of text entry in their studies; specifically, what definition of the term “error” was used, and by what means were errors tabulated (manual, mechanical, or automated)?
2.1
The Early Period – The Invention of Touch Typing
By 1450, Gutenberg’s press was in operation, marking the beginning of moveable-type printing in the West, and in the years to follow technological advancement brought several improvements to text-setting devices. The first primitive typewriting machine was patented by Henry Mill in 1714, the first machine to use individual buttons for each letter was invented by Xavier Progin in 1833, and the first machine to include an automatic escapement mechanism that advanced the carriage after each keypress was invented by John Pratt in 1843. Progress continued, and in 1874 the E. Remington and Sons Arms Company released the Sholes & Glidden Type Writer, the first commercially available device that was essentially equivalent to a modern typewriter, employing the familiar Qwerty keyboard layout (the design of which is attributed to Christopher Sholes) with a mechanism efficient enough to allow touch-typing. Doubtless, the first evaluation of this typewriter was that of Samuel Clemens (a.k.a., Mark Twain), who in 1874 wrote that it “will print faster than I can write” and “it don’t muss things [...] it saves paper” suggesting that the typewriter was both faster and more accurate than his handwriting. (Lundmark 2002, pp. 10-13)
Following the general acceptance of typewriters into businesses and government office environments, interest shifted toward improving typing speeds. Touch-typing evolved in the 1880s, and its widespread acceptance may have been aided by the publicity of annual typing speed contests. By approximately 1915, world champion typing speeds had reached an asymptote averaging 140 ±10 words per minute (although occasionally higher speeds have been reported). (Yamada 1980, p. 182)
Among the earliest entries in the academic literature pertaining to typing was the work of William Book (1908), who published the results of a year-long longitudinal study of typing skill acquisition. Eleven participants were observed as they practiced typing and performed tests, and their key presses were encoded onto a kymograph (a paper-covered rotating drum marked with a mechanical stylus). Typing speeds were calculated from the markings on the kymograph. The analysis of errors is not explicitly mentioned in this book, although it is implied that the participants plodded along at paces that were slow enough that errors were not excessive. (Book 1908)
Although several alternate keyboard arrangements had been proposed in the early 1900s using solely the statistical properties of English (see Yamada 1980, pp. 184-185), Hoke (1921) is credited with being the first to consider human factors when designing a keyboard arrangement. Hoke analysed the errors in 497 pages of practice typewriting work generated by approximately 100 individuals (presumably typing students). No specific definition of an error is provided. Errors were tabulated by hand.
Lessenberry (1928) compiled a collection of 60,000 typing errors from students and manually analysed the errors, reporting that substitution was the most frequent error. The substitution errors were further analysed in terms of the relative physical positions, on the Sholes keyboard, of the intended and erroneous keys.
The Dvorak Simplified Keyboard was created and patented by Dvorak and associates in the 1930s (Dvorak & Dealey 1934, Dvorak & Dealey 1936), followed by an extensive study comparing the Sholes (Qwerty) and Dvorak keyboard arrangements (Dvorak, Merrick, Dealey & Ford 1936). In the study four classes of error were described: substitution errors (wrong letter typed), omission errors (missing letter), insertion errors (extra letter), and transposition errors (two otherwise correct letters in reverse order). Errors were tabulated by hand. The most frequently occurring errors were substitution errors (Dvorak, Merrick, Dealey & Ford 1936, p. 366).
2.2
The Middle Period – Taxonomies of Errors
From the 1950s to the 1980s, considerable work was done by psychologists investigating the mechanisms by which people touch type. The research of this period emphasises the observation and analysis of physical properties associated with typing, such as inter-keystroke times, eye-scanning times, and the number of characters that typists read ahead. Sometimes electronic or photographic means were used to record motion as participants transcribed text, and invariably error analyses were performed manually. Errors were classified according to complex non-standardised taxonomies that intermingled the types of errors with the potential causes of errors, and these error classes were not always distinct. For example, MacNeilage (1964) delineated errors into four broad categories, each containing several subcategories:
1. Spatial errors consisting of horizontal, vertical, and diagonal subcategories describing the errant finger motion that may have caused the error;
2. Temporal errors consisting of reversal (otherwise correct but reversed in order), omission, equivocal (when in the process of committing a transposition error, the participant realises their mistake and stops typing), and anticipation (when a character appears more than one keystroke ahead of where it should) errors;
3. Miscellaneous errors consisting of interpolation (an extra character with no relationship to the correct characters), phonemic (substitution with a character with a similar sound to the intended character), type (when a different but valid English word is formed as a result of the error), contralateral (when a substitution error involves the wrong hand and so is the mirror image of the intended character), and dynamics (an error in character repetition when the sequentially neighbouring character is repeated, i.e., “eroors” instead of “errors”) errors; and,
4. Other errors include multiple classification errors (when a single error fits the criteria of more than one of the other categories), and unclassifiable errors (that fit none of the other categories).
In attempting to compare several alternatives for data entry Devoe (1967) observed that “error scoring was highly subjective, and the results can be considered only indicative, rather than conclusive” (Devoe 1967, p. 25).
Shaffer & Hardwick (1968) presented typists with text and instructed their participants to type as quickly as possible and to leave their errors uncorrected. The authors state that “any textual discrepancy between original and transcript was counted as an error and, for the purpose of scoring, the unit of error was a symbol letter, space or punctuation mark” (Shaffer & Hardwick 1968, p. 363). They also stated that “any attempt to classify errors by their causes is hazardous because most errors are cryptic and their possible causes ambiguous, sometimes in several ways” (Ibid.). Errors were classified according to the following taxonomy:
• Omission (missing character),
• Response (substitution errors where the erroneous character was one of the neighbouring keys of the keyboard),
• Reading (substitution errors where the intended and erroneous characters are visually or acoustically similar),
• Context (a generalisation of transposition errors, where the erroneous character is within three characters of its correct position, but also including some cases in which the error formed a different valid English word), and,
• Random (which is undefined but presumably includes all other errors).
In a separate study Shaffer & Hardwick (1969) analysed word-level errors. A word was labelled incorrect if any number of errors occurred within it, and these word-level errors were classified depending upon whether the resultant “word” was a real word (an unintended but valid English word), or random (no definition provided) or nonsense (no definition provided).
In Rabbitt’s (1978) investigation into the awareness that touch typists have of the errors they make as they type, errors were classified into compound errors (when there are multiple errors), omissions, and mistypes (including both substitutions and insertions).
In 1983 an edited book, Cognitive Aspects of Skilled Typewriting, was published (Cooper 1983a). This text is significant because it presents reviews of several aspects of the literature including two chapters on typing errors, and because it represents an accurate description of “the state of the art” in typing research at that time. Three chapters are of particular interest and are reviewed next.
Gentner and colleagues (Gentner et al. 1983) provide a detailed taxonomy of touch typing errors that includes characterisation of the reasons that specific typing errors occur (e.g., wrong finger, finger hits multiple keys, wrong hand, etc.), as well as characterisation of the errors that resulted:
• Insertion (extra character),
• Transposition (reversed order of otherwise correct characters),
• Migration (correct character but in the wrong location),
• Interchange (when two non-adjacent characters have been swapped),
• Omission (missing character),
• Doubling (accidentally repeated character), and,
• Alternation (where alternating characters are reversed, for example “thses” versus “these”).
Norman & Rumelhart (1983) report on an ad hoc analysis of typing errors occurring in the e-mails of their research group. They asked their associates to volunteer any typing errors that they noticed in their own e-mails (either sent or received by them). These errors were subsequently manually analysed. The fact that the analysis was performed by hand is significant in light of the fact that the source of the text was purely electronic. Although particularly interested in transposition errors, doubling errors, and alternation errors, the authors acknowledge the large taxonomy provided by Gentner and colleagues (Gentner et al. 1983). In previous work, Rumelhart & Norman (1982) had manually tabulated two additional classes of errors: homologous errors (a substitution error wherein the corresponding finger from the wrong hand makes the correct motion), and capture errors (one or a series of substitution errors that result in changing the suffix of the intended word, so that the word becomes another valid English word).
Grudin (1983), in his study of expert and novice typing errors, manually analysed errors, using a five class taxonomy. Four of the classes consisted of single instances of errors: insertion, omission, substitution, and transposition. The fifth category, other errors, consisted of all cases where more than one error occurred within a word. Grudin notes that the majority of errors in the “other” category are multiple substitution errors. Most of his subsequent analysis is focused on the single substitution errors. The fact that cases of multiple errors were placed in the “other” category rather than being counted as two substitution errors, suggests that the complexity of multiple errors had been a deterrent to researchers already labouring under the heavy demands imposed by manual tabulation.
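A five-class scheme of the kind described above lends itself to a small sketch. Grudin tabulated errors by hand, so the function below is a hypothetical illustration rather than his procedure; the name `classify_word_error` and the extra label "correct" are assumptions. A word containing exactly one insertion, omission, substitution, or transposition receives that label, and any word with more than one error falls into "other".

```python
def classify_word_error(intended, typed):
    """Label a typed word with one of four single-error classes, or 'other'."""
    if typed == intended:
        return "correct"
    if len(typed) == len(intended):
        # Positions at which the two words disagree.
        diffs = [i for i in range(len(intended)) if intended[i] != typed[i]]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and intended[diffs[0]] == typed[diffs[1]]
                and intended[diffs[1]] == typed[diffs[0]]):
            return "transposition"
    elif len(typed) == len(intended) + 1:
        # Removing one typed character recovers the intended word.
        if any(typed[:i] + typed[i + 1:] == intended for i in range(len(typed))):
            return "insertion"
    elif len(typed) == len(intended) - 1:
        # Removing one intended character yields the typed word.
        if any(intended[:i] + intended[i + 1:] == typed for i in range(len(intended))):
            return "omission"
    return "other"  # more than one error within the word


print(classify_word_error("the", "teh"))      # transposition
print(classify_word_error("these", "thwsw"))  # other
```

The second call shows a word with two substitutions being placed in the "other" class rather than being counted as two substitution errors.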
2.3
Text Entry Research 1980 – 2000
There was a great surge in interest and research into text entry in the 1990s, initiated by the advent of tablet computers with stylus-based interfaces, and fuelled by the arrival of mobile computing. The new stylus-based interface paradigm brought with it new interaction design opportunities, and new problems, and consequently many novel text input technologies were created (MacKenzie 2002a). Literature from this period reveals that errors were still being painstakingly tabulated by hand, and that there were no standard error metrics, making the comparison of different technologies difficult or impossible. Five approaches to the analysis of errors in text entry experiments were employed in this period:
1. Forced synchronisation,
2. Manual identification and tabulation of errors,
3. Utilisation of easier to calculate statistics (such as word-level errors),
4. The ad hoc categorisation of errors, or,
5. Ignoring errors, disallowing errors, or discarding trials with errors.
Each of these approaches is reviewed below.
2.3.1 Forced Synchronisation
Some researchers have required participants to keep synchronised with the presented text7 as they type; typically an audible error beep provides instantaneous feedback to the participant as they perform the text entry task. When a character is entered that does not match the expected next character, the beep is sounded, instructing the participant to resynchronise with the expected text before they continue. The participant may be required to enter the correct character before continuing (see Matias, MacKenzie & Buxton 1996; Venolia & Neiberg 1994; Isokoski & Kaki 2002; Ingmarsson, Dinka & Zhai 2004). Alternatively, the participant may be allowed to continue without entering the correct character (see MacKenzie & Zhang 1999). In either case, however, it is not possible for the participant to correct their errors.
7 Recall that in the text entry experiment paradigm, the investigator presents subjects with text to be
This methodology greatly simplifies error tabulation. Because of the forced synchronicity, error tabulation is reduced to a character-wise comparison of the presented and transcribed texts, which can be performed in software. However, there are three problems with this methodology:
1. Interrupting participants so they can realign themselves after every error has poor external validity and a negative impact on performance. There is a certain momentum or rhythm that one experiences when entering text. This is an important element of rapid text entry that is disrupted by being forced to stop and realign oneself with the expected text. Logan (1983) reports that skilled touch typists overlap the finger movements of consecutive keystrokes, and concludes that the planning and execution of keystrokes occurs (at least partially) in parallel.
2. Forced synchronicity also increases the error rate. Because a strict character-by-character comparison is used to detect errors, correct characters that follow an error are likely to be counted as errors too. This leads participants to make error “chunks”, beginning with a legitimate error and continuing for several keystrokes until resynchronisation is accomplished. Error chunks were reported by Matias and colleagues (Matias, MacKenzie & Buxton 1996).
3. Forced synchronisation prevents the researcher from obtaining information about correction strategies.
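The character-wise comparison and the error inflation noted in the second problem above can be sketched as follows. This is a minimal illustration, not the software used in any of the cited studies, and the function name `positional_errors` is hypothetical. Each transcribed character is compared with the presented character at the same position.

```python
def positional_errors(presented, transcribed):
    """Count positions at which the two strings disagree."""
    length = max(len(presented), len(transcribed))
    errors = 0
    for i in range(length):
        p = presented[i] if i < len(presented) else None
        t = transcribed[i] if i < len(transcribed) else None
        if p != t:
            errors += 1
    return errors


# One substituted character: the remaining text stays aligned.
print(positional_errors("the quick", "thw quick"))  # 1

# One omitted character: every later character is shifted by one
# position, so correct keystrokes are also counted as errors.
print(positional_errors("the quick", "th quick"))   # 7
```

The single omission in the second call is scored as seven errors, because the positional comparison keeps counting mismatches until the two texts are realigned.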
The first two negative effects listed above reduce the realism of the text entry task. Text entry and error rates observed during forced synchronisation do not generalise to normal unconstrained text entry. However, it is precisely this generalisation of experimental observations that is desired. The third negative effect suggests an even more egregious oversight. The text entry process is really the editing process, involving much more than the perfect linear input of alphanumeric symbols. By forbidding participants to correct their mistakes, researchers are missing an extremely important part of the text entry process. Card, Moran & Newell (1980) report that up to one fourth of an expert’s time can be spent correcting errors. A study by Soukoreff & MacKenzie (2003a) (reported in Chapter 8) found that 31% of keystrokes generated by typical computer users were editing functions such as backspace and cursor movements.
Text entry forms were a component of the interface of some (particularly early) tablet computers, where the primary technology for text entry was the stylus instead of the keyboard. These are essentially an electronic version of the common “fill-in-the-blank” form. Fields on a form (paper or electronic) indicate the specific location where each character must be written, and so the input text is not simply a stream of characters, but also includes position information. In typical use (i.e., not part of an experiment) one is not obliged to fill out the fields of a form (or even the characters in a field) in a specific order, and the correction of errors is normally possible (an error can be scribbled out, and the correct character drawn in).
Several studies of stylus-based text input have used forms, where input fields appeared beneath each character of presented text (see Figure 2). (McQueen et al. 1994, McQueen et al. 1995, Chang 1994, MacKenzie & Chang 1999, MacKenzie et al. 1994a, MacKenzie et al. 1994b)
Figure 2 - Example of synchronisation in a form-based text input experiment. This figure is taken from MacKenzie & Chang 1999, Figure 1.
These text entry forms were a normal part of the interface, and the means by which regular text entry was supported on some stylus-based computers. While useful for studying stylus-based text entry, text entry forms do represent a (less rigid) form of synchronisation. The physical layout of the presented text and the input field make it relatively easy for participants to keep themselves synchronised. Error rate calculation is simplified because a character-by-character comparison can be used. As before, data regarding error correction strategies would only be observable if the experiment software supported error correction – and we have not seen any examples of this in the literature.
A variation of the text entry form technique was used by MacKenzie & Zhang (1997) in their investigation of the immediate usability of the Graffiti stylus-based text entry technology. The purpose was to judge participants’ familiarity with the Graffiti alphabet, and so participants were asked to enter the alphabet five times consecutively (one character at a time, no error correction). Because the participants were familiar with the alphabet and required to ignore errors, they naturally remained synchronised to the expected text. Errors were identified via a character-by-character comparison of the transcribed text. However, this is a highly specialised case, and this technique does not generalise to typical text entry.