User Interface Design
Winter term 2005/2006
Thursdays, 14-16 c.t., Room 228
Prof. Dr. Antonio Krüger
Testing & modeling users
The aims
· Describe how to do user testing.
· Discuss the differences between user testing, usability testing and research experiments.
· Discuss the role of user testing in usability testing.
· Discuss how to design simple experiments.
· Describe GOMS, the keystroke level model, Fitts’ law and discuss when these techniques are useful.
· Describe how to do a keystroke level analysis.
Experiments, user testing & usability testing
Experiments test hypotheses to discover new knowledge by investigating the relationship between two or more things, i.e., variables.
User testing is applied experimentation in which developers check that the system being developed is usable by the intended user population for their tasks.
Usability testing uses a combination of techniques, including user testing & user satisfaction questionnaires.
User testing is not research
User testing
Aim: improve products
Few participants
Results inform design
Not perfectly replicable
Controlled conditions
Procedure planned
Results reported to developers
Research experiments
Aim: discover knowledge
Many participants
Results validated statistically
Replicable
Strongly controlled conditions
Experimental design
Scientific paper reports results to community
User testing
Goals & questions focus on how well users perform tasks with the product
Comparison of products or prototypes common
Major part of usability testing
Focus is on time to complete task & number & type of errors
Informed by video & interaction logging
User satisfaction questionnaires provide data about users’ opinions
Testing conditions
Usability lab or other controlled space
Major emphasis on
• selecting representative users
• developing representative tasks
5-10 users typically selected
Tasks usually last no more than 30 minutes
The test conditions should be the same for every participant
Informed consent form explains ethical issues
Type of data (Wilson & Wixon, ‘97)
· Time to complete a task
· Time to complete a task after a specified time away from the product
· Number and type of errors per task
· Number of errors per unit of time
· Number of navigations to online help or manuals
· Number of users making a particular error
· Number of users completing task successfully
Usability engineering orientation
· Current level of performance
· Minimum acceptable level of performance
· Target level of performance
How many participants is enough for user testing?
The number is largely a practical issue
Depends on:
• schedule for testing
• availability of participants
• cost of running tests
Typically 5-10 participants
Some experts argue that testing should continue until no new insights are gained
Experiments
Predict the relationship between two or more variables
Independent variable is manipulated by the researcher
Dependent variable depends on the independent variable
Typical experimental designs have one or two independent variables
Experimental designs
Different participants - a single group of participants is allocated randomly to the experimental conditions
Same participants - all participants appear in every condition
Matched participants - participants are matched in tuples, e.g., based on expertise or gender
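As a minimal sketch of the first two designs, the following allocates a group of participants to conditions. The function names, participant IDs, and condition labels are hypothetical and chosen only for illustration; matched-participants allocation is left out because it depends on how the matching attribute is measured.

```python
import random

def allocate_different_participants(participants, conditions, seed=None):
    """Different-participants design: shuffle one group, then deal the
    participants out to the conditions so each person is in exactly one."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups

def allocate_same_participants(participants, conditions):
    """Same-participants design: every participant appears in every condition."""
    return {condition: list(participants) for condition in conditions}

# Hypothetical participant IDs and condition labels.
participants = [f"P{i}" for i in range(1, 9)]
conditions = ["condition 1", "condition 2"]

different = allocate_different_participants(participants, conditions, seed=1)
same = allocate_same_participants(participants, conditions)
```

With eight participants and two conditions, the first design yields two groups of four, while the second yields eight participants per condition.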
Example
Hypothesis: “Will the time to read a screen of text be different if 12-point Helvetica is used instead of 12-point Times-Roman?”
Condition 1: users read text with Helvetica
Condition 2: users read text with Times Roman
Control condition: read text on paper
Extend design with variable user-expertise (additional conditions: expert/beginner)
What are the independent and dependent variables?
Advantages and disadvantages
Evaluation of results / significance
The larger the sample, the less likely that the difference is due to sampling errors or chance.
The larger the difference between the two means, the less likely the difference is due to sampling errors
The smaller the variance among the participants, the less likely the difference is due to sampling errors
Variability
Are the results statistically significant?
Use the t-test to analyze the ratio of the difference between the group means to the group variability
T-test
Use a standard table of significance to determine whether t is large enough.
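A minimal sketch of this calculation for a different-participants design, assuming the classic Student's t-test with pooled variance (one common variant; the slides do not say which form is meant). The reading times are invented purely for illustration.

```python
import math
import statistics

def independent_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for two independent samples,
    using the pooled-variance form of Student's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: group variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two means.
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df

# Hypothetical reading times in seconds (not real data).
helvetica_times = [52, 48, 55, 50, 51]
times_roman_times = [47, 49, 45, 50, 46]

t_value, df = independent_t(helvetica_times, times_roman_times)
print(f"t = {t_value:.3f}, df = {df}")
```

The absolute value of t is then compared with the critical value for the given degrees of freedom in a significance table (for df = 8 and a two-tailed 5% level, that value is 2.306). A larger difference between means, a smaller variance, or a larger sample all increase t.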
Predictive models
Provide a way of evaluating products or designs without directly involving users
Psychological models of users are used to test designs
Less expensive than user testing
Usefulness limited to systems with predictable tasks - e.g., telephone answering systems, mobiles, etc.
Based on expert behavior
GOMS (Card et al., 1983)
Goals - the state the user wants to achieve e.g., find a website
Operators - the cognitive processes & physical actions performed to attain those goals, e.g., decide which search engine to use
Methods - the procedures for accomplishing the goals, e.g., drag mouse over field, type in keywords, press the go button
Selection rules - determine which method to select when there is more than one available
Keystroke level model
GOMS has also been developed further into a quantitative model - the keystroke level model.
This model allows predictions to be made about how long it takes an expert user to perform a task.
Response times for keystroke level operators
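As a sketch of how such operator times are used in a keystroke level analysis, the following sums the times for a sequence of operators. The values are the commonly cited figures from Card, Moran & Newell and should be treated as assumptions here; the task sequence is hypothetical.

```python
# Commonly cited keystroke level operator times in seconds (assumed values).
OPERATOR_TIMES = {
    "K": 0.20,  # press a key or button (average skilled typist)
    "P": 1.10,  # point with a mouse at a target on the display
    "H": 0.40,  # home the hands on the keyboard or another device
    "M": 1.35,  # mentally prepare for an action
}

def klm_estimate(sequence, times=OPERATOR_TIMES):
    """Predict expert, error-free task time by summing operator times."""
    return sum(times[operator] for operator in sequence)

# Hypothetical task: prepare mentally, move the hand to the mouse, point at
# a text field, click, move the hand back to the keyboard, type 4 characters.
task = "MHPK" + "H" + "KKKK"

print(f"Predicted time: {klm_estimate(task):.2f} s")
```

Comparing the totals for two alternative operator sequences gives a quick estimate of which design is faster for an expert user.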
Problems of GOMS/Keystroke model
Doesn’t take into account slack times and critical situations that may slow down certain strokes.
Example: Usage of system while talking to a person in parallel.
Further influences that are not taken into account: fatigue, learning effects, workload, etc.
Models are only good for providing an estimate; they cannot substitute for user testing
Fitts’ Law (Paul Fitts 1954)
The law predicts that the time to point at an object using a device is a function of the distance from the target object & the object’s size.
The further away & the smaller the object, the longer the time to locate it and point.
Useful for evaluating systems for which the time to locate an object is important such as handheld devices like mobile phones
Why are labeled toolbars easier to access?
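A minimal sketch of the law, assuming the widely used Shannon formulation T = a + b * log2(D / W + 1), where D is the distance to the target and W its width. The constants a and b are determined empirically for each device; the defaults and the toolbar dimensions below are hypothetical placeholders.

```python
import math

def fitts_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time in seconds.

    a and b are device-specific constants normally fitted from measurements;
    the defaults here are placeholders, not measured values.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    index_of_difficulty = math.log2(distance / width + 1)
    return a + b * index_of_difficulty

# Hypothetical toolbar buttons at the same distance (in pixels):
icon_only = fitts_time(distance=400, width=20)
with_label = fitts_time(distance=400, width=60)

print(f"icon only: {icon_only:.3f} s, with label: {with_label:.3f} s")
```

One way to read the toolbar question: a label makes the button a larger target, so the predicted pointing time is shorter at the same distance.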
Key points
· User testing is a central part of usability testing
· Testing is done in controlled conditions
· User testing is an adapted form of experimentation
· Experiments aim to test hypotheses by manipulating certain variables while keeping others constant
· The experimenter controls the independent variable(s) but not the dependent variable(s)
· There are three types of experimental design: different-participants, same-participants, & matched-participants
· GOMS, Keystroke level model, & Fitts’ Law predict expert, error-free performance
· Predictive models are used to evaluate systems with predictable tasks such as telephones
Design & evaluation in the real world
The aims
Show how design & evaluation are brought together in the development of interactive products.
Show how different combinations of design & evaluation methods are used in practice.
Describe the various design trade-offs & decisions that have to be made in the real world.
Key issues: From requirements to design
which design cycle to use
which combination of methods to use when designing & evaluating a product
what happens when the product being developed is confidential and there are no users available to test it?
how many users should be involved in tests?
what to do with the evaluation findings
how much to expect from users
Case study: designing mobile communicators
Two examples, for very different audiences:
Nokia’s mobile communicator
Philips communicator for children
Designing Nokia’s mobile communicator
design cycle: iterative user-centered approach
which methods:
• ethnographic research
• scenarios and task models
confidential product issues:
• first in the market is key
• evaluation must be very limited and no real
Designing Nokia’s mobile communicator (contd)
physical aspects:
• screen size
• number of buttons versus functionality
consistency issues
• internal consistency (within mobile software)
• external consistency (with desktop software)
user testing
• none before release
• summative testing & questionnaires after
Designing telephones for special user groups
(Royal National Institute for the Blind)
Guarded or recessed keys
Sidetone reduction to reduce noise level
Adjustable key pressure
Audio and tactile feedback
Larger key size
Consistency of the Design
Internal consistency
• Nokia style guide
External consistency difficulties
• No pointing device
• Slow connection and download times
• Default homepage
• Transcoding webpages (focus on text)
Philips Communicator for children
(Oosterholt ’96)
Designing Philips’ communicator for children
design cycle: iterative and evolutionary
which methods:
• low-fidelity prototyping
• participatory design
• interface metaphors
physical aspects:
• color, shape, size, robustness
• pen input
• bags to protect screen
Designing Philips’ communicator for children
user involvement:
• children involved throughout
• prototypes evaluated constantly
• invaluable insights for the designers
lessons learned:
• agree on assumptions in requirements
• think of follow-on projects early on
• users are not designers
Case study 2: A telephone response information system (TRIS)
Interactive voice response systems are common in government offices and large companies. Do you know of examples that you have used?
Why are these systems often so frustrating to use?
• Forming a mental model is difficult because there is no visual feedback and the user must remember the menu structure
Many menus and deep menus are particularly difficult
Why was TRIS difficult to use?
Having to remember the menu structure.
The programmers traded usability for computational elegance, e.g., the system asked for both a social security number and an employee identification number, confusing users who did not have both.
TRIS comprised different systems, each with its own interaction style. Users were not told this, but when they moved between the systems they experienced sudden, unexplained changes.
How was TRIS evaluated?
A combination of techniques was used:
• a review of the literature provided information about problems with interactive voice response systems
• expert reviews
• GOMS analysis of the proposed redesign
The redesign was implemented
• usability tests confirmed that the redesigned system offered better usability than the original design
Why was using different methods valuable?
The evaluators were able to build up a broad picture of usability problems.
Using GOMS and heuristic evaluation, they could explore the potential benefits of the redesigned system.
User testing enabled them to confirm that the redesigned system offered better usability.
User satisfaction questionnaires confirmed that users preferred the redesigned system.
Key points
Design involves trade-offs
Design space for making changes when upgrading a product is limited
Cycles of rapid prototyping and evaluation allow designers to examine alternatives
Piecing together evidence from a variety of sources can be valuable