
COMS W4172

Evaluation

Steven Feiner

Department of Computer Science

Columbia University

New York, NY 10027

www.cs.columbia.edu/graphics/courses/csw4172

April 8, 2021

Why Evaluate?



Is this the best way to accomplish a given task?



What are the task attributes (in particular, known problem areas) that make the most difference?


Evaluation Methods

Cognitive Walkthrough

Polson et al. 92



Analogy to code walkthrough



Inputs

 Description of the UI (design sketch; running system not needed)

 Task scenario

 Assumptions about knowledge a user brings to the task

 Specific actions a user must perform to accomplish the task with UI



Evaluator(s) examine each step in the correct action sequence, asking

 Will user try to achieve the right effect?

 Will user notice that the correct action is available?

 Will user associate correct action with effect user is trying to achieve?

 Will user see that progress is being made toward task solution if correct action is performed?

Evaluation Methods

Heuristic Evaluation

Nielsen & Molich 92



Expert evaluators evaluate prototype individually by comparing it with heuristic guidelines


Evaluation Methods

Formative Evaluation



Performed during system evolution



Repeat until satisfied {



Representative users try system in task-based scenarios



Informal ↔ formal



Identify problems



Modify system to address problems

}

Evaluation Methods

Summative Evaluation



Performed after system complete



Representative users try multiple designs in task-based scenarios



Informal ↔ formal


Evaluation Methods

Questionnaires, Tests



Demographic information



Age, gender, profession, experience,…



Physical/mental abilities (e.g., dominant hand, dominant eye, color vision, stereo vision)



Subjective data



Preferences



Ratings using Likert scale



Free-form comments

Likert-scale question from Presence Questionnaire, B. Witmer & M. Singer, 1998

Stereo Fly Test, http://www.stereooptical.com

Ishihara Color Test

PseudoIsochromatic Plate (PIP) Test

Evaluation Methods

Questionnaires: NASA TLX

(Task Load Index)



Subjective workload assessment tool

https://hsi.arc.nasa.gov/groups/tlx/

 Six scales, normalized to 0–100

 Scales are first weighted per subject and task through 15 (= 6×5/2) binary comparisons of relative importance

 Each scale weighted 0–5 of 15 total

 Sum of weighted scales is divided by 15
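
The weighting procedure above can be sketched in a few lines of Python. This is a minimal illustration, not NASA's reference implementation: the function names tlx_weights and tlx_score, the dictionary layout, and all ratings and comparison outcomes below are made up for the example. The six scale names are those of the standard NASA TLX form.

```python
from itertools import combinations

# The six NASA TLX scales, each rated 0-100.
SCALES = ("Mental Demand", "Physical Demand", "Temporal Demand",
          "Performance", "Effort", "Frustration")

def tlx_weights(winners):
    """Count how often each scale was chosen across the 15 pairwise comparisons.

    winners maps frozenset({scale_a, scale_b}) to the scale judged more important.
    """
    expected = {frozenset(pair) for pair in combinations(SCALES, 2)}
    if set(winners) != expected:
        raise ValueError("need exactly one answer for each of the 15 pairs")
    weights = {scale: 0 for scale in SCALES}
    for pair, winner in winners.items():
        if winner not in pair:
            raise ValueError("winner must be one of the two compared scales")
        weights[winner] += 1
    return weights  # each weight is 0-5, and the weights sum to 15

def tlx_score(ratings, weights):
    """Weighted workload: sum of rating * weight over the scales, divided by 15."""
    return sum(ratings[s] * weights[s] for s in SCALES) / 15

# Illustrative subject: in every pair, the scale listed earlier in SCALES wins.
winners = {frozenset((a, b)): a for a, b in combinations(SCALES, 2)}
weights = tlx_weights(winners)

# Illustrative ratings (not real data).
ratings = {"Mental Demand": 70, "Physical Demand": 20, "Temporal Demand": 50,
           "Performance": 40, "Effort": 60, "Frustration": 30}
score = tlx_score(ratings, weights)
print(weights)
print(score)
```

Because the weights always sum to 15, the weighted score stays on the same 0–100 range as the individual scales.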


Evaluation Methods

Questionnaires:

igroup Presence Questionnaire



Subjective presence assessment tool

http://www.igroup.org/pq/ipq/index.php

One example of a questionnaire intended to measure presence: the subjective sense of being in an environment

Note: This is the English translation of a German questionnaire

Evaluation Methods

Interviews



Direct interaction by interviewer with subject



Structured  Semi-Structured  Unstructured



Structured: Fully standardized set of questions



Semi-structured: Based on a guide/framework, but with the ability to explore and improvise



Unstructured: Open, with complete freedom to


Metrics for Evaluation

 Time to learn

 Time to use

 Implies benchmark task(s)

 Errors

 How many?

 What kind?

 How important?

 Skill retention

 For how long?

 Frequent vs. casual user

 User impressions

 Does user like the system?

 Subjective impressions of the other factors

 Presence

 Comfort (e.g., cybersickness)
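
As a rough sketch of how the objective metrics in this list (time to use, errors) might be summarized for one benchmark task, assuming each trial is logged with a completion time and an error count. The Trial record, the summarize function, and the four trials below are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trial:
    participant: str  # hypothetical participant ID
    task: str         # name of the benchmark task
    seconds: float    # time to complete the task
    errors: int       # number of errors observed during the trial

def summarize(trials):
    """Summarize time-to-use and error metrics over a list of trials."""
    if not trials:
        raise ValueError("need at least one trial")
    return {
        "mean_seconds": mean(t.seconds for t in trials),
        "errors_per_trial": mean(t.errors for t in trials),
        "error_free_fraction": sum(t.errors == 0 for t in trials) / len(trials),
    }

# Illustrative log for a single benchmark task (not real data).
trials = [
    Trial("P1", "select-target", 12.0, 0),
    Trial("P2", "select-target", 18.0, 2),
    Trial("P3", "select-target", 15.0, 1),
    Trial("P4", "select-target", 15.0, 0),
]
summary = summarize(trials)
print(summary)
```

Counts alone answer "How many?"; answering "What kind?" and "How important?" would need each error to be categorized when it is logged.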

Metrics for Evaluation

 Objective measures of presence and comfort

 Physiologic response

 Meehan et al. compared users’ responses to stressful and non-stressful virtual rooms

Heart rate change correlated well

Skin conductance change correlated less well

Skin temperature change not as effective


Metrics for Evaluation

 Objective measures of 3D motion

 Head motion

Comparing user head position/orientation when doing 18 sequential maintenance tasks whose documentation is presented on tracked HWD (AR) or stationary LCD (LCD)

S. Henderson and S. Feiner. Exploring the benefits of augmented reality documentation for maintenance and repair. IEEE Transactions on Visualization and Computer Graphics, 17(10), 2011.
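
One simple way to turn tracked head poses into objective motion measures is to accumulate translation and rotation between successive samples. The sketch below assumes positions are logged as (x, y, z) tuples and orientations as unit quaternions (w, x, y, z). The function names and sample values are illustrative, and this is not necessarily the analysis used in the cited paper.

```python
import math

def path_length(positions):
    """Total distance traveled through successive (x, y, z) position samples."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))

def rotation_angle(q1, q2):
    """Smallest rotation angle in radians between two unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    # Clamp to 1.0 so rounding error cannot push acos out of its domain.
    return 2.0 * math.acos(min(1.0, dot))

def total_rotation(orientations):
    """Accumulated rotation in radians over successive orientation samples."""
    return sum(rotation_angle(a, b) for a, b in zip(orientations, orientations[1:]))

# Illustrative samples (not real tracker data).
positions = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
half = math.sqrt(0.5)
orientations = [
    (1.0, 0.0, 0.0, 0.0),    # looking straight ahead
    (half, 0.0, half, 0.0),  # turned 90 degrees about the y axis
    (1.0, 0.0, 0.0, 0.0),    # back to straight ahead
]
translation = path_length(positions)
rotation = total_rotation(orientations)
print(translation, math.degrees(rotation))
```

Totals like these can then be compared across display conditions for the same task sequence.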

Evaluation Issues for 3DUI



Need to avoid evaluator intruding on subject, affecting subject’s sense of presence



Need to assist subject with unfamiliar equipment


Evaluation Issues for 3DUI



Device limitations: trackers, displays



Device variations within class



E.g., different kinds of trackers, displays



May have a much greater effect than variations in 2D devices because of range of technologies, lack of standardization

Case Study: Balloon Selection

Benko & Feiner, 3DUI 2007
