Evaluation in Data and Information
Visualization
Universidade de Aveiro
Departamento de Electrónica, Telecomunicações e Informática
• Visualization is the process of exploring, transforming, and representing data as images (or other sensorial forms) to gain insight into phenomena
• There are several expressions used to designate different areas of Visualization:
– Scientific Visualization
– Data Visualization
– Information Visualization
• The differences among these areas are not completely clear
Framework
(Brodlie et al., 1992)
[Diagram: Data acquisition → Data → Computing → Results → User; Hypothesis → Understanding]
• Visualization includes not only producing images from the data, but also their transformation and manipulation (and, if possible, their acquisition)
Data Visualization Reference model
Measured data: CT, MRI, ultrasound, lasers, satellite imaging, ...
Simulated data: Finite Element Analysis, numerical models, ...
Data → Transform → Map → Display
(adapted from Schroeder et al., 2006)
• In general:
Data Visualization (DV) - Data having an inherent spatial structure (e.g., CAT, MR, geophysical, meteorological, fluid dynamics data)
Information Visualization (IV) – Data not having an inherent spatial structure
(e.g., stock exchange, S/W, Web usage patterns, text)
• These designations may be misleading since both DV and IV start with (raw) data and allow information to be extracted
• Borders between these areas are not well defined, nor is it clear whether there is any advantage in separating them (Rhyne, 2003)
Information Visualization Reference Model
Raw data → [Data Transformations] → Data tables → [Visual Mappings] → Visual structures → [View Transformations] → Views → Task
Human interaction can act on every transformation stage.
“Visualization can be described as the mapping of data to visual form that supports human interaction in a workspace for visual sense making”
(Card et al., 1999)
• In Information Visualization, interaction generally receives more attention
• A correct definition of the goal is fundamental
How can we evaluate a Visualization?
Answering two questions:
“How well does the final visualization:
- represent the underlying phenomenon?
- help the user understand it?”
Which imply:
A) “Low level” – evaluating the representation of the phenomenon
B) “High level” – evaluating the users’ performance in their tasks (involving understanding the phenomenon) while using the visualization
• Evaluating a visualization technique should involve all phases, e.g.:
– low level: accuracy, repeatability of methods (errors, artifacts, …)
– high level: efficacy and efficiency in supporting users’ tasks; learnability, memorability, …
• Not forgetting the interaction (not only visual) aspects!
[Diagram: measured and simulated data enter the visualization technique, which transforms, maps, and displays them]
• Motivation/ goal (why? / what for?)
• Test data (which data sets? How many?)
• Evaluation methods (which?)
• Collected data (which measures? which observations?)
• Data analysis (which methods?)
Main Issues for evaluation planning
All strongly related to the chosen methods
• Motivation and goal are the starting point of an evaluation. For example:
- Which is the best representation of specific data to support specific users while performing specific tasks?
- Which is the best segmentation algorithm?
• Along with constraints, it influences the choice of
– methods
– data sets
• Test data can be real, synthetic, or in between
• For instance in Medical Data Visualization it is common to use:
– Synthetic data
– “Phantoms”
– Cadavers
– In vivo
Synthetic data allow a better knowledge of the “ground truth”
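As a concrete illustration of a known “ground truth”, a synthetic phantom can be generated in a few lines; the sphere shape, grid size, and radius below are arbitrary choices for this sketch, not anything from the case studies:

```python
import math

def sphere_phantom(size=48, radius=16):
    """Binary volume (nested lists) containing a centred sphere: a simple
    synthetic 'phantom' whose voxel-level ground truth is known by construction."""
    c = size // 2
    return [[[1 if (x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2 <= radius ** 2 else 0
              for x in range(size)]
             for y in range(size)]
            for z in range(size)]

phantom = sphere_phantom()
n_voxels = sum(v for plane in phantom for row in plane for v in row)

# The analytic volume 4/3*pi*r^3 gives a reference value to check a
# measurement or segmentation pipeline against.
analytic = 4.0 / 3.0 * math.pi * 16 ** 3
print(n_voxels, round(analytic, 1))
```

Because the phantom is defined analytically, any segmentation or volume-measurement step applied to it can be checked against an exact answer, which is precisely what real measured data cannot offer.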
• Data should :
– Be sufficient
– Be representative
– Include especially difficult cases
• Collected data have a fundamental impact on the information we can get from the evaluation
• Analysis of the collected data has an impact on the results credibility
• Selecting methods should take into consideration:
– Nature, level of representation and scale of the collected data
– Size of the sample
– Statistical distribution
• Methods from other disciplines can be adapted, e.g.
– methods used in Human-Computer-Interaction (Dix, 2004):
- Controlled experiments with users
- Observation
- Query methods (questionnaires, interviews)
- Inspection methods (heuristic evaluation)
• Specific methods are appearing (e.g. insight based methods)
Methods
Empirical (involving users)
Analytical
Controlled experiments
• “workhorse” of experimental science (Carpendale, 2008)
• with benchmark tasks, the primary method for rigorously evaluating visualizations (North, 2006)
• Involve:
– Hypothesis
– Independent (input) variables (what is controlled)
– Dependent (output) variables (what is measured)
– Secondary variables (what more could influence results)
– Experimental design (between groups / within groups)
– Protocol (sequence and characteristics of actions)
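The analysis of such an experiment can be sketched as below. The task times are invented for illustration, and a within-groups design is assumed, so each user contributes one value per condition and the analysis must be paired:

```python
import math
import statistics

# Hypothetical task-completion times (seconds) for 8 users, each tested
# in BOTH conditions (within-groups design).
times_original = [52.1, 48.3, 60.0, 45.2, 55.7, 49.9, 58.4, 51.0]
times_enhanced = [41.5, 44.0, 50.2, 39.8, 47.1, 42.3, 49.0, 43.6]

def paired_t(a, b):
    """Paired t statistic on the per-user differences (df = n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation
    return mean_d / (sd_d / math.sqrt(n)), n - 1

t, df = paired_t(times_original, times_enhanced)
print(f"t({df}) = {t:.2f}")
```

The t statistic would then be compared against the t distribution with the stated degrees of freedom; a non-parametric alternative (e.g. the Wilcoxon signed-rank test) is common when normality of the differences is doubtful.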
Observation
• Is a very useful method widely used in usability evaluation
• Can be done in different ways:
– Very simple (e.g. just observing the user doing some tasks) …
– Very sophisticated (e.g. using a usability Lab, logging, video, …)
– Think aloud
• Usability testing includes observation and query techniques (engineering approach)
Query methods
• Also very useful and widely used in usability evaluation
• Two types:
– Questionnaires – easier to apply to more people; less flexible
– Interviews – more flexible; reach less people
• Must be carefully designed (types of questions, scale of responses, …)
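As a small illustration of why careful design matters: negatively worded questionnaire items must be reverse-coded before scores are aggregated, a classic pitfall. The item indices and the 5-point scale below are hypothetical:

```python
# Indices of hypothetical negatively worded questions in a 5-item,
# 5-point Likert questionnaire.
REVERSED = {2, 4}

def score(responses, scale_max=5):
    """Average score after reverse-coding the negatively worded items."""
    coded = [(scale_max + 1 - r) if i in REVERSED else r
             for i, r in enumerate(responses)]
    return sum(coded) / len(coded)

# One participant's answers; items 2 and 4 are flipped before averaging.
print(score([5, 4, 2, 5, 1]))
```

Skipping the reverse-coding step silently biases the aggregate toward the scale midpoint, which is why the question types and response scales deserve attention at design time.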
Heuristic evaluation
• Widely used in usability evaluation
• Application in Visualization evaluation is not as common (few heuristics)
• It is a structured analysis assessing whether a set of heuristics is followed
• It should be performed by expert analysts
• Has the advantage of not involving users
Evaluating Visualizations: examples
Universidade de Aveiro
Departamento de Electrónica, Telecomunicações e Informática
CardioAnalyser: Left Ventricle (LV) Visualization from Angio Computer Tomography (CT) data
CardioAnalyser: Visualizing the Left Ventricle (LV) and
quantifying its performance from Angio Computer Tomography data
Goal:
Help users to better understand the
performance of the Left Ventricle from AngioCT through interactive visualization methods/tools
- CT exam: ~12 phases × (512×512×256) volumes
- segment endocardium and epicardium in every phase
- edit the segmentations (if necessary)
- visualize and quantify
How should we evaluate?
1- the segmentation method/tool
2- the functional analysis method/tool
3- the perfusion analysis method/tool
How should we evaluate?
1- the segmentation method/tool
2- the functional analysis tool
“Low level evaluation”:
• Preliminary evaluating the segmentation method – observer study, query
“High level evaluation”:
• Evaluating a 3D segmentation editing tool - user study, observation, query
The team:
At the University:
- Samuel Silva, PhD student
- Joaquim Madeira, PhD
- Carlos Ferreira, PhD (Math)
At Gaia Hospital:
Is the CardioAnalyser LV segmentation tool adequate to
support radiographers in their segmentation tasks?
1 – qualitative evaluation of the segmentation method
2 – qualitative evaluation of the 3D editing tool
3 – selection of a measure to compare segmentations
4 – quantitative evaluation of the LV segmentation tool
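The source does not state which comparison measure was selected, but the Dice coefficient is a common choice for comparing binary segmentations; a minimal sketch with toy masks:

```python
def dice(a, b):
    """Dice similarity 2|A∩B| / (|A| + |B|) between two binary masks
    given as flat 0/1 sequences; 1.0 means perfect overlap."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 2.0 * inter / size if size else 1.0   # two empty masks agree

# Toy automatic vs. reference segmentation (flattened masks):
seg_auto = [0, 1, 1, 0, 1, 0]
seg_ref  = [0, 1, 0, 0, 1, 1]
print(dice(seg_auto, seg_ref))   # 2*2 / (3+3) ≈ 0.667
```

An overlap measure like this makes step 4 (quantitative evaluation) possible: an automatic segmentation can be scored against an expert-validated reference on a common numeric scale.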
Constraints during evaluation
• A lot of data for each exam
• High patient/image variability
• Very busy domain experts
• Distant hospital
implied:
- a careful choice of the test data set and methods
- the development of specific applications

1- LV segmentation method
• Accurate segmentations are needed to:
- compare structures
- perform quantitative measurements
• In medical applications segmentations must be validated by the expert
• A segmentation method was developed that starts from one phase (the 60% phase) and uses this first segmentation to help segment the other phases …
Qualitative Evaluation of the segmentation method
• Preliminary qualitative evaluation after developing the first prototype
• Meant to:
– detect serious segmentation problems
– inform further fine tuning of the method
• 3 radiographers
• 7 exams, 3 phases/exam (ED, ES, 60%), segmenting the endocardium and the epicardium
• Using a Regional classification:
• Endocardium: four anatomical regions:
– apex
– mid-ventricle
– mitral valve
– outflow
• Epicardium: five anatomical regions:
– apex
– mid-ventricle lateral and septal regions
– basal lateral and septal regions
Scale:
- OK (optimum segmentation)
- EXCESS (3 levels): +, ++, +++
- SHORTAGE (3 levels): -, --, ---
• Radiographers classified the segmentations (without any editing) as if they were final (i.e., usable for diagnosis purposes)
Segmentation classification:
1 – low significance: very good; could include/exclude a very small region
Results of preliminary evaluation of the segmentation method
• Endocardium segmentation:
– apex and midventricular slices well segmented
• Epicardium segmentation clearly needed further improvements
– Most problems in the septal sections of midventricular and basal regions
example of epicardium segmentation problem in the septal section
2- A tool to edit LV segmentations in 3D
• Even robust segmentation methods cannot deal with the wide range of variation of anatomical structures, e.g. in:
- shape
- orientation
- texture
• Tools that ease segmentation editing/correction by experts are needed
• Segmenting volume data by editing several slices may be a tiresome task
• Should be:
- Intuitive
- Easy to use by radiographers
to correct most common segmentation problems
• Two alternatives:
- Voxel mask (ADD/REMOVE)
- 3D surface (deform)
• Three radiographers
• Explanation and practice
• Two typical tasks:
– task 1 - adjusting the segmentation to the mitral valves (removing)
– task 2 - adjusting the segmentation to the LV wall (adding)
• Time to perform the tasks using:
– voxel mask (3DV)
– surface editing (3DS)
– the 2D editing tool (from MITK)
• Preferences , comments
Results of the 3D editing tool evaluation
Time (s) to complete an editing task using:
2D tool; 3DV - voxel editing; 3DS - surface editing
• Average task times for both 3D editing modes much smaller than for the 2D tool
• Users preferred voxel editing simplicity
- but surface editing does not occlude the image
Comparing a modified pedigree tree visualization method
with the original method
H-Tree method (Tuttle et al., 2010)
João Miguel Santos: MSc Student Paulo Dias, PhD
Comparing a modified pedigree tree visualization method with the
original method
• Visualization techniques capable of representing large pedigree trees are useful
• An H-Tree Layout has been recently proposed to overcome some of the limitations of traditional representations
Traditional representations of pedigree trees
(used in commercial S/W)
• Binary trees with several layouts (horizontal, vertical, bow):
- Generations easily understandable
- Space needs grow fast with generations
• Fan trees
- Generations still understandable
- Space needs attenuated
Pedigree H-layout representation
• To overcome space limitations, Tuttle et al. (2010) proposed a method based on the H-Tree Layout:
- It allows the representation of a greater number of generations simultaneously
However:
- It is more difficult to identify relations among individuals
[Figure: H-Tree layout of a pedigree; each node is labelled with its generation number (1–5)]
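The H-Tree placement rule can be sketched as a short recursion: each individual's two parents sit at the ends of a segment whose orientation alternates and whose length halves at each generation. The coordinates and lengths below are illustrative, not Tuttle et al.'s actual parameters:

```python
def h_tree(x, y, length, depth, horizontal=True, positions=None):
    """Return node positions of an ancestor H-Tree rooted at (x, y)."""
    if positions is None:
        positions = [(x, y)]               # the central individual
    if depth == 0:
        return positions
    dx, dy = (length, 0) if horizontal else (0, length)
    for sign in (-1, 1):                   # the two parents
        px, py = x + sign * dx, y + sign * dy
        positions.append((px, py))
        # Alternate orientation and halve the arm length per generation.
        h_tree(px, py, length / 2, depth - 1, not horizontal, positions)
    return positions

pts = h_tree(0, 0, 100, depth=4)
print(len(pts))                            # 1 + 2 + 4 + 8 + 16 = 31 positions
```

The halving arm length is what keeps space needs bounded as generations grow, at the cost of the less obvious generation structure discussed above.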
Enhancing the Pedigree H-layout
• Objectives:
- simplify the understanding of the family structure inherent in the pedigree - allow downward interactive navigation
Enhancing the Pedigree H-layout
• New functionality proposed:
- complementary information on the tooltip with the relation to the central individual
- "generation emphasis" that highlights individuals belonging to generation n in relation to the individual under the cursor
- contextual menu allowing downward navigation to direct descendants
Evaluating the Enhanced Pedigree H-Tree
• Does the enhanced method better support understanding the family structure? (comparative evaluation)
• How good is the enhanced method (for specific tasks/users)? (outright evaluation)
• Two types:
– Analytical
– Empirical
Empirical evaluation characterization
• Data: public real data
• Users:
– InfoVis/HCI students
– Experts (MDs, animal breeders)
• Methods:
– Observation
– Logging
– Questionnaire
– Interview
– Controlled experiment
• Tasks:
– Simple / Complex
– Interaction / Visual
• Measures:
– User performance (efficiency, efficacy)
– Satisfaction
• Measures/methods:
– Task completion: observation, logging
– Difficulty, disorientation: questionnaire, observation
– Times: observation/logging
– Satisfaction: questionnaire, interview
Evaluation: four/five phases
• Pilot usability test
– A few users
• Usability test
– 6 InfoVis students
• Pilot test for the controlled experiment:
– 6 InfoVis students
• Controlled experiment:
– 60 HCI students
• Evaluation with domain experts
For academic purposes:
- further improvement
- formal comparison
- guidelines

- No logging
- Only comparative
- Informally confirmed the usefulness of the enhancements
- Allowed improving the application
Usability test
(including pilot)
• General explanation concerning the application and the test
• Practice until each user feels ready
• Users performed 6 tasks
An observer registered:
• Task completion
• Correct answers
• Times
• Difficulty
• If the user asked for help/ seemed lost
• Users answered a questionnaire
Documents involved in the protocol
• List of tasks
• Observer notes
Results of the usability test
• Efficacy - more correct answers with:
– tooltips
– generation emphasis
• Efficiency - times were difficult to register manually (tasks too simple?)
• Tooltips were considered the most helpful feature to understand the family structure
• Specific suggestions (e.g. increase arrows size)
Another test: is the test application “colorblind friendly”?
Other tested alternatives
Design of the controlled experiment
• Question:
Do users understand better the family structure while using the enhanced method (compared with the original method)?
• It can be divided into the following two hypotheses:
Hypothesis 1 – Tooltips improve users’ performance in understanding the family structure, when compared with the original method
Hypothesis 2 – “Generation emphasis” improves users’ performance in understanding the family structure, when compared with the original method
Variables:
• Input (independent) variables:
– Method – 3 levels:
• original
• original + tooltips
• original + “generation emphasis”
• Output (dependent) variables:
– times
– task completion rate; success rate
– disorientation, difficulty
– satisfaction
• Secondary variables:
Experimental design
• Within-groups:
all users perform the same tasks in all experimental conditions
(i.e., with all methods )
• Advantages over between-groups design:
– More data with the same users
– Less user profile variation
• Caution: order and learning effects (counterbalance the order of conditions across users)
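One standard way to handle order effects in a within-groups design is a Latin square over the presentation order of the conditions; a sketch with abbreviated labels for the three methods:

```python
def latin_square(conditions):
    """Cyclic Latin square: each condition appears exactly once
    per row (user group) and once per column (session position)."""
    n = len(conditions)
    return [[conditions[(r + c) % n] for c in range(n)] for r in range(n)]

orders = latin_square(["original", "tooltips", "gen-emphasis"])
for i, order in enumerate(orders):
    print(f"user group {i + 1}: {order}")
```

Assigning users evenly to the rows spreads any learning or fatigue effect equally over the three methods, so it cannot masquerade as a method effect.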
Protocol of the controlled experiment
• General explanation concerning the
application and the test
• Practice until each user feels ready
• Users perform 10 tasks
– An observer registers:
• Task completion
• Difficulty
• Errors
• If the user asked for help/ felt lost
– The application logs times
• Users answer a questionnaire
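The time logging mentioned in the protocol could look roughly like the sketch below; the class and field names are hypothetical, not the actual application's:

```python
import csv
import time

class TaskLogger:
    """Records per-user, per-task elapsed time and completion status."""
    def __init__(self):
        self.rows = []
        self._start = None

    def start(self):
        self._start = time.perf_counter()

    def stop(self, user, task, completed):
        elapsed = time.perf_counter() - self._start
        self.rows.append({"user": user, "task": task,
                          "time_s": round(elapsed, 3),
                          "completed": completed})

    def save(self, path):
        """Dump all records to CSV for later statistical analysis."""
        with open(path, "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=["user", "task",
                                              "time_s", "completed"])
            w.writeheader()
            w.writerows(self.rows)

log = TaskLogger()
log.start()
# ... user performs the task ...
log.stop(user=1, task="T1", completed=True)
```

Automatic logging avoids the manual timing problem reported in the usability test, where tasks were too short to register times reliably by hand.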
In these examples (but more generally):
• Formative came first, then summative evaluation (they are not totally disjoint)
• It was important to:
– Start thinking about evaluation as soon as possible
– Do several evaluation “rounds”
– Use more than one method
– Carefully choose the methods, data, users, tasks, measures, data analysis methods
– Learn as much as possible from each evaluation round, to:
- Improve the methods/applications
• Evaluating Visualizations is challenging
• It will become more challenging as Visualization evolves to be more interactive, collaborative, distributed, multi-sensorial, mobile …
• It is fundamental to:
- evaluate solutions to specific cases
- develop new visualization methods / systems - establish guidelines
to make Visualization more useful, more usable, and more used
About Evaluating Visualization methods/applications:
Bibliography - books
• Brodlie, K., L. Carpenter, R. Earnshaw, J. Gallop, R. Hubbold, A. Mumford, C. Osland, P. Quarendon, Scientific Visualization, Techniques and Applications, Springer Verlag, 1992
• Card, S., J. Mackinlay, B. Shneiderman (eds.), Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann, 1999
• Carpendale, S., Evaluating Information Visualization, in Information Visualization: Human-Centered Issues and Perspectives, Kerren, A., Stasko, J., Fekete, J.D., North, C. (eds.), LNCS vol. 4950, pp. 19-45, Springer, 2008
• Dix, A., Finlay, J., Abowd G., Beale, R.: Human-Computer Interaction, 3rd edition, Prentice Hall, 2004
• Hansen, C., C. Johnson (eds.), The Visualization Handbook, Elsevier, 2005
• Johnson, C., R. Moorhead, T. Munzner, H. Pfister, P. Rheingans, T. Yoo, Visualization Research Challenges, NIH/NSF, January 2006
• Keller, P., M. Keller, Visual Cues, IEEE Computer Society Press, 1993
• Schroeder, W., K. Martin, B. Lorensen, The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics, 4th ed., Prentice Hall, 2006
• Spence, R., Information Visualization: Design for Interaction, 2nd ed., Addison Wesley, 2006
• Ware, C., Information Visualization: Perception for Design, 2nd ed., Academic Press, 2004
Bibliography – papers
• Rhyne, T. M., “Does the Difference between Information and Scientific Visualization Really Matter?”, IEEE Computer Graphics and Applications, May/June, 2003, pp. 6-8
• Rhyne, T. M., “Scientific Visualization in the Next Millennium”, IEEE Computer Graphics and Applications, Jan./Feb., 2002, pp. 20-21
• Hibbard, B., “Top Ten Visualization Problems”, SIGGRAPH Computer Graphics Newsletter, VisFiles, May 1999, Vol. 33, N. 2
• Johnson, C., “Top Scientific Visualization Research Problems”, IEEE Computer Graphics and Applications: Visualization Viewpoints, July/August, 2004, pp. 13-17
• Eick, S., "Information Visualization at 10," IEEE Computer Graphics and Applications, vol. 25, no. 1, Jan /Feb, 2005, pp. 12-14
• Keefe, D., “Integrating Visualization and Interaction Research to Improve Scientific Workflows”, IEEE Computer Graphics and Applications,
vol. 30, no. 2, Mar/April, 2010, pp. 8-13
• Globus, A., E. Raible, “Fourteen Ways to Say Nothing With Scientific Visualization”,