1: Spatial Statistical Methods Cartography and Visualizations:
1.9 Evaluation Design
There are several possible evaluation approaches to consider. Usability measures such as task performance and task nature (e.g. closed or exploratory) are important factors. Wehrend and Lewis (1990) produced a classification list of operations designed to “distinguish problems for which the user’s goal in viewing the representation differs”. The effectiveness with which a researcher can extract the required information from a visualisation can be regarded as a criterion, or usability measurement. This section presents some of these evaluation approaches.
There are common challenges associated with experiments involving information visualisation tools. One is finding an appropriate testing site. It is argued that such tools need to be tested in a real setting, and not solely in a laboratory. Even if this is achieved, the tester has to persuade the user to take a leap of faith and use the tool. The fact that a tool may not be specifically designed for the needs of a user is one major obstacle (Plaisant, 2004). Plaisant argues that researchers should invest resources in tailoring their tools to specific user needs. This relates to the first experiment, in which participants operate tools tailored for the experiment and the dataset is not chosen by the participants. It can also be argued that a laboratory would be an appropriate setting if it replicated a natural working environment.
The previous sections have detailed the significance of human cognition in relation to interaction with a visualisation. Cognitive aspects are particularly important in the context of information visualisation because they help to explain how users perform with a visualisation. According to Chen and Yu (2000), “users with a stronger cognitive ability, i.e. high psychometrics, tend to perform better with information visualisation systems than users with weaker cognitive ability in terms of accuracy.” However, Gibson (1979) and Schuman (1987) argue that human cognition is guided by a person’s environment and not by their head. This indicates that the environment has a role to play in the results of any usability experiment, and as such must be taken into account. Different methods of analysis have been proposed. For example, according to Brunswick (1943), a person should be observed in a naturalistic setting where they would make perceptual judgements. Lave and Wegner (1991) also deemed work in a naturalistic setting appropriate.
When assessing data recorded during an evaluation experiment, the technique used to measure the data should be appropriate. Koua et al. (2006) used a correctness-of-response system, in which a correct answer is coded one and an incorrect answer is coded zero. Coupled with interaction logs, this could be a useful method because it would make the experiment results easier to analyse. Time can also be used as a measure of performance, and can reveal important differences in experiments, as Koua et al. (2006) discovered. This method of recording the correctness of user tasks could be applied to the experiments carried out in this thesis. However, given the complexity of the tasks in the first experiment, the response system would need to be adapted.
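The correctness-of-response coding described above can be sketched as follows. This is a minimal illustration only, not the scoring procedure of Koua et al. (2006) or of this thesis: the `Response` record, its field names, the case-insensitive string comparison and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    """One participant's answer to one task (hypothetical record layout)."""
    participant: str
    task: str
    answer: str
    expected: str
    seconds: float  # completion time, e.g. taken from an interaction log


def code_correctness(response: Response) -> int:
    """Code a correct answer as 1 and an incorrect answer as 0."""
    given = response.answer.strip().lower()
    wanted = response.expected.strip().lower()
    return 1 if given == wanted else 0


def summarise(responses: list[Response]) -> dict[str, float]:
    """Return the proportion of correct answers and the mean completion time."""
    scores = [code_correctness(r) for r in responses]
    return {
        "accuracy": mean(scores),
        "mean_seconds": mean(r.seconds for r in responses),
    }


# Illustrative values only; not data from any experiment.
sample = [
    Response("P1", "T1", "Area A", "Area A", 42.0),
    Response("P2", "T1", "Area B", "Area A", 58.0),
]
print(summarise(sample))
```

A binary code of this kind suits closed tasks with a single right answer; the more complex tasks of the first experiment would need an adapted scheme, which this sketch does not attempt to define.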
Generally, empirical evaluations include only simple tasks, for example identify and locate tasks. Performance reports on a task-by-task basis are preferred to overall performance reports because they allow in-depth analysis of performance and allow tools to be matched with particular tasks. This could be achieved, to an extent, through the evaluations of visualisations in this thesis. The second experiment, involving eye-tracking technology, would work most effectively with these types of tasks.
Given the nature of the GWR outputs in the first experiment, the inclusion of a set of exploratory tasks would represent a more accurate testing platform. The evaluation methods used by Koua et al. (2006) emphasised exploratory tasks and knowledge discovery support. The authors presented an approach for assessing the usability and usefulness of the visual-computational analysis environment. It is important to understand user cognition and how users make interpretations and inferences when conducting analysis. Crucial to this is the choice of an appropriate visualisation metaphor. The evaluation results serve as a guideline for the design of geovisualisation tools which integrate several visualisation types simultaneously. It is evident from the evaluations conducted by Koua et al. (2006) that certain visualisations perform better for particular tasks. This suggests that a multiple representation environment would be best for the analysis of varying spatial and temporal data. It could also be suggested that any experiment including multi-windowed representations would be augmented by comparative single-windowed representations, from which a more comprehensive set of results could be obtained for analysis.
Gabbard et al. (1999) have produced methods for virtual environment (VE) usability engineering. The steps of their methodology are user task analysis, expert guidelines-based evaluation, formative user-centric evaluation and summative comparative evaluation. User task analysis is the process of identifying a complete description of the tasks, subtasks and methods required to use a system, as well as other necessary resources (Gabbard et al., 1999). It represents insights gained through organisational and social workflow, and a general understanding of the needs of the user. This step was often overlooked in the 1990s, with developers and evaluators operating on a ‘best guess’ of how VE applications should be designed. This highlights the need for a more user-inclusive approach to visualisation design.
Heuristic evaluations, or usability inspections, can identify potential problems by comparing an application’s user interaction design against established design guidelines. More than one person should perform these evaluations, since it is unlikely that one person will identify all of the problems in an application. Evaluators inspect the application individually and the results are then combined. Gabbard et al. (1999) found the expert guidelines to be too general, and designed a set of guidelines specifically for VEs within a framework of usability characteristics.
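The step of combining individual inspections can be illustrated with a short sketch. It assumes that each evaluator’s findings are recorded as a set of problem labels; the function name, the evaluator identifiers and the problem labels are hypothetical and are not taken from Gabbard et al. (1999).

```python
from collections import Counter


def combine_findings(findings: dict[str, set[str]]) -> Counter:
    """Merge per-evaluator problem sets, counting how many evaluators found each problem."""
    counts: Counter = Counter()
    for problems in findings.values():
        counts.update(problems)
    return counts


# Hypothetical inspection results; the problem labels are illustrative only.
findings = {
    "evaluator_1": {"unclear legend", "no undo"},
    "evaluator_2": {"no undo", "small click targets"},
    "evaluator_3": {"unclear legend", "no undo"},
}

combined = combine_findings(findings)
for problem, n in combined.most_common():
    print(f"{problem}: found by {n} of {len(findings)} evaluators")
```

In this made-up example no single evaluator reports all three problems, which is the reason given above for using more than one evaluator.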
The expert guidelines-based evaluations are critical to the effectiveness of formative and summative evaluations (Gabbard et al., 1999). They streamline the two latter evaluation types because time is not wasted on identifying obvious usability problems.
According to Andrienko and Andrienko (2005), geographical analysis can be considered a set of operations or tasks carried out to achieve data exploration aims. The process involves identifying the spatial proximity of different elements, verifying spatial density, and obtaining a perspective on a target’s measurement at a location or neighbouring location (Koua et al., 2006).
Weldon (1996) describes the operations more specifically. The tasks include:
- Identification of clusters in data, and relationships between elements.
- Comparison of values at different spatial locations, distinguishing a range of values.
- Relation of the value, shape and position of identified object.
- Analysis of extracted relevant information.
Several taxonomies for visualisation have been suggested (Zhou and Feiner, 1998; Ogao and Kraak, 2002). An earlier taxonomy of visualisation operations, from Wehrend and Lewis (1990), is presented in Table 1.5. Table 1.6 shows a more modern taxonomy that condenses the operation keywords (Ogao and Kraak, 2002). However, this taxonomy can be streamlined further for the purposes of the experiments in this thesis, and Table 1.7 shows an adapted version.
In the first experiment, participants identified a particular Electoral Division for Task 1. For Task 2 and Task 3, however, an additional operation class, ‘relationship’, was added to each task in Table 1.19 below. These operations aided the creation of a series of tasks representing the types of questions a user of GWR would want to answer when analysing a complex data set.
Table 1.5 Taxonomy of Visualisation Operations

| Operation | Operation Description |
| --- | --- |
| Identify | to establish the characteristics of a piece of data; to recognise or select by analysis. |
| Locate | to discover the exact place or position of data or an object. |
| Distinguish | to discern, to identify characteristic differences. |
| Categorise | to order/split or arrange a group (of data in this case). |
| Cluster | a group of similar things positioned or occurring closely together. |
| Distribution | the way in which something is dispersed, diffused or spread. |
| Rank | a position within a fixed hierarchy. |
| Compare | to estimate, measure, note similarity or dissimilarity. |
| Associate | a logical or causal connection between two things. |
| Correlate | to have or bring into a relationship in which one thing affects or depends on another. |
| Determine | to ascertain, to decide by research or calculation. |
| Look up value | similar to identify according to Roth and Mattis (1990). |

Source: Reproduced from Wehrend and Lewis (1990).
Table 1.6 Four Generic visualisation operations (reproduced from Ogao and Kraak, 2002).

| Visualization operator | Visualization sub-operator | Example of definitive characteristics of results |
| --- | --- | --- |
| Identify | Spatial identification | Length, area, irregularity, minimum, maximum range, distance, pattern of distribution |
| Identify | Temporal identification | Extent: longest, shortest; sequence: first, last; category: nominal, ordinal, interval/ratio; movement: continuous, cyclical, intermittent. |
| Identify | Thematic identification | Name, symbols (legend) |
| Locate | Spatial Location | (x,y), grid locations, (rows, columns), near, within, between, above, below, neighbourhood of. |
| Locate | Temporal Location | Event, valid time t, observed interval (t1-t2), before, after, together, next. |
| Associate/compare | Spatial association/comparison | Topological relations, spatial collection, covariance, correlation |
| Associate/compare | Temporal association/comparison | Temporal relations, time between objects, orientation (before, after), adjacency (just before, just after), causality (correlation). |
*Relationship Relationships between two
Table 1.7 Adapted Taxonomy of Visualisation Operations

| Operation | Operation Description |
| --- | --- |
| Identify | to establish the characteristics of a piece of data; to recognise or select by analysis. |
| Locate | to discover the exact place or position of data or an object. |
| *Relationship | to discover the type of interaction between two variables. |
When designing tasks, it is important to prevent any overlap of content in order to minimise the possibility of a learning effect. A learning effect occurs when participants recall information from memory as they become familiar with the layout and content of an experiment, and this familiarity could alter results (Robinson and Griffin, 2010). The order of the experiment tasks was therefore derived from the Latin square technique, adapting its concept of randomisation to ensure that GWR tasks, some of which are complex, would not be repeated. A Latin square (Dénes & Keedwell, 1974) is a matrix of letters or numbers in which each symbol occurs exactly once in each row and each column. In experiment design it is used to vary task order so that participants are less likely to learn the details of an experiment.
Because this experiment contains GWR-related tasks, i.e. bivariate and multivariate tasks, a textbook Latin square technique could not be applied directly. A standard Latin square task randomisation sequence of ABC CAB BCA is ideal for univariate tasks. The technique is designed to prevent a task from recurring: in this sequence no task appears in the same position in more than one ordering. However, given the number of tasks and the fact that they were also bivariate and multivariate in nature, this was not possible. In the GWR experiment there are five parameter estimates to account for, and their respective T-values were used as an additional complexity to simulate typical GWR task analysis. Tasks could be arranged in a sequence that helps to prevent the learning effect whilst providing the necessary complexity. In some cases, more than one task contained the same attributes, but participants were asked to look for highest or lowest value relationships so that the tasks remained different. In other words, no two tasks were exactly the same.
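The standard ABC CAB BCA sequence mentioned above is a cyclic Latin square, and a minimal sketch of how such a square can be generated and checked is given here. It shows the textbook construction only, not the adapted task ordering used in the GWR experiment, and the function names are hypothetical.

```python
def latin_square(tasks: list[str]) -> list[list[str]]:
    """Build a cyclic Latin square: row i is the task list rotated right by i places."""
    n = len(tasks)
    return [[tasks[(col - row) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square: list[list[str]]) -> bool:
    """Check that every symbol occurs exactly once in each row and each column."""
    symbols = set(square[0])
    n = len(symbols)
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(square) == n and rows_ok and cols_ok


square = latin_square(["A", "B", "C"])
for group, order in enumerate(square, start=1):
    # Each row is the task order given to one participant group.
    print(group, "".join(order))
```

Each row can be read as the task order for one group of participants. Because every task occupies every position exactly once across the rows, no position in the session is tied to a single task.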
The next chapter addresses in greater depth the human-related aspects that are important to consider for the effective visualisation of data. It should be noted that the key goal of this research was not to design a visualisation, but rather to compare and test the utility of visualisations. The first experiment tests these for GWR, and the second experiment tests whether human perception is affected by scalability. It was found that participant performance partly reflected familiarity with, and ease of use of, the visualisation tools, as well as participants’ understanding of GWR and GWR outputs.