These three different emphases (causes, outcomes and design principles) point to an important observation: the term “heuristic” is used in several different, and potentially contradictory, ways. This is supported by Doubleday et al. (1997), who note a lack of “evenness” in heuristics. When considering Nielsen’s canonical set, they comment that some are simple and precise, yet others are “imprecise and difficult to check for completeness”. This thesis goes further by identifying three distinct types of heuristics among the 146 used in Chapter 4 (Testing Heuristic Evaluation for Video Games), and by defining novel terminology to distinguish their use:
• Design principles.
• Abstract reflection.
• Outcome analysis.
5.4.1 Heuristics as Design Principles
The derivation of heuristics as design principles may be useful for distinguishing between games ranked high or low by professional or consumer opinion (Desurvire and Wiberg, 2009; Pinelle et al., 2008a). As such, they may also have applicability as formative guidelines to assist designers during pre-production. An example is Heuristic 81: “Provide consistent responses to the users actions”. However, these heuristics were not validated, so they have limited applicability as evaluation tools. Furthermore, Grudin (1989) points out that abstract design principles such as “consistency” lack actionable definitions with which to guide development and to differentiate between good and bad cases. Korhonen et al. (2009) likewise reflect on heuristic specificity, with similar concerns regarding inappropriate abstraction levels.
Polson et al. (1992) argue against the use of design guidelines such as “minimize working memory load” by pointing out that no means to measure working memory is specified, nor are solutions proposed which could ameliorate the problem. Similarly, they consider heuristics such as “Use simple and natural dialog” less useful than a cognitive walkthrough analysis, which can guide evaluators in understanding why a problem has occurred and how to remedy it. They conclude that such simplified guidelines have little to contribute to complex interactions.
5.4.2 Heuristics for Abstract Reflection
Nielsen’s use of the term “heuristic” does have some applicability in formative and summative evaluation contexts. However, these types of heuristic feature the most abstract phrasing, referring to general areas for the evaluator to consider but without defining specific criteria for violation or conformance. An example is Heuristic 146: “Visibility of System Status”. These abstract reflective forms mirror the way in which they were created through principal components analysis, a dimension reduction technique which was used to reveal implicit similarities among 101 different heuristics, and to reduce them to a more abstract set of 10. This high level of abstraction means that they still suffer from ambiguous specificity and weak inter-rater reliability when used as evaluation tools.
5.4.3 Heuristics for Outcome Analysis
Other heuristics validated against user testing may be more specific, with clear criteria for violation or conformance, and hence exhibit greater inter-rater reliability. An example is Heuristic 142: “There should be variable difficulty level”. This form of phrasing is particularly suitable for analytical, outcome-based evaluation, especially in respect of the standard usability aspects of Effectiveness, Efficiency and Satisfaction. However, such heuristics still do not address how these three criteria are affected by particular design decisions, so they contribute little to design knowledge about the causes of problems.
5.4.4 Analytical Heuristics Provide More Specific Evaluator Resources
Many of the heuristics considered take the analytic form, which makes them readily available for decomposition into breakdowns and outcomes. Those heuristics which take more of a design principle form, and even more so those of the abstract/reflective type, are considerably more general and ambiguous. They map less clearly to specific, observable breakdowns and outcomes.
Consider, for example, Heuristic 78: “Players should be given context sensitive help while playing so that they are not stuck and need to rely on a manual for help”. It would be difficult to argue against this principle in its general form, that is, to claim that “players should NOT be given context sensitive help while playing so that they ARE stuck and need to rely on a manual for help”1. So the question remains: how should this heuristic be operationalised? It would be excessive to expect all contexts to provide unique help. Alternatively, it is unlikely that a game could dynamically detect when the player is genuinely stuck, and then provide context sensitive help for that specific issue. Furthermore, there are myriad reasons why a player could become stuck. This heuristic does not deal with detecting and resolving these underlying problems per se, but rather proposes a workaround, assuming that the problem may occur without considering why. In terms of analysis and discovery resources, then, this heuristic only addresses discovery in terms of the observable consequences of an actual user test. It has little to contribute to predicting when a problem could occur. As it does not describe any underlying causes which would trigger the observable outcomes, it can only be used to discover the outcomes of a problem that has already occurred. Thus, this heuristic also provides little in the way of analysis resources. As a result, it becomes difficult to explicitly define the analytical Breakdown criteria for the heuristic’s violation. However, it is clear what the Outcome criteria will be, particularly for Efficiency and Satisfaction, and also what other kinds of observable events can be used to indicate violation (e.g., the player tries to find help in the game manual).
1 Clearly challenge is an important part of first-person shooter games, but being “stuck” is more likely to be experienced by the player as excessive or inappropriate challenge, with negative outcomes for Efficiency and Satisfaction.

In several cases, even heuristics which take the more empirically measurable analytic form still exhibited low reliability in the heuristic evaluation. For example, Heuristic 36: “Game provides feedback and reacts in a consistent, immediate, challenging and exciting way to the players’ actions”. This is a good example of a heuristic with many disparate component criteria. This introduces the potential for disagreements amongst evaluators when deciding whether the heuristic as a whole has been violated based on a subset of criteria violations. For example, one evaluator may feel that the heuristic has been violated due to the game being too easy (a violation of the criterion “Game ... reacts in a ... challenging ... way to the players’ actions”). Another evaluator may feel that the level of challenge was appropriate, but that the heuristic is violated due to some inconsistency in the game, entirely separate to the issue of challenge. In these cases the heuristic may be rated as often being violated, but with a rather low rating. Nielsen’s heuristics were a good example of these kinds of ratings, high frequency but low specificity.
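The disagreement described above can be illustrated with a minimal sketch. It is not the procedure used in this thesis: the criterion names are simply lifted from the wording of Heuristic 36, the two evaluators' verdicts are invented for illustration, and the rule that a composite heuristic is violated when any one criterion is violated is an assumption.

```python
# Hypothetical criterion names, taken from the wording of Heuristic 36.
CRITERIA = ("consistent", "immediate", "challenging", "exciting")


def composite_violated(ratings):
    """Assumed aggregation rule: the composite heuristic counts as
    violated if the evaluator marks any single criterion as violated."""
    return any(ratings[c] for c in CRITERIA)


def criterion_agreement(a, b):
    """Fraction of criteria on which two evaluators give the same verdict."""
    same = sum(1 for c in CRITERIA if a[c] == b[c])
    return same / len(CRITERIA)


# Invented verdicts (True = criterion violated). Evaluator A finds the game
# too easy; evaluator B finds an inconsistency unrelated to challenge.
evaluator_a = {"consistent": False, "immediate": False,
               "challenging": True, "exciting": False}
evaluator_b = {"consistent": True, "immediate": False,
               "challenging": False, "exciting": False}

# Both evaluators report the composite heuristic as violated...
both_flag = composite_violated(evaluator_a) and composite_violated(evaluator_b)

# ...yet they agree on only some criteria, and share no violated criterion.
agreement = criterion_agreement(evaluator_a, evaluator_b)
shared = [c for c in CRITERIA if evaluator_a[c] and evaluator_b[c]]

print(both_flag, agreement, shared)
```

Under these assumptions the composite verdict looks like agreement, while the criterion-level verdicts show that the two evaluators are reporting different problems.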
Restructuring Design and Evaluation Knowledge
Rather than derive new heuristics, this current chapter instead focuses on operationalising the existing ones by requiring evaluators to rate the criteria which constitute the heuristics, rather than the composite heuristic itself. In particular, Outcomes and Breakdowns are separated to assist the evaluator in understanding what has occurred, how the problem may have been caused, and the resulting effect it produced. This approach has a similar aim to that proposed by Matera et al. (2002), but the particular techniques involved are more transparent. In Matera’s approach, “Abstract Tasks” were informally derived from experts’ opinion about how to conduct an evaluation. They consist of concrete steps that a novice evaluator can conduct, and a complete set of Abstract Tasks defines a systematic procedure for evaluating a complete system. The framework presented in this current thesis makes use of existing heuristics from the literature as the source of points for the evaluator to consider, similar to Matera’s way of using expert opinion as the source from which Abstract Tasks are derived. From these existing heuristics are derived explicit, concrete means to measure conformance or violation