Multimodal Documents and their Components
2.3 The page as object of perception
In this section, we move away from the informed understanding of what a document is trying to communicate, which is common in interpretative approaches to multimodal documents, and try to intervene in this process at an earlier stage. In order to be an artefact that can provide information at all, a page must first be perceived by its readers: that is, readers must interact with the page as an object of perception in their visual field. Whereas the approaches of the previous section took the perceived page for granted, here we are concerned with just how that process works: how do we get to the state of ‘having seen’ a page?
This question makes the entire field of visual perception relevant to issues of design and page interpretation. From this perspective, we focus on precisely how readers obtain ‘information’ from the page through their perceptual systems. Many cognitive models have been proposed which are intended to reproduce the kind of processing, and processing decisions, that are observable in humans when decomposing a page into its component parts. We can see this as analogous to auditory phonetics, in which subjects can be asked whether they perceive two sounds as distinct or not, or whether they hear two differently pronounced words as different or not. For the page, we can consider whether particular elements are seen to be similar, whether they are grouped together, whether they are perceived as salient (e.g., seen ‘first’) or not, and so on. The principles of perception are accordingly well known in professional design. If elements are to be made prominent, then this can only be done within the bounds of the human visual system; similarly, if elements are intended to be grouped together, then their human readers have to be able to perceive this. Visual perception and design go hand in hand.
A further rich source of data about the perception of documents is inherited from psycholinguistic studies of reading comprehension. This has led to detailed and, by now, established design recommendations involving preferred line lengths, font sizes, contrast between font and background, and much more. Comprehension studies have also made it clear that it is beneficial for design to strive for compatibility of both content and organisation between the information in memory and the information presented in a document (cf. Wright 1980, p186). Processing differences can also be observed depending on the delivery channel of the document: for example, whether a document is presented on paper or on a computer screen makes a substantial difference for effective design (e.g. Dyson 2004). All of these contributing factors need to be considered when predicting how readers will interact with multimodal documents. However, in this section we focus on just one particular aspect of cognitive processing, visual perception, since this is central to our goal of decomposing documents into their parts.
Gestalt perception
A good example of the relevance of perception for design is Schriver’s (1997, p314) analysis of a layout spread (i.e., two pages functioning as a single visual unit) for a multilingual instruction text for a stereo. The examples relevant to the discussion are shown in schematic form in Figure 2.6, where we can see two versions of a possible layout. In the version on the right, a perceptual principle known as good continuation has been violated in order to show how significantly this affects the way the page as a whole is perceived. Good continuation in a page layout causes the visual system to perceive spatially distinct elements as being connected, or even as being the same element partially occluded by something else that is in the way.
We see this in Figure 2.6 as follows. The example page as a whole consists of one large graphic of the stereo running across the centre of the page; this graphic does not include any language-specific information and simply illustrates the stereo and the locations of actions to be performed on it. The four vertical blocks on the page then provide the detailed textual instructions related to the stereo in each language covered. The designer intended each column to be grouped as a single continuous sequence of instructions running from the top of the page to the bottom. The instructions within each column then refer to the stereo shown in the central graphic by means of numbers labelling the components shown. This has become a common genre for multilingual instructions, since the design and layout can remain ‘fixed’ and different language versions can be slotted in as required.
Figure 2.6 Good continuation (left) and bad continuation (right); an example adapted from Schriver (1997)
This intended grouping is favoured far more by the layout on the left, in which the alignment of the text blocks’ left edges provides ‘good continuation’, than by the layout on the right, where there is no alignment. The lack of continuation means that there is a much reduced perceptual tendency to group the components together. In the latter case, a reader would have to look for further clues that the lower-half rectangles are connected with their corresponding upper-half rectangles, perhaps by virtue of continued enumerated lists or by the choice of language. This reiterates an important point that we have now seen at several places in the discussion: as with both the Canada history book page that we saw above in Figure 2.5 and the gas bill of the introduction, although a reader may work out that the lower halves continue their upper halves, there is little in the layout itself supporting this perceptually.
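The contrast between the two layouts can be made concrete with a deliberately crude computational proxy. The following Python sketch is our own illustration, not Schriver’s analysis and not a model of the visual system: it assumes hypothetical coordinates for the eight text blocks of Figure 2.6 (four languages, each split into an upper and a lower half) and groups blocks into candidate columns only when their left edges align within an assumed tolerance. The names `Block` and `group_by_left_edge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    label: str
    left: float  # x coordinate of the block's left edge
    top: float   # y coordinate of the block's top edge (y grows downwards)


def group_by_left_edge(blocks, tolerance=2.0):
    """Group blocks into candidate columns by left-edge alignment.

    A block joins the first existing group whose anchor (the left edge of
    its first member) lies within `tolerance`; otherwise it starts a new group.
    """
    groups = []  # list of (anchor_x, members)
    for block in sorted(blocks, key=lambda b: (b.left, b.top)):
        for anchor_x, members in groups:
            if abs(block.left - anchor_x) <= tolerance:
                members.append(block)
                break
        else:
            groups.append((block.left, [block]))
    # Within each candidate column, order the blocks from top to bottom.
    return [sorted(members, key=lambda b: b.top) for _, members in groups]


# Hypothetical coordinates: four language columns, each split by the central graphic.
langs = ["EN", "DE", "FR", "ES"]
aligned = [Block(f"{lang}-{part}", 50 * i, top)
           for i, lang in enumerate(langs)
           for part, top in (("upper", 0), ("lower", 200))]
# Right-hand variant: every lower half is shifted 20 units to the right.
shifted = [Block(b.label, b.left + 20 if b.top == 200 else b.left, b.top)
           for b in aligned]

print(len(group_by_left_edge(aligned)))  # 4 candidate columns
print(len(group_by_left_edge(shifted)))  # 8 separate blocks
```

With aligned left edges the sketch recovers four two-part columns; with the lower halves shifted sideways it finds eight unconnected blocks, loosely mirroring the missing perceptual support for continuation in the right-hand layout. A realistic layout analysis would have to weigh alignment against many other cues, such as enumeration and language.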
Principles of perception therefore play a significant role for layout and layout decomposition. When some larger-scale element, such as a text column, is temporarily interrupted by some other element, such as an embedded headline or a picture, appropriate continuation can nevertheless communicate directly that there is a single document ‘part’ at issue. Readers generally take this in their stride and are more likely to attribute the interruption to information layering in space: that is, one element can stand ‘in front of’ another. This property cannot be predicted without knowledge of the human perceptual system: it is not simply a deduction about how a reader might ‘work out’ how the page is to be seen.6
The area of perceptual psychology that has contributed most to our understanding of these aspects of visual perception and their relevance for design is Gestalt Psychology (Koffka 1935, Köhler 1947). The psychologists of the Gestalt school developed several ‘laws’ of pattern perception which appear to be implemented by the human visual system. Demonstrations that the laws hold are found in innumerable, completely reliable examples of visual perception or misperception, and the laws underlie most well-known ‘visual illusions’. The seven essential Gestalt laws are shown in Table 2.1.
In most cases, these laws combine during perception to strongly direct interpretation in one direction rather than another. The case of good continuation above with the stereo instructions brings together at least continuity, similarity, and closure; the Canada book page in turn combines at least proximity and closure. More information and many visual examples demonstrating the effectiveness of these laws are given by, for example, Ware (2000, pp203–213).7
6 In this respect, claims such as those of Ittelson (1996) that processing markings on surfaces is completely different from processing real-world scenes are probably overstated: there are clearly differences, but also overlaps.
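Some of these laws lend themselves to simple operationalisations in layout analysis. As one hedged example, the following Python sketch treats the law of proximity as single-linkage grouping: any two elements whose centres lie within an assumed distance threshold belong to the same group, and grouping is transitive. The function name `proximity_groups`, the threshold and the coordinates are illustrative assumptions; human grouping by proximity is relative to the surrounding spacing rather than tied to one fixed distance.

```python
import math


def proximity_groups(points, threshold):
    """Single-linkage grouping: points within `threshold` of each other
    (directly or through a chain of neighbours) end up in the same group."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical element centres: a row of three close items and a separate pair.
print(proximity_groups([(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)], threshold=1.5))
# [[0, 1, 2], [3, 4]]
```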
Pre-attentive perception
There are also additional features of perception that can play a significant role during layout perception. Certain objects are well known to attract attention, either due to their intrinsic form (such as human faces) or due to perceptual properties (such as a distinctive bright colour). All of these can be used in layout and document design in order to direct perception and decomposition. For understanding this aspect of page perception more exactly, it is useful to know about attributes of the visual field which are pre-attentive: i.e., that are distinguished at very early stages of perception, before higher-level processing has started identifying and classifying what is being seen. Pre-attentive distinctions are available as direct perceptions and so naturally provide the strongest layout cues of all. If, for example, something is bigger than something else, or of a different colour, or is similar to a set of other objects with one small addition, then this difference is part of how the visual field is seen and does not need to be ‘interpreted’ by the viewer.
Something of the force of these pre-attentive discriminations can be seen in the examples shown in Figure 2.7. Here, in all but the last two boxes (labelled ‘juncture’ and ‘parallelism’), we see examples of pre-attentive processing at work. Look, for example, at the upper left-hand box, ‘orientation’. What we directly see when looking at this box is a collection of vertical lines and one line that is not vertical. We do not have to examine the individual lines and ask ourselves which ones are vertical and which ones are not: this information is already part of how the image is perceived. That is, the information that there is one line whose orientation differs from the others is directly and unavoidably present in our perception. The remaining boxes illustrate this phenomenon for other aspects of visual processing that are also available pre-attentively. Only when we come to the last two boxes do we actually have to scan the image carefully in order to discover that, in the first case, there are two lines which do not meet and, in the second case, there are two lines that are parallel.
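The difference between the ‘orientation’ box and the last two boxes can be loosely mirrored in code. In the hypothetical Python sketch below, each line is reduced to a single feature value (its orientation in degrees), and an odd item out is found by comparing every value against the median in one pass; no reasoning about pairs of lines is needed. Detecting a junction or a parallel pair, by contrast, would require examining pairs of lines. This is only an analogy under our own assumptions (the function `feature_singletons` and its threshold are invented for illustration), not a model of pre-attentive processing.

```python
from statistics import median


def feature_singletons(values, min_difference):
    """Return the indices of items whose single feature value differs from
    the median of all values by at least `min_difference`.

    Simplification: orientations are treated as plain numbers, so angular
    wrap-around (e.g. 0 vs 180 degrees) is ignored.
    """
    centre = median(values)
    return [i for i, v in enumerate(values) if abs(v - centre) >= min_difference]


# Hypothetical 'orientation' display: one tilted line among vertical ones.
orientations = [90, 90, 90, 45, 90, 90]
print(feature_singletons(orientations, min_difference=20))  # [3]
```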
We also have pre-attentive access to, for example, comparative line widths and lengths, co-linearity (alignment), colour and spatial position itself, as well as to several motion-dependent features of the visual field which are not so relevant for us at present, such as direction and flicker (cf. Ware 2000, pp165/6). An important consequence of the existence of such features is that designs need to be sensitive to the fact that certain features of a layout will be ‘accessed’ by readers regardless of intention: whenever a layout employs pre-attentively available visual features, readers will register these (not necessarily consciously) and build them into their interpretations of relations on the page. Visual presentations that make use of pre-attentive features will directly indicate particular decompositions of layout parts rather than others. Effective design must ensure that whatever interpretations readers are led to are ones that support the intended interpretations of the page rather than undermining them.
7 A nice online collection of examples of ‘named’ visual illusions, complete with original references, has been compiled by John Andraos (2003) and is available at: http://www.careerchem.com/NAMED/Optical-Illusions.pdf (last accessed January 1, 2008).
Proximity: Objects which are closer together are generally assumed to belong together.
Similarity: Entities in the visual field which are perceptually similar are assumed to be grouped together.
Continuity: Entities in the visual field will be built out of perceived parts assuming continuity and smoothness rather than sudden changes of direction. This is one way in which ‘connectedness’ is constructed; connectedness itself is, however, an extremely strong perceptual principle for assigning grouping.
Symmetry: Shapes which are symmetrical about an axis provide a much stronger impression of a single, contained object than shapes whose boundaries are not symmetrical.
Closure: Whenever a recognisable contour in the visual field is closed, i.e., forms itself into a continuous loop, this is generally seen as dividing an ‘inside’ from an ‘outside’, and the inside is perceived as an object. Without further information, it is then difficult to see the ‘outside’ as an object, e.g., as an object with a hole. Moreover, the perceptual system will generally assume continuity and closure: thus, if there is a partially obscured continuous shape, then the perceptual system will assume that the shape is closed and that the unseen part of the shape is simply ‘behind’ some obscuring object, not that there is a gap in the shape.
Relative Size: When sharing a common area of the visual field, smaller areas are generally seen as objects placed in front of larger areas, which are seen as the background.
Figure-Ground: Somewhat similar to the last ‘law’, but rather more general in application, perception generally divides what is perceived into elements (figures) which are picked out from some background (ground). Several visual illusions work with this principle: i.e., under unusual circumstances, just what is taken as figure and what is taken as ground can be reversed, as in the well-known ‘Rubin’s Vase’ illusion (Rubin 1921), where either a vase is seen against a background or two silhouettes face one another. In one case, the figure is a vase and in the other the figure is two faces.
Table 2.1 The primary Gestalt laws of perception
Figure 2.7 Examples of pre-attentive perception at work. Taken from Ware (2000, p166; Figure 5.5); used by permission of Elsevier
Figure 2.8 Eye-tracking results from the Holsanova et al. (2006) study; used by permission of the author. The tracks show the points of fixation during the first minute of exposure to the spread. We also see the paths between fixation points and, indicated by the relative size of the circles, the duration of each fixation. The large filled circle at the lower right of each image shows, for reference, the size of a one-second fixation.
Reading paths
Visual perception is also by no means a passive affair of simply being subjected to patterns revealed by pre-attentive processing. It is now well established that vision is a process of active construction on the part of the reader. The eyes gather the information necessary for constructing a visual interpretation by rapidly moving between points of fixation; moreover, those points are themselves selected by the perceptual system so as to gather information effectively for constructing hypotheses. That is, if it is still unresolved what the visual scene is showing, the eyes are directed towards points that pre-attentive processing predicts will contribute maximally to disambiguating between interpretations.
Nowadays we can readily observe this process of selecting points of fixation by employing eye-tracking technology. Devices of this kind produce a record of precisely where readers’ eyes are directed while they process a document, noting points of fixation in sequence so that these can be analysed. This kind of research is gaining in significance because it can be used to determine actual reading paths, i.e., the reading paths that are genuinely followed by users of a document rather than those a designer may have predicted or that users might themselves report. It also shows relations between elements, i.e., the paths that readers follow around a page from element to element. This provides a useful method for exploring both traditional views of prominence in visual images and more recent semiotic predictions of salience, such as those of Kress and van Leeuwen described above.
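Eye-tracking devices typically deliver raw gaze samples, from which fixations must first be identified before reading paths can be analysed. One widely used family of techniques identifies fixations by a dispersion threshold (often called I-DT): a run of samples counts as a fixation if it lasts long enough while staying spatially tight. The Python sketch below is a simplified version under stated assumptions, not the procedure of any particular study or device: samples arrive at a fixed rate, so a minimum duration is expressed as a minimum number of samples; dispersion is measured as the x-range plus the y-range; and the thresholds and sample data are placeholders. The function names are hypothetical.

```python
def dispersion(window):
    """Spatial spread of a window of (t, x, y) samples: x-range plus y-range."""
    xs = [x for _, x, _ in window]
    ys = [y for _, _, y in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(samples, max_dispersion, min_samples):
    """Identify fixations in time-ordered (t, x, y) gaze samples.

    Returns (start_t, end_t, centroid_x, centroid_y) tuples. Assumes a fixed
    sampling rate, so `min_samples` stands in for a minimum fixation duration.
    """
    fixations = []
    i, n = 0, len(samples)
    while i + min_samples <= n:
        end = i + min_samples
        if dispersion(samples[i:end]) <= max_dispersion:
            # Grow the window while the samples stay tightly clustered.
            while end < n and dispersion(samples[i:end + 1]) <= max_dispersion:
                end += 1
            window = samples[i:end]
            cx = sum(x for _, x, _ in window) / len(window)
            cy = sum(y for _, _, y in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = end  # continue after the fixation
        else:
            i += 1  # treat the first sample as part of a saccade and slide on
    return fixations


# Hypothetical samples every 10 ms: a fixation, one saccade sample, a second fixation.
samples = [(0, 100, 100), (10, 101, 100), (20, 100, 101), (30, 101, 101), (40, 100, 100),
           (50, 250, 100),
           (60, 400, 100), (70, 401, 100), (80, 400, 101), (90, 401, 101), (100, 400, 100)]
fx = idt_fixations(samples, max_dispersion=5, min_samples=4)
# Durations measured from first to last sample timestamp of each fixation.
print([end - start for start, end, _, _ in fx])  # [40, 40]
```

Fixation durations computed this way are what plots such as Figure 2.8 encode as circle sizes; the sequence of centroids gives the path between fixation points.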
A good example of what occurs during such an experiment is shown in Figure 2.8, taken from a study reported by Holsanova et al. (2006). This study is additionally relevant for us here because it also has the goal of empirically evaluating some of the proposals made in semiotic and linguistically-informed analyses of documents. In particular, the study examined whether evidence could be found for Kress and van Leeuwen’s claims concerning reading order, prominence and framing for newspaper layouts. To do this, the principles and possibilities set out by Kress and van Leeuwen were translated into a collection of assumptions concerning reading behaviour. The actual reading behaviour of experimental participants was then investigated using eye-tracking methods.
The results obtained were mixed. For example, the classification of the right-hand side of the page as the location of new, salient information (cf. Figure 2.3) was translated into a reading-path assumption that this location in the spread would be attended to early in the reading process: this was not the case. However, the assumption that readers would look for information higher on the page before examining lower-positioned information was confirmed. Headlines and pictures attracted attention as expected, and a certain reliance on framing could also be seen. There was no evidence that readers scanned the entire page or visual field before making decisions concerning what to read.
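Assumptions of this kind can be operationalised once fixations have been assigned to areas of interest (AOIs) on the page. The following Python sketch is a hypothetical illustration rather than the procedure used by Holsanova et al.: it takes fixations as (start, end, x, y) tuples, assumes rectangular AOIs in page coordinates with y increasing downwards, and reports the order in which the AOIs are first entered, which is enough to check an assumption such as ‘upper before lower’. The function name, AOI names and coordinates are all invented for illustration.

```python
def first_entry_order(fixations, aois):
    """Return AOI names ordered by the start time of their first fixation.

    fixations: (start_t, end_t, x, y) tuples in temporal order.
    aois: dict mapping name -> (x0, y0, x1, y1) rectangle; y grows downwards.
    AOIs that are never fixated are omitted.
    """
    first_entry = {}
    for start_t, _end_t, x, y in fixations:
        for name, (x0, y0, x1, y1) in aois.items():
            if name not in first_entry and x0 <= x < x1 and y0 <= y < y1:
                first_entry[name] = start_t
    return sorted(first_entry, key=first_entry.get)


# Hypothetical page of 1000 x 1400 units split into upper and lower halves.
aois = {"upper": (0, 0, 1000, 700), "lower": (0, 700, 1000, 1400)}
fixations = [(0, 200, 300, 150), (250, 400, 320, 500),
             (450, 700, 600, 900), (750, 900, 200, 300)]
order = first_entry_order(fixations, aois)
print(order)                                        # ['upper', 'lower']
print(order.index("upper") < order.index("lower"))  # True
```

A fuller analysis would also compare dwell times and revisits across AOIs, since first entry alone says nothing about how long a region holds attention.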
Reading paths are also now known to be susceptible both to the knowledge that a reader has concerning how a visual image/page is to be interpreted and to the purpose the reader has in using the image/page. Another of Holsanova et al.’s results was that advertisements (pre-attentively available due to colour and texture differences) were often not on the reading path despite their theoretical ‘salience’: readers have evidently already learnt to ignore them, which is certainly not what advertisers will want to hear. The relationship between reading behaviour, visual prominence and semiotic value is clearly complex, and there is a pressing need for further investigations of this kind. When we add to the layout mixture graphics with clear ‘eye-flow’ qualities of their own, salient size, colour and font decisions, framing of distinct functional elements (such as navigation bars and content areas; cf. Naumann, Waniek and Krems 2001), connecting lines and so on, it becomes very difficult to predict precisely how readers will interact with the information on offer.
More accurate descriptions of how readers actually scan pages can contribute significantly to our accounts of interaction with multimodal pages.
Discussions of reading paths in semiotic and linguistic approaches to multimodality are still too often unclear concerning the methods by which paths are ascertained (e.g., O’Halloran 1999a, Guo 2004). Since it is perfectly possible even for designers to guess wrong (Schriver 1997, p317), intuitive analysis is certainly inadequate. When subjected to validation by eye-tracking, many pieces of traditional wisdom are currently being refined or relativised to particular types of documents, types of design and circumstances of use. For example, readers from cultures with a left-to-right writing system are generally considered to use what designers call the Z-pattern, moving right from the optical centre, back to the left edge and then again right towards the lower right-hand corner. This is the pattern expected when there are no sources of distraction or attraction on the page. More recent studies of web pages suggest, however, that web-page readers now adopt a quite different pattern; Nielsen (2006) in particular presents results where the areas readers actually see on web pages are markedly different from those that simple evaluations of prominence or newsworthiness might