
Multimodal Documents and their Components

2.2 The page as an object of interpretation

2.2.1 Interpretation within document design

We will make two points of contact with document design in this chapter: the first here, focusing on issues of document interpretation for the purposes of improving design, and the second below when we come to consider document production proper in Section 2.5.1. In a sense this reflects the need within design to consider both interpretation and production as interlocking aspects of a single task. Design is best seen as a cycle of design proposals producing draft documents which are then subjected to criticism leading back to improved proposals. Thus we necessarily encounter both interpretation and production, and writers describing the design process often move back and forth across the two perspectives, sometimes barely distinguishing them.

We believe that some of the techniques that we develop here can usefully feed into this cyclic process of design–critique–revision. To date, however, very little use has been made in design of the rather explicit kinds of analytic procedures followed in the linguistic or automatic analysis approaches that we will see below. This is generally a purely practical issue: developing and employing the reflexive meta-awareness that comes from theory and explicit analysis is rarely compatible with the day-to-day demands of meeting design deadlines. Where we do see clear overlaps in concern is in those areas of design concerned with research and teaching. Here a deeper theoretical apparatus can clearly bring benefits, as long as it is also always embedded within and founded on practice. All of the approaches that we describe in this section are therefore positioned somewhere between practical description and theoretical consideration.

The Reading School

One body of work relevant here that was, from the perspective we take in this book, considerably ahead of its time is that of the Department of Typography and Communication at the University of Reading. Research in this tradition continues to this day to explore issues very much related to those we pursue here. Some basic assumptions underlying all of this work are set out by Twyman (1982), where the need is emphasised for a far more detailed theoretical and empirically motivated grasp of the visual presentation of ‘language’, taking in all the graphic and typographical possibilities offered by current, previous and future technology:

“we should engage in a serious study of verbal graphic language in much the same way as linguistic scientists have studied spoken language. We need to know much more about the language that we use and the circumstances of use. . . . It seems to me self-evident that only when we know what the characteristics of verbal graphic language are can we begin to design effectively for it.”

(Twyman 1982, pp16–17)

This has led to a series of broad surveys of documents and of the typographic and graphical resources deployed in them, with the aim of discovering how the possibilities of visual graphic language vary across time and context of use. As a research task, this comes very close to the task that we take up again in detail in Chapter 5, when we turn to multimodal genre. For the purposes of the present chapter, we draw particularly on the analytic frameworks that were developed in these surveys for decomposing documents into their parts. Two orientations were particularly significant for this work. One is the weight given to the production of documents by laypersons, i.e., design by the non-expert or non-professional. The other is the role of changing technology bases for document production.

Non-professional design is considered significant because it is here that we come closest to ‘natural’ or ‘spontaneous’ verbal graphic language production. Just as the linguist might want to study spontaneous, unedited speech to get a sense of how the language ‘really’ is, i.e., before it is channelled into particular preregulated (and thus perhaps artificial) shapes and forms, the researcher into verbal graphic language can try to ascertain just what the implicit knowledge of graphical/visual organisation in a society is.

Walker (e.g., Walker 1982 and Walker 2001) accordingly includes surveys of letters, both handwritten and typed, as examples of lay design that can reveal the norms and conventions of visual language.

This orientation is also linked to technology because it is only quite recently that the full range of multimodal document design possibilities has become available for the layperson. As a consequence we now see a much broader range of documents being produced by those untrained in document design. Discovering the decisions made in such documents presents an exciting and very new area of research that will need to draw on a range of research methodologies. For example, psychological experimentation into the kinds of spontaneous decisions made by untrained ‘information-givers’ when presenting information multimodally is only just beginning (e.g., van Hooijdonk, Krahmer, Maes, Theune and Bosma 2006) but already promises to take our understanding of ‘spontaneous multimodal utterances’, and consequently of multimodal meaning-making, significantly further.

The influence of technology was also a prominent issue in several of the surveys carried out by the Reading group, for other reasons. One was a parallel interest in the historical development of verbal graphic language as document production has changed over time; the other was a set of specifically commissioned analyses of the potential (or otherwise) of emerging online information presentation systems for information dissemination. The systems analysed included services delivering videotext for display on regular TV screens (e.g., Ceefax, Prestel and similar offerings), which were, at that time, severely limited with respect to their typographic and multimodal possibilities. The research question was then one of considering just how information could be presented with such a restricted graphical mode. This was termed the ‘graphic translatability’ of text (cf. Twyman 1982, p21 and Norrish 1987): that is, the extent to which graphically sophisticated versions of some document could be presented on systems of reduced capabilities in such a way that their information content and organisation would nevertheless be recoverable by a reader.

Although the battle lines in this area of investigation have moved almost beyond recognition over the past twenty years, in that the overwhelming majority of the technical restrictions of those early online text presentation systems have disappeared,² the analytic tasks that had to be carried out are just as important and relevant today. In order to characterise just what may or may not need to be preserved when changing representation form, it is advisable to take stock of what is being used in the first place. With this in mind, both Twyman (1982) and Walker (1982) present components of a framework for documenting the verbal graphic language employed in documents in a way that is independent of whether the documents analysed were produced as traditional printed documents, handwritten, typewritten, videotext and so on. This common form of description could then support generalisations concerning both the typographical features relied upon by distinct classes of document and the constraints exercised by varying technological processes for document production.

² This does not mean that the systems themselves have disappeared: systems, like other products, exhibit considerable inertia once developed (cf. Norman 1988).

Twyman’s contribution to this framework focused mainly on narrow typographical features (i.e., properties of characters including typeface, spacing, colour and so on) and on a classification of document types according to their combination of text, images and graphics; we will see more of the discussion of typographical features in Chapter 3 and of the discussion of document types in Chapter 5 on genre. Walker’s contribution takes this further, drawing on Crystal and Davy’s (1969) characterisation of linguistic and paralinguistic levels so as to construct a ‘checklist’ of document properties to be used for comparisons. Visual graphic language is described here (Walker 1982, p104) according to a document’s physical make-up (e.g., typewriter, print, the kind of paper used and so on: i.e., in our terms, the production and canvas constraints), its spatial articulation (e.g., the spacing between words, lines, indentation, margins and the like) and its graphic articulation (e.g., modifications expressed by underlining, punctuation, initial capitalisation and so on).
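Walker’s three dimensions lend themselves to a simple record structure. The following Python sketch is purely illustrative and not part of Walker’s (1982) proposal: the class `ChecklistEntry`, the function `compare` and all sample feature strings are hypothetical. It shows how two documents described along physical make-up, spatial articulation and graphic articulation might be compared dimension by dimension, separating the features they share from those found in only one of them.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of a checklist entry along the three dimensions
# attributed to Walker (1982, p104); feature strings are free-text examples.


@dataclass
class ChecklistEntry:
    physical_makeup: set[str] = field(default_factory=set)       # production method, paper, ...
    spatial_articulation: set[str] = field(default_factory=set)  # word/line spacing, indentation, margins
    graphic_articulation: set[str] = field(default_factory=set)  # underlining, punctuation, capitals


DIMENSIONS = ("physical_makeup", "spatial_articulation", "graphic_articulation")


def compare(a: ChecklistEntry, b: ChecklistEntry) -> dict[str, tuple[set[str], set[str]]]:
    """For each dimension, return (features shared by both, features in only one)."""
    result = {}
    for dim in DIMENSIONS:
        fa, fb = getattr(a, dim), getattr(b, dim)
        result[dim] = (fa & fb, fa ^ fb)  # intersection, symmetric difference
    return result


# Two invented letters, one typed and one handwritten (illustrative data only).
typed_letter = ChecklistEntry(
    physical_makeup={"typewriter", "A4 bond paper"},
    spatial_articulation={"indented paragraphs", "wide left margin"},
    graphic_articulation={"underlining for emphasis"},
)
handwritten_letter = ChecklistEntry(
    physical_makeup={"handwritten", "A4 bond paper"},
    spatial_articulation={"indented paragraphs"},
    graphic_articulation={"underlining for emphasis", "initial capitals"},
)

diff = compare(typed_letter, handwritten_letter)
for dim in DIMENSIONS:
    shared, differing = diff[dim]
    print(dim, "shared:", sorted(shared), "differing:", sorted(differing))
```

Plain sets of free-text features keep the sketch neutral about what a feature inventory should contain; comparisons across surveys would only become meaningful with an agreed, controlled vocabulary of features.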

A further refinement of the approach is pursued by Norrish (1987) in order to provide a more fine-grained analysis of the larger-scale structure of collections of documents. For this, Norrish sets out a hierarchical view of documents in which structure is captured in terms of nested sequences of typographical categories. These categories start with an entire document, such as a journal or a book, and progressively break the document down into subcomponents such as paragraphs, lists, notes and so on. Each of these components receives a particular description concerning the typographic features employed. This description includes the kinds of checklist features proposed by Walker and, in addition, places their occurrence within a structural representation of the document as a whole. Problematic with the approach, however, is its combination of several kinds of information which, logically, do not belong together: in particular, the physical distribution of information across pages and the functional organisation of that information. Conflating these is less than ideal because generalisations are easily obscured. The approach also fails to describe the precise spatial articulation of pages in any generic way. The model as a whole therefore remains focused on textual presentation, with relatively limited coverage of the spatial possibilities of pages.
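A nested hierarchy of this kind can be sketched as a simple tree. The Python below is an assumption-laden illustration, not Norrish’s (1987) own notation: the `Node` class, the functions `walk` and `features_by_category`, and all category names and feature values are hypothetical. Each node carries a typographic description of its component, and `features_by_category` gathers the descriptions of every component of the same category, the kind of generalisation a purely functional hierarchy makes straightforward.

```python
from dataclasses import dataclass, field

# Hypothetical tree encoding of nested typographical categories;
# category names and features are invented for illustration.


@dataclass
class Node:
    category: str                                            # e.g. "book", "chapter", "paragraph"
    features: dict[str, str] = field(default_factory=dict)   # typographic description of this component
    children: list["Node"] = field(default_factory=list)     # nested subcomponents, in document order


def walk(node: Node, depth: int = 0):
    """Yield (depth, node) pairs in document order (pre-order traversal)."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def features_by_category(root: Node) -> dict[str, list[dict[str, str]]]:
    """Group the typographic descriptions of all components by category."""
    grouped: dict[str, list[dict[str, str]]] = {}
    for _, node in walk(root):
        grouped.setdefault(node.category, []).append(node.features)
    return grouped


book = Node("book", children=[
    Node("chapter", {"typeface": "serif"}, children=[
        Node("heading", {"weight": "bold", "size": "14pt"}),
        Node("paragraph", {"indent": "first-line", "size": "10pt"}),
        Node("list", {"marker": "bullet"}, children=[
            Node("paragraph", {"indent": "hanging", "size": "10pt"}),
        ]),
        Node("note", {"size": "8pt"}),
    ]),
])

for depth, node in walk(book):
    print("  " * depth + node.category, node.features)
```

Because this tree records only functional categories, the two paragraphs can be compared directly even though they sit at different depths. If page boundaries were added as nodes in the same hierarchy, a single paragraph running across two pages would turn into two nodes under different parents, which is one way in which the conflation criticised above can obscure generalisations.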

Despite these drawbacks, the approaches developed by Twyman, Walker and Norrish nevertheless represent a landmark in multimodal document analysis. Much of the development that we pursue in this book can be seen as following the same path that they define, and for many of the same reasons. Their commitment to an explicit analytic method for setting out the particular structures and realisational forms employed in multimodal documents is still rarely found in other areas of design.

Schriver

The most detailed introduction to and overview of the field of document design as a whole is still probably that given by Schriver (1997), and we will return to positions that she sets out in many places in this book. For our immediate aims in this section, however, we pull out of Schriver’s description those aspects particularly concerned with how designers see the process of document interpretation, although, as we shall see below, Schriver’s account also leads directly into issues of production.

Schriver follows a line of argument in document and information design that sets out as a basic premise that designers need a good understanding of how readers interpret the documents that those designers have produced.

The main goal is to produce user-oriented documents which support the activities and requirements of the documents’ intended consumers. This builds on modern research into reading and comprehension, extending it to the case of the multimodal written document. From reading comprehension, for example, it is well known that prior knowledge (cultural, specific, genre-related, etc.) plays crucial roles in forming interpretations of received material (cf. Spiro 1980, Wright 1980, Wilson and Anderson 1986, Winn 1989, 1991, Hegarty, Carpenter and Just 1991, Ollerenshaw, Aidman and Kidd 1997, and many others). It is therefore problematic to produce documents as if their readers were passive receivers of some straightforwardly coded message.

This variability in potential interpretation on the part of readers presents serious problems for appropriate design: it is not straightforward for designers to predict just how their designs will be received. In general, a document is improved when its design employs information structuring that corresponds to, or supports, the knowledge its users can reasonably be assumed to have, and when the reader’s short-term memory for interpreting language and layout in combination, for example in tables, is not stretched too far (cf. Wright 1970, Wright and Barnard 1978 and Schriver 1997, pp274–275). All of these issues can be used to motivate better design decisions and so need to be brought into the design process: waiting until the document has been produced before testing for usability is often too late (cf. MacDonald-Ross and Waller 2000).

Explicit awareness of the fine-grained functional consequences of design decisions therefore plays a central role in current professional design. To support this, Schriver sets out several ways in which particular selections of design features (layout, spacing, fonts and so on) can be used to increase the rhetorical and communicative ‘impact’ of a document on its interpreters.

These design features are carried by text elements (Schriver 1997, p342), which are Schriver’s way of defining the parts that go to make up pages.

Schriver suggests two kinds of text elements that we distinguish here as ‘simplex’ and ‘complex’. First, there are simplex elements that “often depend on their genre (e.g., reports have executive summaries, online help has procedures . . . )” and which serve as individual, isolatable elements of page design (Schriver 1997, p343). Second, there are configurations of such elements on the page which Schriver terms rhetorical clusters. These are defined as follows:

“By rhetorical cluster I mean a group of text elements designed to work together as a functional unit within a document. Rhetorical clusters act as reader-oriented modules of purposeful and related content. They are comprised of visual and/or verbal elements that need to be grouped (or put in proximal relation) because together they help the reader interpret the content in a certain way.” (Schriver 1997, p343)

Identifying such rhetorical clusters is then one of the basic steps proposed by Schriver for ‘seeing the text’, for describing and analysing what goes into a page design.

Schriver suggests several particular examples of rhetorical clusters, such as ‘illustrations with annotations and explanations’, ‘body text with footnotes’ and ‘procedural instructions with visual examples’; we will see other examples below. As can be gathered from these, a rhetorical cluster can be quite extensive in scope and will often include smaller-scale clusters within it. Schriver suggests that an explicit consideration of this kind of organisation can contribute substantially to turning design and design interpretation into a more analytic enterprise, and this, in turn, should render design more ‘teachable’.

Applying this to our Gannet example, the approach appears quite natural, up to a certain point. For example, we have a body text with particular paragraph styling, some headings and subheadings and an itemised list (cf. Schriver 1997, p343). This body text is itself perhaps part of an ‘illustration with annotations’ cluster; or is the illustration within the body text cluster? In any case, we can see that, as readers, we find it straightforward to interpret the page by deploying such standardised interpretive frames:

“We can think of a document as a field of interacting rhetorical clusters. If the document is well designed, the clusters orchestrate a web of converging meanings, which enable readers to form a coherent and consistent idea of the content.” (Schriver 1997, p344; original all in italics)

More or less intuitive analyses of this kind can generally be produced on demand by a document’s readers. This therefore provides us with a good starting point for interpreting multimodal written documents but needs to be taken further in at least two respects.

First, recognising a rhetorical cluster presupposes that we have interpreted the contributing elements as standing together in some particular functional relationship. We need to be able to open up this account so that we can explore possible clusters on the basis of a document’s layout without having already recognised their function. If we can only find clusters when we know the function of the cluster, the account is in some respects circular. Although this is less of a problem for interpretative analysis, we want to reach a position where we can lay bare the process of interpretation itself. This requires being able to specify identification criteria for elements so as to provide a reproducible scheme for exploring interpretations and for asking which interpretations a document supports and which it does not.

Second, there are many fine details of design which are not readily assimilable to a rhetorical cluster. In the Gannet page, for example, what exactly is happening in the itemised list in the lower portion of the page? That it is an itemised list is clear, but its precise form of expression is unusual by today’s standards. The entry for the Gannet’s eggs is particularly strange; we repeat the approximate layout here:

Eggs I, nearly white, chalky. April or May.

This appears almost to be working towards a single-line table in which we have three ‘columns’: the object considered, the appearance of the object, and the time of occurrence of the object. We see something similar in the second line of the page also. In both cases, simply stating that we have an itemised list, or a table, or something else, would be to miss the details of the particular case; this appears to be a tentative spatial transition towards the very explicit tables that we will see in some further example bird field guides that we analyse in Chapter 5. Since we want to explore how meanings are being made (or not) multimodally and how these practices develop over time, it is important to capture this and all other relevant levels of detail so that we can consider their contribution to establishing and changing applicable conventions of interpretation.

Reading ‘into’ graphical materials

We must also address here an issue that we will raise at several points in the current discussion. When considering the possible parts of a multimodal document, we have to take an explicit position on the extent, if any, to which we are prepared to ‘read into’ the pictures and graphical materials on the pages we analyse. In the case of the Gannet, there are naturally resonances between the information portrayed in the drawing and the text; this is particularly relevant for the genre at hand because the drawing is intended to capture some of the essential features of the bird.

With the more informal approaches that characterise design, this is again usually done ‘on demand’, particularly when there is something ‘striking’ about the visual information; here accounts examining the use of pictorial rhetoric, visual metaphors, etc. provide many illuminating examples (cf., e.g., Forceville 1996).

For our starting point here, however, we will be more concerned to restrict this possibility. This is in order to rule out at the initial segmentation stage of analysis the considerable flexibility in recognised ‘parts’ that would otherwise ensue. This is not to say that it is impossible, or not valuable, to segment visual material in the ways required to describe how the detailed content of an image can interact with other components of a document and its interpretation and, indeed, we will return to issues involved here when we discuss more of the relationship between text and images in Chapter 4.
