Multimodal Documents and their Components
2.6 Combining viewpoints on document parts
We have now followed the path from document interpretation to document production through a broad variety of frameworks and models that are currently being pursued. We have seen how each of these addresses particular aspects of what can be considered the ‘basic parts’ of documents. We have argued throughout the chapter that providing sound means for determining these elements of a page or document is crucial. Identifying and describing them in a reproducible way is a prerequisite for all that follows: for carrying out empirical investigations of the processes of interpretation, for critique, and for comparing and contrasting different approaches to presentation. All of these kinds of study are only as reliable as the notion of document part that they assume.
The approaches taken within natural language generation to the problem of producing complex multimodal pages that we saw in the previous section lead us to consider two complementary perspectives from which to view what is happening within a multimodal page.
The first sees graphic design as ‘macro-punctuation’, in the rather narrow sense of an extension of text-based typography or formatting. This is clearly relevant for text-centred contributions to layout but is also sometimes adopted as a model for page design in general. We consider this to be a vestige of earlier monomodal approaches to documents where the linear notion of ‘text-flow’ is taken as basic and all more challenging layouts are seen as deviations from this. This is a position that is also perpetuated by the current state of document rendering technology, which is well developed for linear text but less so for more free spatial compositions.
Starting from language and its forms and structures in this way has always been more ‘natural’ for those focusing on text. It respects a traditional response to restricted multimodality coming out of linguistics, one which considers punctuation and its extension into basic formatting as paralinguistic phenomena. A good characterisation of the page seen from this perspective is the following description of the ‘semiotic systems’ of the printed page proposed by Matthiessen (2007, pp. 24-25):
• Language, written (with the potential for being read aloud in spoken language)
• Visual paralanguage: font family, type face (“style”), layout (graphic design)
• Visual (pictorial) semiotic systems defining images of different kinds: drawings, paintings, photographs, maps, graphs, charts, and so on.
This decomposition aligns with the model of layout structure of Power et al. Visual paralanguage is often seen in terms of punctuation, especially in written codes such as that for English, where punctuation is taken as an indication of intonational phrasing. Similarly, type face characterisations, such as bold or italics, can be related to emphasis required in the text, which may in turn call for a distinct intonational prominence or effect and so is a modification of the linguistic meaning being conveyed.
The second perspective for viewing multimodal pages moves on to consider pages as essentially visual entities and document design as a process of visual decomposition. We argue that this direction is equally valid and, indeed, essential for capturing the meaning resources of multimodal pages.
While this is relatively clear for larger page elements of the kind seen in our discussions of automatic document recognition and visual perception above, we can also see it extending and overlapping with areas previously claimed by ‘super-punctuation’ approaches. Thus, although text-flow elements organise their content into typographically signalled structures, they may also to a greater or lesser extent contribute to the visually communicated structure of the material presented. In this sense, just as accounts of language, its graphological form, and punctuation can consider visual elements to function paralinguistically, we can also consider certain textual elements and formatting options to be paravisual.
When this occurs, rather than signalling meaningful distinctions using the spatial possibilities of the visual semiotic systems of similarity/difference, proximity/difference, and grouping/nongrouping directly, a multimodal artefact may also achieve similar meanings indirectly by co-opting the possibilities of text-based typography and formatting.
Itemised lists, for example, combine exactly the two perspectives set out here. Visually, the bullets of the list and the distinctive spatial framing of the list as a whole present elements that are perceptible regardless of character-based typography. One might not know what the rendering in an itemised list means, but it would hardly be possible not to perceive the spatial segmentation expressed. Tables, as discussed above, are also archetypal examples of this dual functionality.
Both itemised lists and tables should therefore be interpreted as contributions to a particular aspect of the multimodal page that we, following Waller, term access structure. Elements functioning to carry access structure provide perceptually supported points of access to a document for its readers. Even paragraphing, when effective, works similarly. The use of punctuation within text blocks can play this role, but is not required to. Visual decomposition therefore reaches down within written language and contributes to the interpretation of the material deployed there in a way that is not dependent on the linguistic or ‘logical’ meanings being presented.
This is clearly a continuum, with some text-typographic elements able to take on a stronger role visually than others. Many punctuation marks, such as full-stops, commas, dashes and quotation marks, are for example primarily character codes whose interpretation relies on familiarity with the code. Even though the value of some punctuation elements, such as a full-stop, may also be due to their iconic relationship to ‘relatively free-space’ (thereby leaning towards providing access structure-like visual elements), we can nevertheless set up a general difference in the semiotic contributions of the elements being discussed: whereas a sentence unit is expressed by a typographical convention, itemised lists combine elements into distinctive spatial configurations. The visual decomposition from the starting point of the page does not rely on familiarity with the typographic/punctuation code at all but is a result of visual/spatial processing.
This independence of the two perspectives active on the page makes it clear why the use of text-typographic resources within the visual written mode does not necessarily have to carry ‘paralinguistic’ information at all.
Selecting a particular font colour for some word in a text in order to show a connection between that word and a topic is not paralinguistic in any traditional sense: we simply have the visual mode carrying an additional piece of information that could only be brought inside the text with a possibly out-of-place circumlocution. One example of this kind is now extremely common: the use of colour and/or underlining to indicate hyperlinks in web pages. This information is not part of the verbal message being expressed by the text: at best it may represent a possible rhetorical link with something related to the text. Moving up to larger scale elements on the page, the relation to linguistic information becomes more indirect still.
Allowing both perspectives is then necessary for capturing a fuller range of possibilities. In general, text-flow page elements are just one kind of possible ‘part’ that can be found in a multimodal document. Within such elements, the kind of regularities reported by Power et al. apply. Other page elements may contain pictures, graphic material, tables and so on.
Any element on the page may then combine elements of these kinds. How these elements are related to one another needs then to be captured by an extended organisation of layout structure similar to the models developed for automatic layout recognition, page rendering and visually-based generation, and not by extensions of text-formatting methods.
Figure 2.23 The page as a site of cooperation and integration of distinct semiotic modes
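To make the idea of an extended layout structure a little more concrete, the following is a minimal sketch in Python of a page represented as a nested set of elements of different kinds, in which text-flow elements are only one kind among several. The class name `LayoutElement`, the set of kind labels and the example page are all hypothetical and chosen purely for illustration; they are not the annotation scheme of any of the approaches discussed in this chapter.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set

# Hypothetical element kinds: illustrative labels only, not an
# established annotation scheme.
KINDS = {"page", "region", "text-flow", "picture", "table", "list"}


@dataclass
class LayoutElement:
    """One node in a simple tree-shaped layout structure."""
    kind: str
    label: str = ""
    children: List["LayoutElement"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject kinds outside the illustrative inventory above.
        if self.kind not in KINDS:
            raise ValueError(f"unknown element kind: {self.kind}")

    def walk(self) -> Iterator["LayoutElement"]:
        """Yield this element and everything nested in it, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def kinds_within(element: LayoutElement) -> Set[str]:
    """Return the kinds of all elements nested inside `element`."""
    return {e.kind for e in element.walk() if e is not element}


# A hypothetical page: a region combines a picture, a caption and a list,
# alongside a free-standing headline and table.
page = LayoutElement("page", "example page", [
    LayoutElement("text-flow", "headline"),
    LayoutElement("region", "feature box", [
        LayoutElement("picture", "photograph"),
        LayoutElement("text-flow", "caption"),
        LayoutElement("list", "itemised list"),
    ]),
    LayoutElement("table", "data table"),
])

print(sorted(kinds_within(page)))
```

The point of the sketch is only that the relations between elements are carried by the nesting of the structure itself, so that a region can combine pictorial, tabular and text-flow material without any of these being treated as a special case of text formatting.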
A large part of the specific value of multimodal documents therefore comes from the capability to act as sites of integration for several rather different semiotic modes. The contributions of these modes inter-penetrate each other and allow a distinctive multiplication of the potential available.
These distinct contributions, as well as their combination within the page, are suggested graphically in Figure 2.23. Although there are many points of overlap, and semiotic ‘re-use’, we have seen that there are also important distinctive contributions that need to be done justice.
The individual mechanisms for identifying these contributions and the document parts that carry them that we have discussed in this chapter have been designed primarily to address the questions arising within the individual approaches and research communities concerned. Our main task now is to show how we can draw on all of these perspectives to achieve a well-specified, extensible characterisation of document elements sufficient for a broad range of analyses. Although our own approach is best placed within the emerging tradition of multimodal linguistics, particularly in our orientation to genre (Chapter 5) and the commitment to ‘transcription’ from corpus linguistics also argued for in the case of multimodal documents by Baldry and Thibault (2005, 2006), we have now seen that many of the existing accounts in this tradition fail to problematise sufficiently the critical initial step of demarcating the parts of a page. In the next chapter, therefore, we set out the explicit process of page description that we adopt.