

Multimodal Documents and their Components

2.2 The page as an object of interpretation

2.2.2 Multimodal linguistics

Within multimodal linguistics, there have been several attempts to find more analytical methods for investigating multimodal artefact interpretation. As we shall see, it has been relatively natural for linguistic accounts that focus on linguistic function rather than on form to broaden their attention to artefacts other than verbal ‘texts’, construed narrowly.

A prime example of work of this kind is that of the social semiotic tradition developed by Halliday and colleagues, called systemic-functional linguistics (commonly abbreviated as SFL: Halliday 1978). Work within this direction of research makes the following general assumption:

“A multimodal and social semiotic approach starts from the position that visual communication, gesture, and action have evolved through their social usage into articulated or partially articulated semiotic systems in the same way that language has.” (Kress, Jewitt, Ogborn and Tsatsarelis 2000, p44)

And, building on this, ground-breaking work by Kress and van Leeuwen (1996) on a linguistics of the visual image and by O’Toole (1994) on a linguistics of art has certainly revitalised earlier semiotic and philosophical attempts to achieve a broader understanding of the relations between meaning in language and other semiotic modes (cf. Barthes 1977a). Many researchers working within the SFL tradition now include a variety of multimodal artefacts, multimodal documents among them, in their domains of interest (for overviews see, e.g., O’Halloran 2004c, Kaltenbacher 2004, Martinec 2005, Royce and Bowcher 2007).

Systemic-functional approaches investigate how texts in general are articulated to show their appropriateness for particular situations of use, or contexts. Moreover, ‘text’ as such is construed as an essentially semantic unit, rather than one defined by its external appearance or surface realisation. It is this starting point that has made it natural to consider extensions of the framework to apply to semiotic artefacts more broadly.

SFL captures the appropriateness of texts to context in terms of three interacting but distinct functional domains: the representational domain, the interactional domain and the ‘text-organisational’ domain. These domains are referred to as metafunctions because they are essentially ‘generalised’ functions that hold whenever a linguistic unit is constructed: any linguistic unit, such as a clause, can be analysed simultaneously according to the work it does to represent the world, the work it does to enact social relationships (e.g., asking questions, evaluating entities, etc.), and the work it does to contribute to an unfolding text or dialogue. The presence of this work is not dependent on what particular meaning is being expressed.

Multimodal SFL analysis sees visual presentations as subject to the same generic functional requirements as other communicative artefacts. Such artefacts are accordingly already presumed to manage meaning-making in the three metafunctional domains. A photograph, for example, may present simultaneously a representation of something occurring, an interpersonal appeal to the viewer (as when a character in the photograph looks directly ‘out of’ the picture at the viewer), and a textual organisation whereby some things are made more salient in the composition (by visual prominence, position, selection of subject-matter, etc.) and others less so. Combining analyses from distinct functional perspectives in this way is a technique well developed in linguistic accounts and it is likely that we will see increasingly revealing analyses drawing on this particular aspect of meaning ‘multiplication’ for multimodal artefacts also.

Within this tradition, the principal approaches that have been applied to multimodal documents so far are, in addition to Kress and van Leeuwen (1996), those of Baldry and Thibault (2006), Royce (2007), and Lemke (1998); we will see more of these below. In addition, a related but rather different course has been taken by O’Halloran in a series of detailed studies of intersemiosis between the visual modes of verbal language and mathematical formulae (O’Halloran 1999a,b, 2004a,b). Since this has not been a prominent feature of the multimodal documents that we have analysed so far in our own studies, we will simply set a place marker: whenever the account proposed here moves to consider documents of the kind that O’Halloran has analysed, we will need to extend the descriptive layers present to include the areas of meaning O’Halloran has identified.

There are also several explorations into the multimodality of documents pursued within text linguistics, since texts more obviously raise issues beyond the linguistic mode construed narrowly. Although this has generally been seen as something very much on the edge of text linguistics ‘proper’, a long-standing concern with classifying text types has led to several attempts to extend this practice for multimodal artefacts (cf. Spillner 1980, Kitis 1997, Stöckl 1998, 2004a, Blum and Bucher 1998, Straßner 1999, Fix and Weillmann 2000, Eckkrammer 2004). We do not provide an overview of this work here because its utility for our purposes in this chapter is unfortunately limited: these linguistically-informed accounts do not generally problematise the actual identification of the page elements upon which their analyses are built. We do see, however, the general model that we develop as one way of bringing these diverse investigations together in the future.

Towards a ‘Grammar of Visual Design’

In multimodal linguistics at large, the most important and influential position is without doubt that articulated in Kress and van Leeuwen (1996). In this work, the main concern was to set out a systematic map of the territory for multimodal visual-based communicative artefacts, to provide a ‘grammar’ of the possibilities of meaning-making available that applies to all forms of visual presentation. The starting point for their approach is again the fundamental assumption of SFL that communicative artefacts, including those that are the results of visual design, can be characterised along the dimensions defined by the three SFL metafunctions. Kress and van Leeuwen then provide within each metafunctional domain an explicit ‘grammar’ for the kinds of meanings found in visual artefacts. Such visual design grammars take on a particular task: they need to set out detailed ‘systems of choice’ that show the abstract range of meanings that can be selected from in their metafunction. This is itself a significant claim and raises an entire range of challenging issues concerning the organisation of meanings in different semiotic modes.

For the ideational metafunction, for example, Kress and van Leeuwen propose a decomposition of visual messages into elements analogous to the division of linguistic clauses into processes, participants in those processes, and the circumstances of occurrence of those processes. Then they suggest a classification of such visual configurations that is similar to, but in some interesting respects different from, the classifications found in functional linguistic interpretations of grammatical configurations. In particular, they define narrational visuals and conceptual visuals: the former visually depict some action (Kress and van Leeuwen 1996, pp73–75) and the latter some kind of classification or analysis (Kress and van Leeuwen 1996, pp88–89, p107). This way of decomposing visuals is also adopted by, for example, Baldry and Thibault (2006, p122) in terms of visual transitivity frames and Royce (2007, p73) in terms of combinations of visual message elements (VMEs), about which we will say more below.

For the interpersonal metafunction, Kress and van Leeuwen set out possibilities for evaluating the visual material depicted in terms of inter-relationships between that material and the viewer: visual properties such as the direction of gaze of any people depicted in the visual (e.g., looking at the viewer, looking away) or the relative angle, or tilt, of the camera (e.g., looking up at, down on) are suggested to construct particular interpersonal relationships involving relative power and engagement or appeal. Many of these suggested meanings have been the explicit focus of theory and experimentation: for example, the relation between camera angle and power is explored in Tiemens (1970), Mandell and Shaw (1973) and others, the effect of ‘point-of-view’ depictions of ‘direct gaze’ has been related to visual perception and its natural focus on where potential interactants are looking by Solso (1994, pp136–137), and the effects of apparent viewing distance on interpretations of social distance are examined in Meyrowitz (1986). A useful overview of these results is given by Messaris (1997, pp34–52), who also demonstrates that there is still considerable need for more detailed investigations of precisely how these dimensions of variation are to be interpreted in context; viewers’ assumptions or knowledge about depicted scenes are already known to influence significantly the extent to which particular visual choices are available for taking on particular interpretations.

The meanings that we have sketched here from both the ideational and the interpersonal resources generally involve components within images; this raises again the possibility, perhaps even the necessity, that we mentioned above of extending segmentation of multimodal documents to include ‘internal’ parts of the graphic material. Ideationally, therefore, the drawing in our Gannet example is clearly a ‘conceptual’ visual used primarily as an opportunity to present visually some of the physical properties of the Gannet’s appearance and habitat rather than to show action. The elements of the drawing are then the bird, its identifiable parts, and the circumstances of its sitting on a cliff top. Interpersonally, Kress and van Leeuwen’s account would pick out the facts that the Gannet is not ‘looking’ out of the picture (i.e., there is no ‘appeal’ to the viewer, as might be seen in pictures of seals in campaigns against seal hunting), that we are dealing with a colour drawing rather than a photograph (with associated claims of reality/non-reality), and that the picture portrays the bird slightly below the viewer.

Although it may at some stage, and for some purposes, become useful and beneficial to regularly decompose images in this way, for our current more restricted goal of determining a baseline segmentation for multimodal documents, we will follow the line suggested above and only recognise any such putative parts if they are picked up cross-modally, e.g., if they are also mentioned in an accompanying text or in explicit labels in the document and so on. For the Gannet page we have the cross-modal resonances given by the conceptual visual of the drawing, classifying the bird and depicting attributes of its plumage and its location just as does the text (“the coast and sea” and “rocky isles and stacks”). This gives us a systematic and operationalisable way of deciding whether a part or part-quality is to be included in the first round of segmentation or not.
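The cross-modal criterion can be read as a simple filter over candidate image parts. The following Python sketch is only an illustration under stated assumptions: the names `ImagePart`, `cross_modally_supported` and `first_round_segments`, the idea of attaching textual ‘cues’ to each candidate part, and the candidate parts themselves are all hypothetical and not part of the account above. The example text contains only the two phrases quoted from the Gannet page.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set


@dataclass
class ImagePart:
    """A putative part (or part-quality) identified inside an image."""
    name: str
    # Hypothetical: words or phrases under which a text or label might mention the part.
    cues: Set[str] = field(default_factory=set)


def cross_modally_supported(part: ImagePart, text: str, labels: Sequence[str] = ()) -> bool:
    """Return True if any cue for the part occurs in the text or in an explicit label."""
    haystacks = [text.lower()] + [label.lower() for label in labels]
    return any(cue.lower() in h for cue in part.cues for h in haystacks)


def first_round_segments(parts: Sequence[ImagePart], text: str, labels: Sequence[str] = ()) -> List[str]:
    """Keep only those image parts that are picked up in another mode."""
    return [p.name for p in parts if cross_modally_supported(p, text, labels)]


# Only the two phrases quoted from the Gannet page are used as 'text' here.
text = "the coast and sea; rocky isles and stacks"

# Hypothetical candidate parts of the drawing, with hypothetical cues.
parts = [
    ImagePart("cliff top", {"rocky isles", "stacks"}),
    ImagePart("sea", {"sea"}),
    ImagePart("beak", {"beak", "bill"}),
]

print(first_round_segments(parts, text))  # ['cliff top', 'sea']
```

Plain substring matching is a deliberately crude stand-in for deciding that a text ‘mentions’ a part (it would, for instance, match ‘sea’ inside ‘season’); any serious operationalisation would need a more careful notion of cross-modal reference.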

For the analysis of pages that we are pursuing in this chapter, however, it is the resources that Kress and van Leeuwen develop for the textual metafunction (Halliday and Matthiessen 2004, Chapters 3 and 9) that are the most significant. The meaning-making resources here drive principles of organisation that are intended to be relevant whenever analysing pages for their composition and so present tools for analysing the combination of material within a single semiotic product, such as a page. Kress and van Leeuwen propose three main areas of meaning-making potential within the textual metafunction: information value, salience and framing.

Both salience and framing are seen as scales running from high to low. High salience indicates that an element draws attention to itself in some way, usually by deploying one or more of the perceptual features that we will discuss more closely in Section 2.3 below. A high degree of framing corresponds to maximal disconnection between an element described and the other visual elements in the visual artefact; a low degree of framing corresponds to maximal connection between the element described and its surrounding elements. Standard framing devices include lines separating regions on a page, empty space, discontinuous areas of colour, particular recognisable shapes that create boundaries, and so on. In our Gannet example, it is primarily whitespace and the differing texture of the text and the drawing that provide (generally rather weak) frames.

The question of precisely what ‘entity’ is carrying these meaning-making alternatives (a composition, a part of a composition, a visually-recognisable element, etc.) is quite important for securing a robust analysis. Kress and van Leeuwen’s description suggests that the categories of salience and framing may apply to entire compositions and to individual elements in a composition; for example, a single element may be strongly disconnected from its surrounding elements or, alternatively, we could talk of an entire composition as being strongly framed because all of its elements are disconnected from their neighbours. With salience it is not quite so clear what its application to the composition as a whole would mean, although its application for individual elements appears natural enough.

The remaining area of meaning potential, information value, is considerably more complex and, despite broad acceptance among those working in this particular multimodal tradition, is also more problematic. The possibilities for a composition to attribute information value are first classified into two alternatives: centred compositions and polarized compositions. Centred compositions are proposed to be recognisable by virtue of some distinguished element appearing in the centre; conversely, polarized compositions are recognisable by virtue of no element appearing in the centre (Kress and van Leeuwen 1996, p224). Applying this classification reliably also requires issues of salience to be considered: a centred composition is likely to exhibit an overall perceptual balance, for example (cf. Arnheim 1982).

Kress and van Leeuwen distinguish two cases of centering: circular, where non-central elements are distributed spatially around the centre, and triptych, where non-central elements lie predominantly on a vertical or horizontal axis passing through the centre. Both of these are further classified according to whether the involved composition is a margin composition or a mediator composition. In a margin composition, the non-central elements are similar so that an impression of symmetry is created; in a mediator composition, the central element is intended to provide a bridge or, in some sense, to ‘mediate’ between dissimilar non-central elements.

The role of the mediator composition in the account as a whole is somewhat uncertain in Kress and van Leeuwen’s description because it shows relations both with centred compositions and with the polarized compositions that we discuss next. The defining difference appears to lie in the symmetry and similarity of the non-central elements. When non-central elements are not identical or ‘near-identical’, as Kress and van Leeuwen write, then they can be mediated between in both polarized and centred compositions.

The main subclassifications of polarized compositions involve two simultaneous possibilities. The first, taken directly from linguistic accounts of the textual metafunction, Kress and van Leeuwen denote by given-new; the second, they call ideal-real. Both distinctions impose a differential interpretation on non-central elements, regardless of whether or not a central element is also in the composition. The given-new distinction applies to the horizontal axis; the ideal-real distinction to the vertical axis. Kress and van Leeuwen claim that at least one of these distinctions has to apply (Kress and van Leeuwen 1996, p224). The terms ‘given’, ‘new’, ‘ideal’, ‘real’ therefore label particular spatial regions of a visual artefact that can be identified under certain compositional conditions.

Putting all of the combinations together results in the kind of ‘visual grammar’ required, analogously to that commonly developed for linguistic units. Just as a grammar of the possible combinations determines which particular configurations of structural elements may combine in, for example, a clause, Kress and van Leeuwen’s description defines possible decompositions of a visual artefact. The combined definitions in fact define just 8 possible composition ‘structures’ (9 if we add the distinction of vertical vs. horizontal centre-margin triptychs, which is entailed but not explicitly enumerated in Kress and van Leeuwen’s grammar). These structures are set out in Figure 2.3.

Which of the possible compositional layouts should apply to a specific page is determined by the constraints that distinguish margins from other elements, i.e., margins should be ‘similar’. Non-similar elements are then not margins and must be labelled either with ‘given’/‘new’, if they lie along the horizontal axis, or with ‘ideal’/‘real’, if they lie along the vertical axis.

Summarising, we can describe the possibilities in terms of (i) the presence or absence of a polarization along the vertical or horizontal axis or both, (ii) the presence or absence of a central element, and (iii) the similarity or difference of the non-central elements, giving rise to a centre-margin composition if they are similar and to given/new or ideal/real otherwise.
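The three criteria just summarised can be sketched as a small labelling procedure. The Python fragment below is a minimal illustration only and makes several assumptions that go beyond the summary: the function name `label_zones` and the position names are hypothetical; the assignment of ‘given’ to the left, ‘new’ to the right, ‘ideal’ to the top and ‘real’ to the bottom follows the usual reading of Kress and van Leeuwen rather than anything stated in this passage; circular compositions are not covered; and the decision to reject ‘similar’ non-central elements without a centre is our own simplification, since the summary only describes margins in relation to a central element.

```python
from typing import Dict, Iterable

# Assumption: the usual reading of Kress and van Leeuwen places 'given' on the
# left, 'new' on the right, 'ideal' at the top and 'real' at the bottom.
POLAR_LABELS = {
    "horizontal": {"left": "given", "right": "new"},
    "vertical": {"top": "ideal", "bottom": "real"},
}


def label_zones(axes: Iterable[str], has_centre: bool, similar: bool) -> Dict[str, str]:
    """Label the zones of an axis-aligned composition.

    axes       -- axes along which non-central elements lie ('horizontal', 'vertical')
    has_centre -- whether a distinguished central element is present
    similar    -- whether the non-central elements are similar to one another
    """
    axes = list(axes)
    if not axes or any(axis not in POLAR_LABELS for axis in axes):
        raise ValueError("axes must be a non-empty selection of 'horizontal' and 'vertical'")
    if similar and not has_centre:
        # Simplification of this sketch: margins presuppose a central element.
        raise ValueError("similar non-central elements without a centre are not covered")

    labels: Dict[str, str] = {}
    if has_centre:
        # Similar surroundings give centre-margin; dissimilar ones a mediator.
        labels["centre"] = "centre" if similar else "mediator"
    for axis in axes:
        for position, polar_label in POLAR_LABELS[axis].items():
            labels[position] = "margin" if similar else polar_label
    return labels


print(label_zones(["horizontal"], has_centre=False, similar=False))
print(label_zones(["vertical"], has_centre=True, similar=True))
```

The first call describes a polarized composition along the horizontal axis only, the second a vertical centre-margin triptych. The sketch takes the judgement of ‘similarity’ as given, although that judgement is precisely where the analytic difficulty lies.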

As mentioned above, this approach has been very influential and there are many analyses being produced using it. There are, however, two significant sources of difficulties. Kress and van Leeuwen are concerned to establish a link between compositional choices and ideological import; in Kress and van Leeuwen (1998), for example, they interpret newspaper front pages in terms of the items that are placed in the ‘ideal’ position and those that are placed in the ‘given’ position of the front page directly in terms of statements about ideological values thereby attributed to the material so presented. This is a natural continuation of the approach found within social semiotic views of language in general: first, it is assumed that there is a fundamental connection between the forms of language (and when generalised multimodally, to all semiosis) and the contexts of use of those forms; second, since contexts of use are structured ideologically according to their cultures and subcultures of use, we have a link between, on the one hand, the forms that are used and their patterning and, on the other hand, ideological configurations.

This line of research is taken the furthest in Critical Discourse Analysis (CDA: Fowler, Hodge, Kress and Trew 1979) and it is against the background of their long-standing involvement with this tradition that Kress and van Leeuwen’s (1996) connection of compositional zones with ideological significance should be seen. Within CDA and related approaches, there are now many results throwing considerable light on the mechanisms of ideology construction and maintenance via the use of language (cf. Fairclough 1989, van Leeuwen 1993, Martin 2000, Martin and Wodak 2003, Bloor and Bloor 2007). And, as with other areas, this is now being taken in the direction of multimodal analyses, bringing in considerations of the ideological import revealed by detailed analysis of the use of multimodal resources as expressions and enactments of ideology (e.g., van Leeuwen and Kress 1995, Müller 1997, Knieper and Müller 2001, Lassen, Strunck and Vestergaard 2006).

Figure 2.3 Possible layout compositions according to the information value portion of Kress and van Leeuwen’s ‘visual grammar’ (Kress and van Leeuwen 1996, pp223–224)

The connection between compositional zones and ideological import drawn by Kress and van Leeuwen is, however, unusually direct. It is unlikely that such an unmediated link between a spatial region and regular ideological interpretation can be constructed—and, indeed, if we apply Kress and van Leeuwen’s categories directly to the layouts used as illustrations in this book, it is often difficult to locate generic meanings where attributions such as ‘ideal’ or ‘real’ provide significant insight. This notwithstanding, the assumption that these categories can be transparently
