
Multimodal Documents and their Components

2.5 The Page as object of production

2.5.2 Describing a page for rendering

When we are discussing page production, we need to bring in from the outset some notion of the technology used in that production: whereas sounds are produced by a biological entity, pages are produced artificially and require technology, even if it is relatively simple technology such as a quill on parchment, or burnt sticks making marks on a cave wall. Each of these modes of production leaves its traces in what is possible for the resulting artefact. Given a particular technical ‘apparatus’, however, we can describe how a particular document would be achieved with that apparatus.

As technology has developed further, it has become possible, indeed necessary, to specify more about the document being produced than the preferred spatial alignments and spatial regions that we see in makeup sheets and grids. It is necessary to include specifications of the appearance of content as well as that content’s placement. This has gradually led to a considerably more complex set of tools for capturing how a document is to be ‘produced’. The underlying mechanisms here rely on the notion of document description languages: these are special notations that are used both to describe the logical organisation of documents and to constrain how the documents are to appear on the page or screen.

The origin of this direction of development is to be found in publishers’ attempts to achieve more sophisticated ways of managing the documents they were producing. Publishers wanted to be able to specify the overall logical organisation of a publication (for example, its division into chapters, sections, subsections and paragraphs) without also committing to any particular presentational style. This would allow the ‘same’ book to be readily adapted for a variety of publishing styles: for example, producing chapter headings differently, positioning page numbers differently, dealing with endnotes rather than footnotes, producing the first letter of the first paragraph of each chapter in a larger form, and so on. Moreover, once the logical organisation is available, a publisher might produce quite different documents, such as one that pulls names out of a larger document to produce an author index, without having to retype the content.
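The idea of deriving an index from logically marked-up content can be illustrated with a minimal sketch. It assumes a small XML-style fragment whose tag names (book, chapter, para, name) and personal names are invented purely for illustration; they are not taken from SGML or from any publisher's scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical logically marked-up fragment. The tag names and the personal
# names are invented for this illustration.
SOURCE = """
<book>
  <chapter title="Origins">
    <para>An early proposal by <name>Smith</name> separated structure from style.</para>
  </chapter>
  <chapter title="Practice">
    <para><name>Jones</name> and <name>Smith</name> later applied the idea.</para>
  </chapter>
</book>
"""


def build_name_index(xml_text):
    """Map each marked-up name to the sorted chapter titles that mention it."""
    root = ET.fromstring(xml_text)
    index = {}
    for chapter in root.iter("chapter"):
        title = chapter.get("title")
        for name in chapter.iter("name"):
            index.setdefault(name.text, set()).add(title)
    # Sort names alphabetically, as a printed index would.
    return {name: sorted(titles) for name, titles in sorted(index.items())}


if __name__ == "__main__":
    for name, chapters in build_name_index(SOURCE).items():
        print(f"{name}: {', '.join(chapters)}")
```

Nothing in the content is retyped: the index is a second document computed from the same source, which is possible only because the names are explicitly labelled rather than merely printed.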

To support these kinds of manipulation of documents it was necessary to develop ways of storing documents in electronic form in some other way than as an image. As mentioned above in our discussion of automatic document analysis, it is clearly of more utility to have the contents of documents available as sequences of characters rather than as images of pages and, moreover, to capture explicitly additional information concerning the logical structure of the document. This led to the development of the Standard Generalised Markup Language (SGML: Goldfarb 1990, Bryan 1988), the origin of all subsequent markup languages, the language of web-pages, HTML, among them; SGML is now also an international standard for document markup (cf. International Organization for Standardization 1986).

SGML raises the distinction between presentation and logical organisation of documents, which we have already seen utilised in our discussion of automatic document analysis above, to the basic architectural principle of its system of representation. This supported the re-use of valuable content, which was already by itself sufficient reward to push many publishers towards SGML-based publishing. The division of information into content and layout, or between logical and presentational organisation, received a dramatic further boost with the emergence of the World-Wide Web. Since no two people might have their web-browsers set to exactly the same size, the idea of separating logical content from presentational form adopted as a basic principle for SGML was instantly applicable and in considerable demand. By writing a document-description file, a writer could produce content and not worry about how exactly a web-browser might finally get this content onto the screen. For this purpose, the full-blown publisher-oriented capabilities of SGML turned out to be too complex and the very much reduced version known as HTML (Hypertext Markup Language) was developed instead.
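A small sketch may make the point about differently sized browser windows concrete. It assumes a toy representation of logical content as (role, text) pairs and a toy text-only renderer; the roles and the function name render are our own illustrative inventions and do not correspond to how any actual browser works.

```python
import textwrap

# Logical content only: a list of (role, text) pairs. The roles "heading" and
# "paragraph" are illustrative labels, not a real markup vocabulary.
CONTENT = [
    ("heading", "Describing a page"),
    ("paragraph", "The writer supplies content and its logical role; "
                  "the rendering apparatus decides where the line breaks fall."),
]


def render(content, width):
    """Lay the content out for a window that is `width` characters wide."""
    lines = []
    for role, text in content:
        if role == "heading":
            # One possible presentational choice: headings in capitals,
            # followed by an empty line.
            lines.extend(textwrap.wrap(text.upper(), width))
            lines.append("")
        else:
            lines.extend(textwrap.wrap(text, width))
    return lines


if __name__ == "__main__":
    for width in (30, 60):
        print("-" * width)
        print("\n".join(render(CONTENT, width)))
```

The same content yields a different arrangement of lines at each width, while the writer's description of that content remains untouched.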

Straightforward web-pages are accordingly created by providing an HTML document that is displayed, or rendered, by a web browser.

Importantly for the metaphor being constructed here, the HTML file is not itself the document that is perceived and interacted with by its readers: the HTML file is a set of instructions that are interpreted by the web browser in order to produce the final result. In this sense, the HTML file considers a document as an object-of-production: the description given is of how the document is to be produced by the supporting technical apparatus (of the web browser in this case). We can see an example of this in Figure 2.18: on the left-hand side we have the source HTML, and on the right-hand side how this would be rendered in a (very small) web browser.

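[Figure 2.18. Left panel: HTML source, beginning "<h1>An example of an" (remainder not recoverable). Right panel: rendered text reading "This is a paragraph with some emphasised words in it." and "And this is another paragraph."]

Figure 2.18 HTML source and its rendering in a web-browser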

There is here no direct relationship between the distribution of characters in the source file and their layout as rendered in the formatted result. In the source file, the angle brackets enclose so-called HTML tags that give instructions to browsers concerning how the material they contain is to be displayed. In a properly formed HTML document, opening tags (e.g., <em>) are matched by corresponding closing tags (e.g., </em>). The property identified by the tag holds for the extent of the document lying between the opening and closing tags (which may then itself hold further tags describing other features); we will see more of this kind of representation and its modern counterpart when we return to multimodal corpora in Chapter 6.
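The way a tag's property holds over an extent of the document can be sketched with Python's standard html.parser module. The one-line HTML source used here is our own guess at the kind of markup behind the first rendered paragraph of Figure 2.18 (in particular, which words are emphasised is an assumption); the class name ExtentTracker is likewise hypothetical.

```python
from html.parser import HTMLParser


class ExtentTracker(HTMLParser):
    """Record, for each run of text, the tags that are open around it."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.runs = []       # (text, tuple of enclosing tags)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # In a properly formed document the closing tag matches the most
        # recently opened one.
        if not self.open_tags or self.open_tags[-1] != tag:
            raise ValueError(f"unmatched closing tag </{tag}>")
        self.open_tags.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((data.strip(), tuple(self.open_tags)))


# Assumed markup, not a reproduction of the figure's source.
SOURCE = "<p>This is a paragraph with <em>some emphasised</em> words in it.</p>"

tracker = ExtentTracker()
tracker.feed(SOURCE)
tracker.close()
for text, tags in tracker.runs:
    print(f"{'/'.join(tags):6} | {text}")
```

Each run of text is reported together with the stack of tags enclosing it, so the emphasised words are seen to lie within both the paragraph and the emphasis extents, while the surrounding words lie within the paragraph alone.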

An increasing range of documents, and almost all professionally produced documents, are nowadays created using some variant of this approach. Rather than a ‘direct’ rendering of the document and its pages by, for example, pasting paper content into the areas suggested by a selected grid or makeup sheet, the writer or designer instead produces (either directly or indirectly via a software design tool) a set of instructions for where content is to appear and in what form. The intended document is therefore described as a set of elements and relationships that contains enough information for a rendering tool to produce the final pages.

Widely-used professional tools such as Adobe InDesign¹¹ or QuarkXPress,¹² as well as simpler programs such as Microsoft Word, provide visually-based interfaces that give the designer the impression of moving content around a makeup sheet and of applying appearance styles: these operations are translated internally into sequences of instructions. This book was itself prepared in a further description language used for rendering: the freely available system LaTeX.¹³

Simple document description languages, such as HTML, provide a very basic repertoire of instructions for describing pages. Designers of web-pages were understandably never satisfied with this. If one leaves it to chance how precisely a page is to appear, then one is also placing considerable trust in the web-browser. Designers’ reluctance to do so is a resounding piece of evidence in favour of multimodal meaning. If a designer has put work into producing a well designed document for electronic presentation, he or she does not then want a web-browser to take that document and to render it in some completely different way. This unwillingness to place ultimate faith in the web-browser and its decisions led, first, to a considerable range of arbitrary extensions to HTML for controlling layout and formatting (which we will not discuss further here even though the majority of web-pages are still produced in this way) and, second, to a further cycle of development of markup languages where the particular presentational aspect of the document-as-product specification was given considerable attention.

As a direct consequence of this further cycle, we now have available extremely detailed models of documents that include precise specification of their elements, those elements’ properties, and their interrelationships.¹⁴ These models are embedded in the sophisticated view of document production given by the eXtensible Markup Language (XML), the successor to HTML (and, gradually, SGML too). In this view, the division between content and presentation is taken to its logical limit. XML descriptions contain no implications whatsoever for presentational style. In order to make this logical content visible at all, one needs to specify exactly how the logical elements of an XML file are to be rendered visually, building on detailed document models. The overall scheme, which we will return to below in Chapter 6 as a basis for multimodal corpus design, is depicted in Figure 2.19.

Figure 2.19 Flow of information in an XML-based document preparation scheme

11 http://www.adobe.com/products/indesign

12 http://www.quark.com/products/xpress

13 http://www.latex-project.org

14 The main body responsible for these schemes is the World-Wide Web Consortium (W3C), and a large number of largely voluntary design groups are actively pursuing proposals and specifications. Specifications that achieve a high degree of support in the community become W3C recommendations and, from there, gradually work their way into implementations of web-browsers and other tools.

Document preparation begins on the left of the figure with the creation of some information that is structured with content-related labels. This is then moved towards rendering by transforming the document automatically according to an XSL Transformation (XSLT); such transformations are also expressed as XML documents by using a special set of defined tags and structuring constructs. The results of such transformations are further XML documents whose tags and structure express specifically how particular parts of the document are to be presented on the page or screen. One standard set of tags for this purpose is given by the XSL formatting objects standard (XSL:FO). There are several rendering engines, i.e., special programs, that can interpret the instructions given in a formatting objects document so as to produce printed pages, for example via Adobe’s Portable Document Format (PDF). Another standard set of tags is given by the current XML-compatible version of HTML: documents of this kind can be displayed on a computer screen, using other rendering engines such as web-browsers. For this latter case (the lower path in the figure), there is an additional possibility of specifying further constraints concerning how the elements of a document are to appear. This is achieved by means of Cascading Style Sheets (CSS), a style language developed specifically for web documents.

As a very simple example, the originating XML document might specify that some block of content is an ‘abstract’ for the article that that document represents. Note that this is first and foremost an aspect of logical organisation: it expresses a relationship between this content and the rest of the content in the document; it does not itself commit to any presentation choices. If we call the tag in the original XML document that identifies this content ‘abstract’, then the document would contain somewhere:

. . . <abstract>some text</abstract> . . .

Then, an XSL transformation might be defined to convert an ‘abstract’, whenever this tag is used, into a particular type of formatting object, let us call this for present purposes a ‘block’ so that we do not have to consider the rather more complex objects that the XSL:FO standard actually defines; blocks have their own set of possible properties and organisation which are concerned solely with presentation, not with content.

The XSL transformation would therefore also typically add some attributes to any block corresponding to an abstract by saying that it belongs to some particular class of blocks, e.g., a ‘salient, single column paragraph’. This formatting object document may then either be passed to a rendering engine for printed documents, where the block-element might be produced as a paragraph set off in some way for prominence, or to a web-browser. In the latter case, we may also have a style sheet that specifies particularly how blocks of the identified class are to be treated. For example, the class may call for a particular typeface, a particular type size and weight, and a particular kind of ‘box’ behaviour including indentation, margins, borders and printing direction (e.g., left-to-right).
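The abstract-to-block example can be sketched in a few lines. This is a deliberately simplified stand-in written with Python's standard xml.etree.ElementTree module, not an actual XSLT stylesheet or XSL:FO document: the element names document and block, the class names, and the style properties are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from logical tags to a presentation class. A real
# XSLT stylesheet would express this as template rules.
TAG_TO_CLASS = {"abstract": "salient-single-column"}

# Hypothetical style sheet: presentation properties per class of block.
STYLES = {
    "salient-single-column": {"font-weight": "bold", "margin-left": "2em"},
}


def to_blocks(logical_root):
    """Return a new tree in which each top-level logical element becomes a
    <block> carrying a presentation class ("plain" if none is mapped)."""
    out = ET.Element("document")
    for element in logical_root:
        block = ET.SubElement(out, "block")
        block.set("class", TAG_TO_CLASS.get(element.tag, "plain"))
        block.text = element.text
    return out


def style_for(block):
    """Look up the presentation properties attached to a block's class."""
    return STYLES.get(block.get("class"), {})


logical = ET.fromstring(
    "<article><abstract>some text</abstract><para>body text</para></article>"
)
formatted = to_blocks(logical)
print(ET.tostring(formatted, encoding="unicode"))
for block in formatted:
    print(block.get("class"), style_for(block))
```

The logical document never mentions weight or margins, and the style table never mentions abstracts: the class name is the only point of contact between the two, which is what allows either side to be changed independently.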

Although this process may appear at first glance somewhat complex, the benefits become evident as soon as a document is to be reformatted or adapted for multiple purposes. Moreover, if there are several documents, for example, a book series or a website with many pages, then much of the style information can be reused many times; updating the look of the entire book series or website is then very much easier since all the information concerning presentation styles is grouped together independently of content. It is also much easier to maintain different views of a website in parallel: for example, versions for users with varying interests, abilities or problems (such as reduced vision, colour blindness, etc.). These powerful features taken together are rapidly making the document preparation path shown in Figure 2.19 a standard model for document preparation in general.

These mutually interacting schemes together provide extremely fine-grained specification languages for describing document presentation and layout. They include, for example, schemes for how to describe characters and their formatting, schemes for how areas of the page are to be decomposed and positioned with respect to other areas with margins and spacing (area or box models), and schemes for how pages are to be organised as wholes (page models). We will use some of these constructs further below, although they have not all reached the same level of maturity. Character models, for example, are already quite complex and allow explicit specification of the expected properties of fonts, such as size, weight, typeface and so on, as well as properties required by internationalisation, such as reading/writing order (left-right, right-left, top-down, etc.) and more complex spatial configurations where individual characters may be constructed out of spatially distributed subcomponents, as found in several Asian writing systems. In contrast to this flexibility, the currently standardised page models have not reached a comparable level of sophistication and only include a few straightforward grid forms with fixed areas: e.g., header, body, foot and margins.
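To give a rough impression of what box and page models describe, the following sketch represents a single area and a fixed-area page as small data structures. It is a simplification under our own assumptions: the class names, fields and the choice of points as the unit are illustrative and do not reproduce the definitions of any W3C specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangular area in a simplified box model (all lengths in points)."""
    content_width: float
    padding: float = 0.0
    border: float = 0.0
    margin: float = 0.0

    def outer_width(self) -> float:
        # Padding, border and margin each apply on both the left and right.
        return self.content_width + 2 * (self.padding + self.border + self.margin)


@dataclass(frozen=True)
class PageModel:
    """A fixed-area page: header, body and foot stacked between margins."""
    width: float
    height: float
    margin: float
    header_height: float
    foot_height: float

    def body_size(self) -> tuple[float, float]:
        body_width = self.width - 2 * self.margin
        body_height = (self.height - 2 * self.margin
                       - self.header_height - self.foot_height)
        return body_width, body_height


# Illustrative values only: a page of roughly A4 size with one-inch margins.
page = PageModel(width=595, height=842, margin=72, header_height=36, foot_height=36)
box = Box(content_width=400, padding=6, border=1, margin=12)
body_width, body_height = page.body_size()
print(box.outer_width(), body_width, body_height, box.outer_width() <= body_width)
```

Even this toy version shows the division of labour: the box model says how much space an area claims, and the page model says how much space the body of the page can offer.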

Considerable development effort is currently being expended to push these specifications towards the kind of sophistication found in the print industry, where substantially more advanced treatments are available. This lag is one reason why most web-pages still remain relatively simple typographically. But it is now only a matter of time before the standards being jointly developed by the electronic and traditional publishing industries make their way into web-browsers also. Accounts of document parts for multimodal document analysis should not attempt to re-do the considerable quantity of excellent work that has gone into the definition of these standardised descriptions.

The most important issue for the current chapter is then simply that such models exist and that they are extremely detailed in particular ways. Our own approach will therefore be to maintain an ‘open’ point of connection with such emerging standards so that we can use as much, or as little, of the detail of their specifications as is required for analysis purposes in the areas that they cover. We will see this in more detail in the next chapter. But these descriptions do not cover all that one needs when attempting multimodal analysis. What needs to be done, therefore, is to bring the considerable detail that is available within the areas of rendering and document description together with other areas of necessary description which are not yet so detailed. This is one goal of the combination of presentation layers that we are pursuing in our GeM model.

2.5.3 Producing a page from intentions: automatic document

In document John Bateman. Multimodality and Genre (Page 105-111)