Producing a page from intentions: automatic document generation

Multimodal Documents and their Components

2.5 The Page as object of production

2.5.3 Producing a page from intentions: automatic document generation

The work reported in the previous subsection has its origins in a combination of concerns from the publishing industry and the emergence of the World-Wide Web as a major distribution medium for all kinds of information. In this subsection, we see work on document description that has developed in the context of a completely different set of concerns: Natural Language Generation (NLG), an area of computational linguistics that explores how we can construct computational systems that automatically produce natural language texts on the basis of non-linguistic source data, communicative goals, and communicative context (cf. Bateman 1998, van der Linden 2000, Reiter and Dale 2000, Bateman and Zock 2003). This is also relevant for our current purposes because there has long been an awareness in NLG that producing text alone, i.e., monomodal texts, is of rather limited utility. The kinds of documents that most benefit from automatic generation (i.e., instructional texts, weather reports, user-tailored health reports and many others) all typically combine pictures, text and graphics in precisely the ways that are typical for multimodal documents.

NLG differs markedly from the simple automatic presentation of text in, for example, search results in a web browser, or text messages in automatic bank machines, etc. in that its fundamental research goal is to fully capture the flexibility of natural language production. Texts are therefore to be produced on demand that are appropriate both for their particular intended readers and for their contexts of use: texts are different if they are designed for the expert or for the novice, if they are to explain or to inform, if they are to be understood in a hurry or at leisure, and so on. The flexibility that this requires means that NLG has to concern itself with issues that arise substantially earlier in the production processes than is required when rendering, since there is no pre-existing text to be rendered: the NLG system must itself create that text. As a consequence, the problems to be solved in automatic natural language generation overlap significantly with those faced by the human speaker/writer when producing texts. Building an NLG system is then another way of constructing theoretical models of how real speakers/writers produce the language that they do.

The descriptions of documents explored in NLG then go beyond those required for page rendering in several respects. Whereas rendering need only concern itself with levels of abstraction already very close to the page to be produced, NLG begins before we have made many of the decisions to which rendering could apply. Typical decisions to be made during multimodal NLG concern the modality that is to be selected in order to express some information, e.g., graphic or textual (cf. Arens and Hovy 1990, André and Rist 1993), and the ways in which the ‘argumentative’, or rhetorical, relationships that will fulfill the communicative goal being addressed can be expressed. The major theoretical tool for capturing this latter collection of issues is that of computational Rhetorical Structure Theory. We discuss this and its precise application to multimodal documents in Chapter 4 below. Here it also becomes relevant precisely because this kind of description has sometimes been used in order to decompose the parts of pages to be generated, i.e., to define document parts. Although, as we shall quickly see, this turns out not to be an appropriate approach, it is nevertheless useful to know precisely why this is the case since it helps us pull apart the functional and layout perspectives that we have seen conflated above in definitions of rhetorical clusters.

Rhetoric and layout

As one of the simplest possible examples of a rhetorical relationship, consider a BACKGROUND relation. This expresses the fact that some subsidiary piece of information is being provided so that the reader or hearer has a more in-depth understanding of some main piece of information. This could be expressed, in a monomodal text, by some kind of conjunctive phrase such as ‘This is because…’ or similar. However, it can also be expressed in a layout decision to place two pieces of information in close proximity to one another on the page, perhaps with one larger and more prominent than the other in order to also communicate which is the main piece of information being communicated and which is the subsidiary information. This exhibits the trade-off between visually-informative and visually-uninformative information presentation that we saw characterised in detail by Bernhardt (1985, p29) according to genre (cf. Section 1.1.3, Figure 1.3).

Further rhetorical relations are ELABORATION, where a topic is further developed, and JOINT, which groups together several distinct pieces of information as contributions to a single topic: ELABORATION might hold, for example, between the main body of text on our Gannet page and the list of additional facts presented in the lower part of the page (cf. Figure 2.2), whereas JOINT would hold between those individual additional facts. Here again, we see a difference in page composition, i.e., in the elements of the page, that might also be described by the rhetorical distinctions holding between parts of the page.

Early systems for ‘generating’ multimodal pages accordingly took their account of rhetorical structure, which was already present and being relied upon to produce the natural language texts anyway, and used this to produce layout structure also (cf. the discussion in Section 4.3.1 in Chapter 4). The two were assumed to be isomorphic, or perfectly aligned. However, more recently it has become clear that we still need an account of just what kind of layout and layout elements are going to be found in any generated page independently of any rhetorical structure that is assumed.

Power et al.’s ‘document structure’

One well-developed account in this area that proposes a further layer of organisational structure to mediate between abstract rhetorical organisation and the kind of units visible in layout is that of Power, Scott and colleagues (Power et al. 2003, Power 2000). Power et al.’s account includes, on the one hand, a structured representation that captures the rhetorical import of a communicative artefact in terms of rhetorical relationships of the kind just introduced and, on the other, a differently structured representation that captures the elements of that artefact as far as its layout is concerned. The principal application area from which Power et al. draw their examples is that of ‘patient information leaflets’. Patient information leaflets provide information concerning medicine, usually accompanying the medicine when sold, and are increasingly subject to legal and other constraints that determine just what information is to be presented and in what form (Bouayad-Agha, Scott and Power 2000).

An example of the concerns Power et al. raise can be seen in the contrast shown in Figure 2.20. Here we see well the need for an additional layer of structure that is closer to the layout decisions to be made than the rhetorical structure. The text extract on the left of the figure is the text in its original published form; that on the right is how it would need to appear if presented as running text (both extracts are shown within boxes to frame them more clearly). Power et al. present many detailed examples of this kind that show an important dependence between the layout selected and the wording. Particular attention is drawn to changes in punctuation, use of capitalisation, and direct grammatical integration of quite different elements into the unfolding text. In this case, for example, we see the list of reasons presented as a simple continuation of ‘since’; this would not be possible in the running text version, which requires instead the forward-looking ‘the following’ in order to set up sufficient textual structure to house the reasons that follow.

Crucially for Power et al., it would not be possible to change the layout substantially without also having to change the linguistic form of the texts.

These examples show again and again a trade-off between a verbal expression of rhetorical relationships (“since”, “however”, “so”, etc.) and layout in the form of itemised lists, paragraphing, and so on. This demonstrates that decisions concerning the segmentation of the layout structure must have been made prior to finalising the purely verbal content, thus going against the more traditional view within NLG where issues of formatting and layout were seen as a relatively simple final step in producing a document: a kind of ‘pretty printing’. Explaining and simulating the additional flexibility that is evidently required in document production is then the main task that Power et al.’s new layer of document structure takes on. Final layout and formatting decisions are made on the basis of document structure, not of rhetorical structure. The process of producing the layout structure is then where flexibility can appear. Essentially, a rhetorical structure is ‘translated’ step-by-step into a layout structure while allowing certain deviations between the two to occur so that the latter is not necessarily an exact copy of the former.

[Left-hand extract: original published form]

In rare cases the treatment can be prolonged for another week; however, this is risky since

• The side-effects are likely to get worse. Some patients have reported severe headache and nausea.

• Permanent damage to the liver might result.

[Right-hand extract: running-text variant]

In rare cases the treatment can be prolonged for another week; however, this is risky for the following reasons.

First, the side-effects are likely to get worse; some patients have reported severe headache and nausea. Second, permanent damage of the liver might result.

Figure 2.20 A formatted extract from a patient information leaflet adapted from Power et al. (2003, p226) together with a more simply formatted variant

Power et al.’s definition of document structure draws centrally on the linguistic treatment of punctuation proposed by Nunberg (1990). Nunberg’s approach is to define a phrase structure grammar for punctuation precisely analogously to that used in syntax. This grammar defines possibilities for combining smaller ‘punctuation units’ to give larger ‘punctuation units’: for example, text-clauses may be combined in order to yield text-sentences. ‘Text-sentences’ are demarcated by initial capitalisation and a final full-stop, while ‘text-clauses’ are demarcated by semicolons, commas, etc. Special rules take care of cases where multiple punctuation marks are entailed logically but fail to appear in the surface text: for example, combining two ‘text-clauses’ ending with semicolons might produce a ‘text-sentence’ ending with a full-stop; special rules then make sure that the logically predicted sequence “;.” is replaced by a single full-stop, i.e., the redundant semicolon disappears.
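The absorption of a logically predicted “;.” into a single full-stop can be made concrete with a small sketch. This is not Nunberg's grammar, nor Power et al.'s implementation: the function names `absorb` and `make_text_sentence` and the strength ordering of the punctuation marks are assumptions introduced purely for illustration.

```python
# Illustrative sketch only. The names and the strength ordering below are
# assumptions, not Nunberg's (1990) grammar or Power et al.'s system.

# Assumed ordering: when two marks are adjacent, the stronger one survives.
STRENGTH = {",": 1, ";": 2, ".": 3}


def absorb(text: str) -> str:
    """Collapse runs of adjacent punctuation marks, keeping the strongest."""
    out: list[str] = []
    for ch in text:
        if out and ch in STRENGTH and out[-1] in STRENGTH:
            # Two marks meet: keep whichever is stronger.
            if STRENGTH[ch] > STRENGTH[out[-1]]:
                out[-1] = ch
        else:
            out.append(ch)
    return "".join(out)


def make_text_sentence(text_clauses: list[str]) -> str:
    """Combine text-clauses (each logically closed by ';') into a text-sentence."""
    if not text_clauses:
        raise ValueError("a text-sentence needs at least one text-clause")
    body = " ".join(clause + ";" for clause in text_clauses)
    body = body[0].upper() + body[1:]  # initial capitalisation
    return absorb(body + ".")          # final full-stop; ';.' collapses to '.'


sentence = make_text_sentence([
    "the side-effects are likely to get worse",
    "some patients have reported severe headache and nausea",
])
```

Each clause carries its own semicolon, so the sequence “;.” really does arise at the end of the sentence before `absorb` removes the redundant semicolon; the clause-internal semicolon is untouched because it is not adjacent to another mark.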

For layout Power et al. take this further by building on a tradition in linguistics which sets out units of a graphology in precise analogy to ‘phonology’: that is, units of smaller scale are progressively combined into larger units from letters and words right up to paragraphs and pages. Crystal (1979), for example, sets out no fewer than fourteen such units arranged hierarchically with respect to each other and ‘horizontally’ across to units of other linguistic levels, such as phonology, grammar and semantics. The manner in which these units interrelate with each other is, however, left somewhat under-differentiated. In Power et al.’s account, Nunberg’s formal approach is extended to the following 6 abstract levels; units of a higher-numbered ‘rank’ are made up of elements of a lower rank:

0 text-phrase      3 paragraph
1 text-clause      4 section
2 text-sentence    5 chapter

Hierarchical structures can then be built out of these elements, which may in turn be expressed via concrete layout elements such as itemized lists, paragraphs discriminated in particular ways, sections and subsections. The resulting structures then show a clear relationship to the views of document ‘logical’ structure that we saw in Norrish’s approach (Section 2.2.1), in the machine-readable document and automatic document analysis communities (Section 2.4), and to the document markup approach shown in the previous subsection.

One important difference, however, is the explicit relationship drawn to rhetorical purpose. With this structuring possibility available, creating a document proceeds in the simplest case by progressively descending through a rhetorical structure, allocating parts of that structure to ever-decreasing ‘ranks’ of document units: first chapters, then sections, then paragraphs, and so on. With reference to the two alternative versions of an extract from a patient information leaflet given in Figure 2.20, we can see the process as follows. A rhetorical analysis of the content to be expressed would include information stating, for example, that there is some kind of CONCESSION between prolonging treatment and increased risk, some kind of EVIDENCE between the increased risk and both (indicating a JOINT relation) the side effects and the liver damage, and some kind of ELABORATION between the side-effects getting worse and some patients reporting headaches and nausea. Producing the document structure needs to allocate this rhetorical content to a document structure unit.
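The rhetorical analysis just described can be written down as a small tree, which is the kind of input a document structuring step would then descend through. The sketch below is a simplification under stated assumptions: the class name `Relation`, the wording of the leaf propositions, and the decision to ignore nuclearity (which member of a relation is the main one) are all ours, not part of Power et al.'s representation.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Relation:
    """A rhetorical relation holding over its parts (nuclearity is ignored)."""
    name: str
    parts: list["Node"] = field(default_factory=list)


# A leaf is simply the proposition to be expressed, given as a string.
Node = Union[Relation, str]

# Hypothetical encoding of the analysis of the Figure 2.20 extract.
leaflet = Relation("CONCESSION", [
    "the treatment can be prolonged for another week",
    Relation("EVIDENCE", [
        "prolonging the treatment is risky",
        Relation("JOINT", [
            Relation("ELABORATION", [
                "the side-effects are likely to get worse",
                "some patients have reported severe headache and nausea",
            ]),
            "permanent damage to the liver might result",
        ]),
    ]),
])


def relation_names(node: Node) -> list[str]:
    """Relation labels in depth-first, left-to-right order."""
    if isinstance(node, str):
        return []
    names = [node.name]
    for part in node.parts:
        names.extend(relation_names(part))
    return names
```

Calling `relation_names(leaflet)` lists the four relations in the order in which a top-down descent through the structure would meet them.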

The right-hand version is quite straightforward. Here a ‘text-paragraph’ is selected as starting point, which is subsequently decomposed into a sequence of ‘text-sentences’ (with initial capitalisation and final full-stops), some of which are further decomposed into ‘text-clauses’ (with semi-colons). The grouping of the reasons under a JOINT relation is signalled by the explicit textual conjunctions ‘First’ and ‘Second’.

The left-hand version, in contrast, selects as starting point a ‘text-sentence’, i.e., the entire content to be expressed is placed within a single document structure sentence. This requires a more complex treatment that extends beyond the simple creation of a hierarchy offered by recursive-descent. The document structure corresponding to this version according to Power et al.’s model is shown in Figure 2.21. Here, assigning ever-smaller rhetorical units to increasingly lower rank document units proceeds without problem until we reach the second TEXT-PHRASE node in the tree, which dominates two PARAGRAPH nodes. The resulting text shows that we need to invoke PARAGRAPH nodes here because we have two bullet items, each of which contains sentences. But this should not be permitted because paragraphs are at a higher rank than text-phrases.

Figure 2.21 Document structure adapted from Power et al. (2003, p227, Figure 8)

To cover this eventuality, the model additionally specifies a logical indentation level (Power et al. 2003, p226). A higher-ranking unit can then be included within a lower-ranking unit only when its indentation level increases. This indentation level is not, as Power et al. emphasise, intended as a literal physical indentation but as a ‘logical indentation’ where some real content-based division of units is to be expressed. Just how this is realised physically on the page is left open: it may be visually presented in a variety of ways; all that is required by the model is that the logical structure is perceptually recoverable.
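The interaction of rank and logical indentation can be sketched as a well-formedness check over a tree of document units. This is a minimal illustration rather than Power et al.'s formalisation: the names `Rank`, `Unit` and `well_formed` are hypothetical, and the treatment of a child of equal rank to its parent (rejected here unless the indentation level increases) is our assumption, since the description above only addresses higher-ranking children.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Rank(IntEnum):
    """The six abstract levels; a higher number means a higher rank."""
    TEXT_PHRASE = 0
    TEXT_CLAUSE = 1
    TEXT_SENTENCE = 2
    PARAGRAPH = 3
    SECTION = 4
    CHAPTER = 5


@dataclass
class Unit:
    rank: Rank
    indent: int = 0  # logical, not physical, indentation level
    children: list["Unit"] = field(default_factory=list)


def well_formed(unit: Unit) -> bool:
    """Check every parent/child pair against the rank and indentation rule."""
    for child in unit.children:
        # Ordinary case: a lower-ranking child at the same indentation level.
        lower_rank = child.rank < unit.rank and child.indent == unit.indent
        # Exceptional case: any child is admitted if its indentation increases.
        indented = child.indent > unit.indent
        if not (lower_rank or indented):
            return False
        if not well_formed(child):
            return False
    return True


# A text-phrase dominating paragraphs at the same indentation level: rejected.
bad = Unit(Rank.TEXT_PHRASE, 0, [Unit(Rank.PARAGRAPH, 0), Unit(Rank.PARAGRAPH, 0)])

# The same configuration with the paragraphs logically indented: accepted.
good = Unit(Rank.TEXT_PHRASE, 0, [
    Unit(Rank.PARAGRAPH, 1, [Unit(Rank.TEXT_SENTENCE, 1)]),
    Unit(Rank.PARAGRAPH, 1, [Unit(Rank.TEXT_SENTENCE, 1)]),
])
```

Keeping the indentation level as a plain integer on each unit reflects the point that it is a logical property; nothing in the check says how, or whether, it is rendered as physical indentation.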

Returning to the current example, we can see that the logical indentation level of the problematic higher-ranking child nodes is increased relative to the parent TEXT-PHRASE node: thus we have two PARAGRAPH(1) nodes instead of simple PARAGRAPH(0) nodes, where the number in parentheses indicates the logical indentation level. This enables the model to capture the fact that the left-hand, original patient information leaflet appears to have both sentences and paragraphs occurring within a single sentence. This is managed by moving structural information out of a purely linguistic realisation (e.g., ‘first’, ‘second’), where such a realisation would not be possible, and into a corresponding layout realisation (e.g., an itemized list employing bullets).
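The shift from a linguistic to a layout realisation of the same JOINT relation can be illustrated with two small rendering functions. Both are hypothetical sketches written for this example, with assumed names and an assumed fixed list of ordinals; they make no claim about how Power et al.'s system chooses between or produces the two forms.

```python
ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]


def realise_joint_as_text(items: list[str]) -> str:
    """Linguistic realisation: ordinal conjunctions within running text."""
    if len(items) > len(ORDINALS):
        raise ValueError("this sketch only knows a handful of ordinals")
    return " ".join(f"{ORDINALS[i]}, {item}." for i, item in enumerate(items))


def realise_joint_as_list(items: list[str], bullet: str = "•") -> str:
    """Layout realisation: one capitalised bullet item per member."""
    return "\n".join(f"{bullet} {item[0].upper()}{item[1:]}." for item in items)


reasons = [
    "the side-effects are likely to get worse",
    "permanent damage to the liver might result",
]
as_text = realise_joint_as_text(reasons)
as_list = realise_joint_as_list(reasons)
```

The two outputs carry the same grouping of reasons; what differs is whether that grouping is signalled by wording or by layout, together with the changes in capitalisation that accompany the choice.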

This establishes a clean formal specification and also leaves sufficient information for visualisation to consider when rendering the document structure as layout. The lowest nodes are paragraphs, and have many of the formatting properties of paragraphs. But they are also logically ‘subordinate’ to their surrounding text and are displayed (in this case) physically indented with respect to the text around them. The model then captures both these sources of constraint for the layout options actually taken up and provides, on the one hand, a link between the rhetorical communicative organisation of a document, via the relation to rhetorical structure and, on the other, a way of flexibly relating this to a range of fine-detailed variation in wording and layout.

In the examples seen so far, the flexibility allowed between rhetorical structure and document structure has nevertheless remained relatively simple. Power et al. characterise this formally as a homomorphism between the rhetorical structure and the document structure: that is, it is possible for the document structure to omit details included within the rhetorical structure but not to add structure that differs from that of the rhetorical organisation. This supports the case where a more deeply structured rhetorical structure might be presented in a document as a relatively flat sequence of paragraphs, as typically occurs, for example, when the rhetorical relations are being expressed linguistically in terms of conjunctions.
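One simple instance of omitting structure without adding any is the flattening of a nested rhetorical tree into a sequence of paragraphs. The sketch below is only an illustration of that one case, not a definition of the homomorphism Power et al. have in mind; the tuple encoding of trees and the names `flatten_to_paragraphs` and `depth` are assumptions made for the example.

```python
from typing import Union

# A tree is either a leaf (the text of one proposition) or a pair
# (relation_name, [subtrees]). This encoding is an assumption of the sketch.
Tree = Union[str, tuple]


def flatten_to_paragraphs(tree: Tree) -> list[str]:
    """Drop all relation nodes, keeping the leaves in their original order."""
    if isinstance(tree, str):
        return [tree]
    _relation, subtrees = tree
    paragraphs: list[str] = []
    for sub in subtrees:
        paragraphs.extend(flatten_to_paragraphs(sub))
    return paragraphs


def depth(tree: Tree) -> int:
    """Number of relation nodes on the longest path from root to leaf."""
    if isinstance(tree, str):
        return 0
    _relation, subtrees = tree
    return 1 + max((depth(sub) for sub in subtrees), default=0)


rhetorical = ("EVIDENCE", [
    "claim",
    ("JOINT", [
        "reason one",
        ("ELABORATION", ["reason two", "detail on reason two"]),
    ]),
])
paragraphs = flatten_to_paragraphs(rhetorical)
```

The nesting of the rhetorical tree is lost in `paragraphs`, but the order of the propositions is preserved and nothing is introduced that was not already in the rhetorical organisation; in a real text, conjunctions would then carry the relations that the flat sequence no longer shows.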

Bouayad-Agha, Power and Scott (2000) pursue this further, presenting evidence that the relationship between rhetorical structure and document structure has to be more flexible still. In particular, they propose that an additional operation of extraposition has to be allowed by which complex chunks of rhetorically organised material are extracted from their position in a rhetorical structure and presented separately as their own independent document elements. This may be done for reasons of continuity, where some particular components of the rhetorical organisation are moved ‘out of the way’ of some other development in order to allow a more fluent rendition.

Although precise motivations for such a realisation step still require further investigation, Power et al. see this divergence between rhetorical organisation and document structure in many respects as a consequence of the linearity of language. The content that language delivers is rarely linear and so a variety of techniques and linguistic resources allow the order in which information is presented to transcend its one-dimensional delivery in time. Both text structures and sentence structures in probably all languages admit of special constructions that can ‘lift’ material out of its regular position for reasons of emphasis or other text structuring requirements. This is then precisely the kind of operation that Power et al. and Bouayad-Agha, Power and Scott see operating on rhetorical structures in order to produce non-isomorphic and even non-homomorphic layout representations.

The mechanisms that Power et al. set out allow them to pursue the classic natural language generation methodology of producing variations according to a collection of constraints and evaluating the appropriateness of the generated result. But they only take us so far towards our goal in the present chapter of achieving a suitable starting point for page description. In particular, their treatment of layout as an extension of linguistic units is not yet

In document John Bateman. Multimodality and Genre (Page 111-123)