Modelling Text Transmission: from Documents to Texts, and Return

Textual scholarship (and digital textual scholarship) engages with texts transmitted across time by means of mainly physical supports.57 These supports can be made of different materials (paper, papyrus, parchment, stone, fabric, wood, etc.) and can be handcrafted (manuscripts, epigraphy) or mechanically produced (printed books). Each combination of the above presents its own issues and problems, and requires specialized expertise and experience to be understood in full. Can a text be separated from the physical object on which it is inscribed without loss of information? The whole idea of transmission, edition and distribution of texts since the invention of writing requires us to answer ‘yes’ to this question. For centuries, if not millennia, human civilisation has relied on the fact that text can be transmitted independently from the document in which it was originally inscribed by its ‘author’. On the other hand, it has been clear, again for millennia, that ‘without loss’ is a utopian statement, and since the establishment of the philological school of Alexandria scholars have tried to find the rules and principles to fix mistakes, record different readings, and restore the ‘original’ version of a given text – whatever meaning is given to the word ‘original’. Traditionally, we date the birth of textual scholarship to the work of the scholars at the library of Alexandria and to the edition of the Iliad by Aristarchus (200 BCE). From the very beginning, what struck those scholars was the discovery that, when comparing any two manuscripts of allegedly the same work, the texts contained within these documents were different, in some cases considerably so. Since then it has been the purpose of textual scholarship to make sense of this variation, to discover the reasons for which variation occurs and to reconcile this with the need to provide reliable and correct texts to readers. What ‘correct’ actually means defines the different theories of textual scholarship.

Texts (intended as the verbal content of documents, i.e. the Verbal Dimensions) can be separated from the medium that originally hosted them because they ‘are not only physical objects’ (Eggert 2009): they are also immaterial abstractions, and can be realized in many different ways, using different media, through different channels. This capability comes at a price: variation and progressive deterioration of the original verbal content. Textual scholars and philologists have studied the phenomenology of textual variation since antiquity, and why variants (and errors) happen is well understood. However, when dealing with computational methods, more than why something happens, it is important to establish how it works and whether this can be made computationally processable; in other words, how it can be modelled.

57 Increasingly, however, textual scholarship deals with born-digital texts, cf. for instance, Kirschenbaum and

Text transmission can be generalized and modelled as a form of information transmission, and therefore can be considered a type of communication. If we take this route, it then seems appropriate to try to apply models developed by classic theories of communication. As we will see, the experiment has proved quite successful, even if a certain number of adaptations have to be applied in order to fit the model to the use case.

3.1 Textual Transmission: a Communication Model

The capability of texts to be transmitted across time and media is at the core of a great part of human civilization and it is also the object of study of textual scholarship: editing itself is nothing if not a form of textual transmission, and critical editors are in a sense not so different from ancient scribes who contaminated readings from more than one source and corrected what they thought was wrong in their antigraph (apart from the methodological rigour and the scholarship itself, that is). In the previous chapter we modelled the way a text comes into being, namely by the act of an Interpretant who selects a series of Facts from one or more Documents and combines them into a Text. But how does such a Text get transmitted, and what are the circumstances and the entities that can influence the way it is transmitted? We have defined Edition as the ‘public’ face of a Text, but how does it work as a vehicle for distribution? Can the cognitive process that is enacted by the Interpretant when considering the Facts of the Document be formalized and described? If we consider textual transmission as an act of communication and adopt the terminology of communication theory, then we could refer to each of the Dimensions introduced in the previous chapter as ‘codes’. We could then define Text (or a model-text) further as follows: ‘a text is (also) a multidimensional message that conveys a set of meanings transmitted by various codes which are potentially understandable to at least one group of Interpretants with the capability and interest to decipher at least one of such codes’. According to this definition, we could then define textual transmission as a selective act of communication where the Interpretant-receivers, willingly or unwillingly, select which codes to decipher from those potentially available in the source that contains them.

According to the classic theory of communication elaborated by Shannon and Weaver in 1948, a communication act is represented by the transmission of a message produced by a source from a sender to a receiver via a specific channel until it reaches its destination, using a shared code. If we apply this model to textual transmission, then we could say that the source is where the production of the message happens, which is the responsibility of what we have called the Producer of Document(s), who in turn could be the author, the scribe, the publisher, the scholarly editor who crafts a new critical edition, or any other agent (or group of agents) who creates or contributes to the creation of the text. The same entities can be considered the transmitters, at the moment they communicate the texts they have created (or modelled). The receiver is the Interpretant-reader, who, in a circular motion, could be the scribe, the publisher, or the editor who has to decode the message to be able to retransmit it, while the destination is where the meaning is created, namely the model-text. Finally, the channel is the medium, the support that transmits the text itself, which in our model is represented by a set of the facts derivable from the Document.

According to the same classic communication theory, there are many factors that can affect a perfect transmission of the message from the sender to the receiver; for example, we can think of the imperfect quality of the transmission channel or medium, which could be affected by so-called noise, and the imperfect coincidence of the codes of sender and receiver. If we translate this into our textual transmission case, we can consider as noise any external interference with the support (the channel) that might affect the capability of deciphering the messages. Damage caused by water, pests and fires, or rebinding could be considered a type of noise. The change of medium should also be considered here. McLuhan declared that ‘the medium is the message’ and by this he points out how our perceptions are influenced by the way the message is conveyed to us; he actually pushes the argument to the point of saying that the way in which each medium influences the content becomes part of the message if not the whole message itself, as it transforms the message in such a way that it becomes unrecognizable (McLuhan, 1964).58 This argument, of course, applies not only to historical re-mediations, such as the transfer of classic literature from scrolls to codex or the revolution caused by Gutenberg’s Galaxy (as outlined by McLuhan in a 1962 book which studies the sociological impact of the introduction of new technology in culture and society), but also to the work of editors accessing primary sources from the screen of their laptops, and, by extension, to the consumption of digital editions instead of printed ones.

58 Actually, the title of the book by Marshall McLuhan should more correctly be quoted as The Medium is the Massage. In the Commonly Asked Questions of the website maintained by McLuhan’s estate, at the question ‘Why is the title of the book “The medium is the massage” and not “The medium is the message”?’, the son of Marshall, Eric McLuhan, answers: ‘Actually, the title was a mistake. When the book came back from the typesetter’s, it had on the cover “Massage” as it still does. The title was supposed to have read “The Medium is the Message” but the typesetter had made an error. When Marshall saw the typo he exclaimed, “Leave it alone! It’s great, and right on target!” Now there are four possible readings for the last word of the title, all of them accurate: “Message” and “Mess Age,” “Massage” and “Mass Age.”’ <http://marshallmcluhan.com/common-questions/>

In 1960 Roman Jakobson returned to the Shannon-Weaver model, using the various components of the communication act in his functional analysis of language, and to the set of agents devised by Shannon and Weaver he introduced a fundamental correction by adding two new components: the code, which was only implied in the Shannon-Weaver model, and the context where the communication takes place. The passage from signal and information theory to communication and linguistics made the revised model extremely influential in the Humanities and Social Sciences. The classical communication theory assumes that senders and receivers share the same code and that they are equally capable of coding and decoding the message. However, when we apply the model to natural languages, this is demonstrated to be impossible both in theory and in practice. In his Cours de linguistique générale, published posthumously by his students in 1916, Ferdinand de Saussure introduced the fundamental distinction between langue and parole, which undermines our confidence that two people can speak and understand exactly the same language, as the personal performance, the parole, necessarily implies an element of individuality and therefore a certain distance from the established, shared rule signified by the langue. Differences in languages are both personal and social, and can be influenced by external factors. Since the 1960s, scholars of sociolinguistics have elaborated a classification of multilingualism, stating that elements of multilingualism can be found in any given national language, and that the following kinds of stratification and variation can be singled out:

diachronic (language varies in time), diatopic (language varies in space), diastratic (language varies according to the social class of the speakers), diamesic (language varies according to the medium used for the communication), and diaphasic (language varies according to the style and rhetoric of communication).59 All of these types of variation affect communication as well as textual transmission, from the work of the scribes and typesetters to the editorial work of contemporary scholarly editors. In fact, no matter how accurate the editing is, factors like level of education, distance in time and space in relation to the text being edited, the use of computers, the expectations of the scholarly community and the editorial textual theory all deeply and insidiously influence the editions themselves. No two editors will produce the same edition, in the same way that no two scribes will produce identical copies. When considering textual transmission, however, we also need to add other levels of codification that go beyond those of natural language. The same non-coincidence of codes affects the decoding of the script or the writing conventions, the iconographic programme of the document and other aspects of the codicological and bibliographical codes (McGann, 1991), resulting in alteration of the message.

As for the context, we have to consider factors like where the document was produced and for which audience, the context of the creation (the authoring) of the text to begin with, as well as the context in which the reading takes place and how different it is from what was envisaged by the author, and so on. For instance, serialized novels published in magazines during the nineteenth century assumed a different type of reading from the one offered by the volumes in which some of them were later collected, a fact that has a non-negligible effect on the understanding of their structure and plot.

This rapid excursus into communication models, media and sociolinguistic theories demonstrates how texts are complex, data-rich ‘things’ and how their transmission can be affected by many factors. It also demonstrates that variation is not an accident of text transmission but a necessity, and that each variation can be traced to differences in the channel and its noise and to the inevitable variations of the code. The study of such variation can be a very rewarding scholarly approach indeed, as the work of Cerquiglini (1999), for instance, testifies. The goal of the editor is then to reconstruct the communication steps, working ‘backwards’ in an attempt to understand the relations and dependencies of the extant documents (the witnesses) that constitute the so-called tradition of a text, making some sense of the variation among them. In some cases, this ‘making sense’ means combining readings from different witnesses, discarding the ones considered erroneous according to the reconstructed genealogical dependence of the extant witnesses; in other cases, it means accepting that texts survive in different versions and that whatever the intention was of the author who originally produced the text, this authorial text does not exist anymore, has not existed for some time, and therefore has not been read by anybody since.
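
The reasoning above can be illustrated with a small computational sketch. It is only a toy model under stated assumptions: the witness sigla (A, B, C, D), the sample phrase, and the helper names `lacuna`, `respell`, `copy_text` and `variants` are all hypothetical and introduced here purely for illustration. Noise and code mismatch are reduced to word-level substitutions that preserve the word count, which real traditions of course do not. The sketch shows how each act of copying adds variation, and how counting divergent readings gives a first, crude hint of the dependencies among witnesses.

```python
from typing import Callable, List, Tuple

# An interference is anything that alters the text during one act of copying.
Interference = Callable[[str], str]


def lacuna(index: int) -> Interference:
    """Noise in the channel: physical damage makes one word illegible."""
    def apply(text: str) -> str:
        words = text.split()
        words[index] = "[...]"
        return " ".join(words)
    return apply


def respell(old: str, new: str) -> Interference:
    """Non-coincidence of codes: the copyist replaces a word with their own."""
    def apply(text: str) -> str:
        return " ".join(new if w == old else w for w in text.split())
    return apply


def copy_text(exemplar: str, *interferences: Interference) -> str:
    """One transmission step: the exemplar passes through every interference."""
    text = exemplar
    for interfere in interferences:
        text = interfere(text)
    return text


def variants(a: str, b: str) -> List[Tuple[int, str, str]]:
    """Word positions at which two witnesses disagree (equal length assumed)."""
    pairs = zip(a.split(), b.split())
    return [(i, x, y) for i, (x, y) in enumerate(pairs) if x != y]


# A hypothetical tradition: B and D copy A; C copies B from a damaged support.
A = "sing goddess the wrath of Achilles"
B = copy_text(A, respell("wrath", "anger"))
C = copy_text(B, lacuna(1))
D = copy_text(A, respell("goddess", "muse"))

print(variants(A, C))  # [(1, 'goddess', '[...]'), (3, 'wrath', 'anger')]
print(len(variants(B, C)), len(variants(C, D)))  # 1 2
```

In this toy tradition C stands closer to B than to D because it inherits B's reading ‘anger’: this is the kind of shared innovation on which genealogical reasoning relies when the editor works ‘backwards’ from the witnesses.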

3.2 Theories and models of transcriptions60

Text transmission happens mostly in writing: whether it is a matter of historical transmission (scribal copies, re-prints) or of modern transmission in the shape of scholarly editions, it all begins with a transcription.

60 An earlier version of this section has been published in Scholarly Editing (2014), 34, available at the address <http://www.scholarlyediting.org/2014/essays/essay.pierazzo.html>.

In the past few years a few contributions have attempted to model transcription in a highly theoretical way, trying to understand and to model the cognitive process that makes people recognize a given sign inscribed on a surface, map it to a mental meaning connected to that or similar signs, and then inscribe another sign with the same symbolic value on another surface, whether analogue or digital. These speculations may seem marginal for editorial discourse, but transcription is the first act of every editorial endeavour and, in many cases, it represents a very common digitization method, in the sense that it makes the chosen selection of the verbal content of documents digital and processable. Because of this, a brief account of these debates and a discussion of their scholarly implications may be of interest.

The first contribution to this debate was produced by Claus Huitfeldt and Michael Sperberg-McQueen in 2008. The authors describe their work as an attempt to give ‘a formal account of transcription as it is performed in scholarly editing and in the creation of digital resources’ (p. 295). In particular, they are interested in the relationship between a document and the transcription of the document, and what it means ‘[w]hen we say […] that a particular resource is a transcription of a particular work’. They state that the purpose of making a transcription T of an exemplar E is to make a ‘representation that is easier to use than E. For example, T may be easier to read, or easier to duplicate’ (p. 296); when they declare that T is ‘easier to read’ than E, they imply that T is a model of E, as simplification is one of the defining characteristics of models. In the simplest case, in order for T to be a transcription of E, T must contain ‘the same sequence of letters, spaces, punctuation marks, and other symbols’; however, as exemplars (Es) are physical objects and letters, spaces and punctuation marks are abstractions, the latter cannot be ‘contained’ in the former except in a metaphoric sense. Adopting a terminology used to define instances of natural languages, they define marks as ‘tokens’ ‘insofar they are instances of types’. Consequently, the relationship between T and E is summarized as follows by Caton (2013):

‘under a set of reading conditions R, marks of E can be seen as a sequence of tokens each instantiating a type. So, abstractly, a document E is a sequence of types, and if under the same set of reading conditions R a document T can be seen as a token sequence whose corresponding type sequence matches E’s type sequence, then E and T are t_similar’.

In the article of 2008, tokens are intended at letter level: for instance, the word ‘sees’ is composed of four tokens and two types (‘s’ and ‘e’); however, in a subsequent contribution of 2010, Huitfeldt et al. postulate the existence of composite tokens at word and sentence level. But what happens if the word ‘sees’ has been written in a particular document with a long s, ‘ſees’? In this case, there are still four tokens, but how many types? Two (‘s’ and ‘e’) or three (‘ſ’, ‘e’ and ‘s’)? In other words, how shall we treat the long ‘s’: as an allograph of ‘s’ or as a separate grapheme? The question of how to map the marks on the document page to discrete elements of the modern writing system is an open one (Driscoll 2006). Huitfeldt and Sperberg-McQueen also discuss ligatures and abbreviations: are they different tokens or allographs? This is where the role of reading, with its interpretational nature (and therefore the Reader as an Interpretant, we may add), is invoked: ‘a reading of a document interprets the marks in that document as instantiating a sequence of types’ and ‘[t]he reading identifies some marks as constituting tokens’ (p. 304).
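
The distinction between tokens and types, and the way a reading decides which marks instantiate which types, can be made concrete with a short sketch. It is a deliberate simplification and not Huitfeldt and Sperberg-McQueen's own formalization: the reading conditions R are reduced here to a function from a single mark to a type, tokens are taken at letter level only, and the names `diplomatic_reading`, `normalizing_reading`, `type_sequence`, `count_tokens_and_types` and `t_similar` are our own assumptions.

```python
from typing import Callable, List, Tuple

# A reading maps each mark on the page to the type it is taken to instantiate.
Reading = Callable[[str], str]

LONG_S = "\u017f"  # the character 'ſ' (LATIN SMALL LETTER LONG S)


def diplomatic_reading(mark: str) -> str:
    """Every distinct mark is its own type: long s is a separate grapheme."""
    return mark


def normalizing_reading(mark: str) -> str:
    """Long s is read as an allograph of round s."""
    return "s" if mark == LONG_S else mark


def type_sequence(marks: str, reading: Reading) -> List[str]:
    """Interpret a sequence of marks as a sequence of types."""
    return [reading(mark) for mark in marks]


def count_tokens_and_types(marks: str, reading: Reading) -> Tuple[int, int]:
    """Number of tokens and number of distinct types under a given reading."""
    sequence = type_sequence(marks, reading)
    return len(sequence), len(set(sequence))


def t_similar(exemplar: str, transcription: str, reading: Reading) -> bool:
    """E and T are t_similar if their type sequences match under one reading."""
    return type_sequence(exemplar, reading) == type_sequence(transcription, reading)


exemplar = LONG_S + "ees"  # 'ſees' as written in the document
print(count_tokens_and_types("sees", diplomatic_reading))     # (4, 2)
print(count_tokens_and_types(exemplar, diplomatic_reading))   # (4, 3)
print(count_tokens_and_types(exemplar, normalizing_reading))  # (4, 2)
print(t_similar(exemplar, "sees", normalizing_reading))       # True
print(t_similar(exemplar, "sees", diplomatic_reading))        # False
```

The same pair of documents is thus t_similar under one reading and not under another, which is why the choice between allograph and grapheme is an editorial, interpretative decision and not a property of the marks themselves.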

Subsequent contributions (Sperberg-McQueen et al. 2009 and Huitfeldt et al. 2010) have attempted to extend the model to word, sentence and other structural levels, making more explicit the connection between the theory of transcription and the theory of encoding. Paul Caton has responded to these contributions on various occasions. In his presentation at the Digital Humanities conference of 2010 he argued for a more precise formulation and quantification of what is left behind in transcription, what is lost in the act of transcribing. He argues for the introduction of the concept of ‘modality’, which is the recognition of the fact that sometimes tokens ‘display something that is in excess of – or deviant from – the norm’; for instance, a presentational modality is to be recognized in tokens that are in italics, bold or underlined; an accidental modality in tokens that are recognized as erroneous; and a temporal modality, which involves the effect of time on the token sequences. Caton then argues for text encoding, and in particular for TEI-based encoding, as a way to record such modality, which is often lost in transcription. In his contribution to the 2013 Digital Humanities Conference, Caton presents his take on what he calls ‘Pure Transcriptional Markup’, making a distinction between elements that wrap objects and ones that encode a property. His pure transcriptional markup substitutes or incorporates letter-tokens into element-tokens in order to prove the theoretical similarity between an Exemplar E and an encoded Transcription T.
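
What is ‘left behind’ by a plain transcription, and how encoding can retain it, may be sketched as follows. This is a rough illustration and not Caton's Pure Transcriptional Markup: tokens are taken at word level, the `Token` class, the modality labels `italic` and `accidental`, and the function names are hypothetical, and the mapping onto the TEI elements `<hi rend="italic">` and `<sic>` is our own assumption about a plausible encoding. The temporal modality is left out for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class Token:
    """A word-level token with whatever exceeds or deviates from the norm."""
    text: str
    modalities: FrozenSet[str] = frozenset()


def plain_transcription(tokens: List[Token]) -> str:
    """Keep the type sequence only; every modality is silently dropped."""
    return " ".join(token.text for token in tokens)


def left_behind(tokens: List[Token]) -> List[Tuple[int, str, Tuple[str, ...]]]:
    """Quantify the loss: which tokens carried a modality, and which one."""
    return [
        (index, token.text, tuple(sorted(token.modalities)))
        for index, token in enumerate(tokens)
        if token.modalities
    ]


def encoded_transcription(tokens: List[Token]) -> str:
    """Record modalities as markup instead of discarding them."""
    parts = []
    for token in tokens:
        rendered = escape(token.text)
        if "accidental" in token.modalities:
            rendered = f"<sic>{rendered}</sic>"
        if "italic" in token.modalities:
            rendered = f'<hi rend="italic">{rendered}</hi>'
        parts.append(rendered)
    return " ".join(parts)


EXAMPLE = [
    Token("the"),
    Token("Iliad", frozenset({"italic"})),       # presentational modality
    Token("recieved", frozenset({"accidental"})),  # token recognized as erroneous
]

print(plain_transcription(EXAMPLE))    # the Iliad recieved
print(left_behind(EXAMPLE))
print(encoded_transcription(EXAMPLE))  # the <hi rend="italic">Iliad</hi> <sic>recieved</sic>
```

The plain string and the encoded string share the same sequence of word types; only the second keeps a record of the modalities that `left_behind` reports as lost.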

What is the value of these speculations for the (digital) editor? Modelling is certainly a defining activity of the Digital Humanities and of any analytical endeavour, particularly if

In document Digital Scholarly Editing: Theories, Models and Methods (Page 75-95)