

1.2 Distant Reading and Big Data

Topic modelling tools emerged in response to the problem of “big data”: the amount of data now available to us cannot be processed either by a human or by traditional computer processing techniques. In the humanities, one solution to the problem of big data has been the emergence of distant reading. Distant reading positions itself as an alternative investigatory and interpretative technique to the more traditional literary studies method of close reading. It takes a large corpus of literary texts as its focus and then carries out a computer-assisted analysis in order to reveal patterns and trends within that corpus. Moretti is one of the leading practitioners and theorists of distant reading. He argues that such quantitative analysis is a more appropriate method for certain purposes as it can reveal the larger trends and patterns that shape literature (Graphs 4). For example, Moretti uses distant reading to study the effects of market forces on the evolution of literary form. Looking at a select few texts in order to say something about literary history is not always an appropriate method, in Moretti’s view, as it privileges a small selection of texts. He argues that the canon of novels that literary scholars usually work with is exceptionally small compared to the number of novels that have been published throughout history (Graphs 4). A canon of two hundred 19th-century British novels, for example, would be considered very large but, as Moretti points out, it is still less than one per cent of the novels that were actually published during that century (Graphs 4). Close reading, writes Moretti, could never hope to make sense of the mass of novels published in the nineteenth century, and can only offer a view of literary history that has been stitched together out of “separate bits of knowledge about individual cases” (Graphs 4). Moretti argues throughout his work that a literary field is not a sum of individual cases but a collective system that must be grasped as a whole (Graphs 4). Jockers, who previously worked alongside Moretti as part of the Stanford Literary Lab, believes that the rise of big data now makes it possible to analyse the broad system of literature. Jockers argues that the big data revolution, which has had such a large influence on the sciences, has become a major catalyst for a digital revolution in the humanities (ch. 1). For Jockers, there are some questions that the old methods of literary studies simply cannot answer. But now, with access to vast swathes of literary data and the continued development of large corpora of literary texts, Jockers believes we can begin to ask questions that were once inconceivable. And, in order to answer these questions, Jockers argues that a new methodology and “a new way of thinking about our object of study” is necessary (ch. 1).

Moretti’s most famous and provocative book, Graphs, Maps, Trees, employs new methods to investigate global trends in literature. Just as digital scholarly editing has a theoretical basis in the editorial shift of the 1980s, Moretti’s experimentation in this book has a theoretical basis in his own past work. Moretti adopts these methods because they offer a potential solution to a problem in the way he approaches the study of World Literature. He is interested, for example, in how market forces affect the evolution of literary form, and he argues that in order to examine World Literature in this way we must approach it not through close reading but through distant reading. Jockers points to Ian Watt’s book The Rise of the Novel as one of the better examples of close reading as a method of arriving at a conclusion about literary history, but even Watt’s study, according to Jockers, is flawed (ch. 2). Watt argues that the elements that led to the rise of the novel could be detected in the works of a small selection of authors but, though Jockers believes his conclusions to be reasonable and appealing, there is no way of knowing if his extrapolation from the specifics of these few novels to the state of the novel in general is correct. As Jockers asks, “what are we to do with the other three to five thousand works of fiction published in the eighteenth century?” (ch. 2). He claims that close reading is useful in many situations but that it is insufficient for studying the broader picture of literary history. Jockers goes further and says that the method of close reading is now both impractical as a means of evidence gathering in the digital library and inappropriate for the study of literary history now that big data analysis is possible (ch. 2). The nature of the evidence has changed, writes Jockers, and with it so must the ways in which we gather that evidence. That is, we must not focus on a select few novels and treat them as representative of all novels; instead, our analysis must include a consideration of all novels. And what digital methods can do, through topic modelling for instance, is take vast amounts of data and discern underlying trends and patterns.

Digital methods are not strictly necessary to carry out distant reading. Moretti and Jockers simply see digital methods as the most appropriate and efficient means of carrying out a quantitative analysis of literature. In “The Slaughterhouse of Literature”, for example, Moretti aims to discover why some texts are canonised and many more are forgotten. His corpus is a selection of 19th-century detective stories: both popular ones, the works by Arthur Conan Doyle, and forgotten ones, Doyle’s “rivals” (Distant Reading 71). Moretti’s hypothesis is that the presence of a particular formal device in Doyle’s work, the use of clues, guarantees the success of the Sherlock Holmes novels, whereas the lack or misuse of this device in other works guarantees their failure. His method of discovering the status of this device in the texts in his corpus is simply to read the texts. He does this not by himself but with help from a group of graduate students. No digital methods were employed, only a minor instance of crowdsourcing. Of course, without access to Moretti’s data his results cannot be scrutinised. Moretti does not even elaborate on his method. Were the instances of each formal device, each clue, marked up in XML? Evidently not, but doing so would have made his experiment repeatable by others. This ability to make your data available for repeat experimentation by other scholars is one of the advantages of using digital methods to carry out distant reading, as opposed to the method employed by Moretti here. Digital critical editors are ideally placed to offer scholars stable and reliable texts for inclusion in digital corpora. And the advantage of having reliable textual data in a stable, readily available, and reusable form is that the experiments carried out using quantitative methods are repeatable and the data itself is open to critique. The work of digital critical editors should be seen as vital for the quantitative analysis of literature because it is only when a significant number of texts are digitised and readily available that the results of quantitative analysis can have any claim to being a true representation of the broader literary field. Moretti, for example, looks forward to a future where every novel ever published has been digitised and made available for digital text analysis. In his essay, “The Novel: History and Theory”, Moretti calls for a theory of the novel based on the historical development of prose style. He says such a study could only be imagined in a future where every novel ever published exists in a digital database. This database would then allow us to look for patterns across billions of sentences (Distant Reading 164).
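To illustrate the point made above about marking up each clue in XML, the following is a minimal sketch of how such markup could make a count of clues repeatable by other scholars. It is not Moretti’s method or data: the element names (`corpus`, `story`, `clue`), the story identifiers, and the sample sentences are all hypothetical and invented for illustration, and a real project would more likely follow an established encoding scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names and sentences are invented for illustration
# and are not taken from Moretti's corpus or from any published encoding.
SAMPLE = """
<corpus>
  <story id="story-a">
    <p>The visitor left <clue>a trace of clay on the carpet</clue>, and later
    <clue>a torn ticket</clue> was found.</p>
  </story>
  <story id="story-b">
    <p>The culprit simply confessed in the final chapter.</p>
  </story>
</corpus>
"""


def clues_per_story(xml_text):
    """Return a mapping from each story's id to the number of <clue> elements it contains."""
    root = ET.fromstring(xml_text.strip())
    return {
        story.get("id"): len(story.findall(".//clue"))
        for story in root.iter("story")
    }


if __name__ == "__main__":
    for story_id, count in clues_per_story(SAMPLE).items():
        print(story_id, count)
```

Because both the encoded texts and the counting procedure are explicit, another scholar could rerun the count, or dispute how an individual clue was tagged, which is the kind of critique the unencoded reading does not allow.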

In his quantitative analysis of 7,000 titles of novels published between 1740 and 1850, Moretti is forced to acknowledge a limit of his investigation. Moretti wants to show how market forces dictated formal qualities of novels. In particular, he is interested in how the market influenced the average length of novel titles. But in his analysis, as he acknowledges, he tends to focus on extremes. His graphs show us the fall of extremely long titles and the rise of very short ones. Most titles in this period, however, fell in the middle of these two extremes. In talking about the decrease in the average length of novel titles, he points out that this decrease is small and slow. But it is a notable decrease all the same. Why the focus on extremes then? Moretti’s reasoning was that literary historians “don’t really know how to think about what is frequent and small and slow; that’s what makes it so hard to study the literary field as a whole: we must learn to find meaning in small changes and slow processes - and it’s difficult” (Distant Reading 192).

The solution to this is to “take those units of language that are so frequent we hardly notice them and show how powerfully they contribute to the construction of meaning” (Distant Reading 206-207). An example he gives is the use of the definite and indefinite articles in the titles of two particular genres of novel: the anti-Jacobin novel and the New Woman novel. These are two genres which Moretti believes have a lot in common and are suited to such a comparison. They are both ideological genres with a reliance on contemporary politics, even though they are separated by a century. In the case of the anti-Jacobin novel, Moretti found that the use of definite and indefinite articles in titles aligned with their use in his entire corpus: 36 per cent began with the definite article and 3 per cent with the indefinite. In the case of the New Woman genre, however, he found that the definite article was used in 24 per cent of cases but the indefinite article in 30 per cent of cases. Why do New Woman novels deviate from the standard as defined by Moretti’s corpus of 7,000 titles? Moretti cites an essay by Harald Weinrich, who argues that texts are always pointing readers in one of two directions: forwards or backwards. Anti-Jacobin novel titles point readers backwards, to something they already know, hence The Banished Man, The Parisian, The Democrat. These novels “don’t want to change received ideas, they want to use them” (Moretti, Distant Reading 206). New Woman novel titles point the reader forwards, to something they have not encountered before, hence A Girton Girl, A Hard Woman, A Daughter of Today. Those novel titles are telling us that in the following pages we will be encountering these figures as if for the first time (Distant Reading 206).
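The comparison of opening articles described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Moretti’s procedure: the function names are hypothetical, and the two lists contain just the six example titles quoted above, so the resulting shares will not reproduce the percentages he reports for his full corpus.

```python
from collections import Counter


def first_article(title):
    """Classify a title by its opening word: 'definite', 'indefinite', or 'none'."""
    words = title.strip().lower().split()
    if not words:
        return "none"
    if words[0] == "the":
        return "definite"
    if words[0] in ("a", "an"):
        return "indefinite"
    return "none"


def article_shares(titles):
    """Return the share of titles opening with each kind of article."""
    kinds = ("definite", "indefinite", "none")
    if not titles:
        return {kind: 0.0 for kind in kinds}
    counts = Counter(first_article(title) for title in titles)
    return {kind: counts[kind] / len(titles) for kind in kinds}


# Only the example titles quoted in the text, not Moretti's 7,000-title corpus.
anti_jacobin = ["The Banished Man", "The Parisian", "The Democrat"]
new_woman = ["A Girton Girl", "A Hard Woman", "A Daughter of Today"]

if __name__ == "__main__":
    print("anti-Jacobin:", article_shares(anti_jacobin))
    print("New Woman:", article_shares(new_woman))
```

Run over a full list of digitised titles for each genre, the same count would yield the kind of proportions Moretti compares, and anyone with access to the title list could repeat it.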