5.1. Value and Relevance
5.2.1. Representations and Context
It is important to supply the data that results from the use of computational methods with the right context, both in the sense that the resulting data needs to be explained, and in the sense that scholars need to keep connecting it to its source text during the interpretation process.
5.2.1.1. The Need for Explanations
As was explained, it is ultimately not the data or the representations of a text that we want to interpret, but the source text itself. Representations therefore do not stand alone and are not the main focal point of an interpretative study. They can be a great help both in answering questions and in arriving at new ones, but representations and visualisations are derived from raw data that can be interpreted in a number of ways. This means that they always require an explanation when used in an interpretation process. Visualisations presented without any context remain fairly meaningless, as they do not in themselves explain what we are looking at, why they are relevant, or which hypothesis or questions they were meant to address. Nor do they describe or explain the questions they may evoke, or the methods with which they were generated. Peter Verhaar writes the following about this:
[Visualisations] remain fickle and indeterminate until their particular signification is clarified via the textual modality. […] The graphical modality [depends] on the textual modality to produce a meaning beyond the pattern itself. In the case of data visualisations, the text’s ability to provide anchorage is vital, as without it, graphical displays remain problematically devoid of meaning.118
Naturally, raw data can be great evidence, but, as was mentioned, it is not an interpretation in itself. Data on its own is not inextricably linked to one particular interpretation and can still be used to support an array of different interpretations. This also becomes clear from Stephen Ramsay’s data on The Waves, which he describes as supporting various interpretations. It is therefore important to explain sufficiently how the data is incorporated into an argument, how it was used and why. If this is not done, as became apparent from a few of the examples in chapter 2, an argument can seem unfounded and may end up being difficult or even impossible to follow.
It is also absolutely necessary to explain visualisations. After all, the scales, colours and types of graphs that we see as products of computational techniques are the result of choices made by people, and they are by no means as objective as they appear to be. The formulas they are based on are devised not to ensure that results are as close to reality as possible, but to make visualisations easier for scholars to compare.119
5.2.1.2. Dependence on the Source Text
In addition to explaining data, it is also important to keep connecting it to the text it was based on during the interpretation process. As was mentioned, the data functions as a new layer, not as a separate one, and it is therefore necessary to return to the original text to make sense of what we are seeing in our graphs. It is precisely in applying the knowledge gained from graphs and other visualisations to the original text that new connections are made, as mentioned in section 5.1. This is especially important when using computational methods on a micro-level. My case study has made apparent that, when doing so, the acquired data often prompted me to look at the concordances of a word. After all, dispersion plots alone do not tell us why a word appears when and where it does; they just tell us that it appears, and that it may be of importance. Computational methods are thus a good way to find important themes, words or patterns, but subsequently subjecting the text to a close(r) reading of a word’s concordances remains necessary to build a sound argument. After all, just from knowing that Siddhartha uses the word ‘Leben’ quite often, we cannot draw any conclusions without speculating. It is only when zooming in on the concordances of the word that it turns out that Siddhartha’s use of the word actually reflects his progress. The use of concordances thus proves to be a very useful, zoomed-in type of close reading. It also shows just how important it is to combine distant reading with close reading in a micro-study, and that prior knowledge of the text is clearly important when applying such methods.
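The difference between knowing that a word occurs and seeing how it is used can be illustrated with a minimal Python sketch. It is not the tooling used in the case study: the function names are hypothetical, the tokeniser is a naive regular expression, and the sample sentence is invented rather than quoted from Siddhartha.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (naive; assumes simple punctuation)."""
    return re.findall(r"\w+", text.lower())

def dispersion(tokens, word):
    """Return the token positions of `word`: the raw data behind a dispersion plot."""
    word = word.lower()
    return [i for i, tok in enumerate(tokens) if tok == word]

def concordance(tokens, word, window=3):
    """Return keyword-in-context lines: each hit with `window` tokens on either side."""
    lines = []
    for i in dispersion(tokens, word):
        left = " ".join(tokens[max(0, i - window):i])
        right = " ".join(tokens[i + 1:i + 1 + window])
        lines.append(f"{left} [{tokens[i]}] {right}".strip())
    return lines

# Invented sample text, not a quotation from Siddhartha.
sample = "Das Leben ist ein Fluss. Er suchte das Leben und fand die Zeit."
tokens = tokenize(sample)

positions = dispersion(tokens, "Leben")       # where the word occurs
kwic = concordance(tokens, "Leben", window=2)  # how the word is used

for line in kwic:
    print(line)
```

The positions alone only say where ‘Leben’ occurs; the keyword-in-context lines bring back the surrounding words that a close reading needs.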
5.2.2. Inaccuracies
As the case study in this thesis shows, inaccuracies can easily creep into the data, which can be quite problematic depending on the size of a text. As was explained in section 4.3.1.1., the themes of Lehre, love, time and searching were defined with the help of a topic model as well as all variations of words akin to ‘Lehre’, ‘Liebe’, ‘Zeit’ and ‘suchen’ that were used in Siddhartha. Subsequently, all of these words were incorporated in the dispersion plots of each theme, to give a more accurate idea of its dispersion. However, in section 4.3.3.3., when I set out to make a graph depicting Siddhartha and Govinda’s use of all four themes per dialogue, I came across a handful of further words connected to the themes. As the dialogues are very short and do not contain many words, even those few newly found words made an important difference in the resulting graph (figure 24). This is why those words were incorporated in these graphs after all, despite not having been used in the graphs covering the bigger chunks of text earlier on in the study. The case study thus shows that using a set of computational methods alone did not account for all words in Siddhartha that are connected to the four themes. To really find every single word, four lists of words would have to be made manually. This is why it is important always to keep in mind that graphs are not as objective as they seem, and always to combine a close reading with a distant reading, even if only to check whether the data is accurate. Lastly, this shows the importance of transparency about the data that is used for an argument. As Peter Verhaar writes: ‘Counts are rarely neutral, as they invariably demand decisions on what and how to count. Discussions about counts are not trivial, as decisions on how to count can directly have a strong impact on the outcomes of subsequent statistical processing.’120
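How strongly a few overlooked word forms can affect a short passage can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the function name is hypothetical, the two word lists are not the lists used in the case study, and the dialogue is invented.

```python
import re

def theme_count(text, theme_words):
    """Count the tokens in `text` that belong to the given set of theme words."""
    tokens = re.findall(r"\w+", text.lower())
    wanted = {w.lower() for w in theme_words}
    return sum(1 for tok in tokens if tok in wanted)

# Hypothetical word lists for the theme of searching.
initial_list = {"suchen", "suchte"}
extended_list = initial_list | {"sucher", "gesucht"}  # forms found later by close reading

# Invented short dialogue, standing in for a passage with few words.
dialogue = "Du suchst zu viel, Sucher. Ich habe gesucht und suchte lange."

count_initial = theme_count(dialogue, initial_list)
count_extended = theme_count(dialogue, extended_list)

print(count_initial, count_extended)
```

In this eleven-word passage the count rises from one to three once the two additional forms are included, and ‘suchst’ is still missed by both lists. The same two additions would barely register in a chapter-sized chunk, which is why decisions about what to count need to be stated alongside the graph.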