Chapter 1 Introduction
1.6 Natural language processing, programmes and data analysis tools
1.6.3 Data analysis and visualisation tools in the qualitative analysis
Since the data analysis in this thesis is both quantitative and qualitative, it requires two types of data analysis and visualisation. On the quantitative level, statistical visual tools, including frequency distribution graphs and collocation tables, are used to analyse the produced datasets. On the qualitative level, as mentioned above, concordance lines34 are used to examine the collocations and LG patterns of nature in the Qur’an, and LancsBox is used to visualise the collocations of nature as a network of meanings via collocational graphs and networks, which are defined as:
Summaries of complex meanings of words in texts and corpora. These networks can provide useful information about key topics [the theme of nature in the Qur’an] in texts and discourses as well as their connection (Brezina, 2018, p.79).
1- Concordancing: A Core Function in Sketch Engine
Also referred to as Key Word In Context (KWIC),35 a concordance is a list of all of the occurrences of a particular search term in a corpus, presented within the context in which they occur, usually a few words to the left and right of the search term (Baker et al., 2006, p.42). The rationale behind the use of this core function in Sketch Engine is that, as with collocations, concordances provide information about the “company that a word keeps” or its discourse prosody36 (see Figure 3 for a sample of concordance lines in Sketch Engine).
This research, therefore, employs this function in conjunction with alignment to conduct the qualitative analysis, which attempts to uncover the SP of nature in the Qur’an and its translations by examining the surroundings of the words describing natural phenomena.
34 Concordancing is a [qualitative] analysis technique that allows linguists to investigate the occurrences and behaviour of different word forms in real-life contexts. This is quite different from more “traditional approaches” in linguistics that simply depend on the intuition of native speakers in order to determine the “correct” usage. See also Weisser (2016, pp. 67-79).
35 Key Word In Context concordance is the preferred format for displaying concordance data because the context to the right and left of the search term is easy to observe. Available from: [https://www.sketchengine.eu/my_keywords/kwic/], [Accessed 19 October 2016 onwards].
Figure 3: Concordance lines for catch fire (figure taken from Kilgarriff et al., 2014)
Following pioneering corpus linguists (e.g., Renouf and Sinclair, 1991; Stubbs, 1995; 2001; Louw, 2000; Sinclair, 2003), concordance lines are used to navigate the context of the collocations (i.e., bigrams) found in the statistical analysis of the collocation extractions. A seven-step procedure is followed for the analysis of a selection of concordance lines for a specific node (Sinclair, 2003, pp. xvi-xvii); it is described later in this thesis (Chapter Four).
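To illustrate the KWIC format described above, the following minimal Python sketch builds concordance lines for a search term. It is not Sketch Engine's implementation: the function name kwic, the window parameter, the simple regular-expression tokeniser, and the English sample sentences are all assumptions made purely for illustration.

```python
import re

def kwic(text, node, window=4):
    """Return (left context, node, right context) for every hit of `node`."""
    # Toy tokeniser (assumption): lower-cased runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Invented sample text, echoing the "catch fire" example of Figure 3.
sample = ("The dry grass may catch fire in summer. "
          "Wet wood does not catch fire easily.")

# Print the hits with the node aligned in a central column.
for left, node, right in kwic(sample, "fire", window=3):
    print(f"{left:>25}  {node}  {right}")
```

In this sketch the context window ignores sentence boundaries, and the tokeniser is only adequate for the English sample; vocalised Arabic text would most likely need a dedicated tokeniser.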
2- LancsBox v. 4.x
As previously indicated, on the qualitative level, this research employs LancsBox v. 4.x,37 which is the fourth version of a new-generation software package for the analysis of language data and corpora developed by Lancaster University. It is a recently developed and appraised data visualisation tool, which the researcher found useful for visualising the collocational networks of natural phenomena38 and the aligned concordance lines of nature in the Qur’an and its translations.
36 Discourse prosody is a term reported by Stubbs (2001) relating to the way that words in a corpus can collocate with a related set of words or phrases, often revealing (hidden) attitudes (as cited in Hunston, 2007, p.251).
37 See Section 4.1.5 for the rationale for opting for LancsBox instead of WordSketch in Sketch Engine.
38 This is done by employing the log-likelihood statistic, which is built into the software. [Chapter Four]
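The log-likelihood statistic mentioned in footnote 38 can be illustrated with a short Python sketch. It uses the standard log-likelihood ratio (G2) for a 2x2 contingency table of observed frequencies; whether LancsBox computes the statistic in exactly this way is not confirmed here, and the function name and the counts below are hypothetical.

```python
import math

def log_likelihood(o11, o12, o21, o22):
    """G2 for a 2x2 table of observed frequencies.

    o11: node and collocate co-occur   o12: node without the collocate
    o21: collocate without the node    o22: neither node nor collocate
    """
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    observed = ((o11, o12), (o21, o22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # an empty cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2 * total

# Hypothetical counts for one node-collocate pair.
score = log_likelihood(20, 5, 5, 970)
print(f"G2 = {score:.2f}")
```

A score above 3.84, the chi-squared critical value for one degree of freedom at p < 0.05, is conventionally taken to indicate a statistically significant association between node and collocate.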
1.7 Introducing the qualitative analyses of textual data
This section gives a preview of the qualitative analysis within the mixed approach used in this research, which incorporates quantitative and qualitative analyses. The flow of tasks in the textual analyses to explore collocations and identify the SPs of nature in the Qur’an and its translations following this approach is shown in the following figure.
Figure 4: The data analyses of SP of nature in the Qur’an and its translations
Furthermore, this approach employs Stubbs’s (2001) classification of textual data, where each task of the data analysis is applied to a specific type of textual data, as seen in the following table:
Table 2: Types of textual data and data analyses in this research (classification based on Stubbs, 2001, pp. 66-7, my hyphenations)39
Type of Data | Description | Textual Data Analysis
first-order data | raw corpus data | content analysis [qualitative]
second-order data | corpus data as manipulated by a basic concordance program | collocation-via-concordance [qualitative]
third-order data | corpus data that has been manipulated using statistical analyses to present patterns within the data | collocation-via-significance [quantitative]
As shown in Table 2, the qualitative analyses in this research are applied to two types of data: the first-order data shown in the raw corpus (the Arabic Qur’an)40 and the second-order data in the corpus as processed by a concordance program (Sketch Engine).
39 See also McEnery and Hardie (2012, p.127). Available from: [https://epdf.pub/corpus-linguistics-method-theory-and-practice.html], [Accessed 04 May 2019].
This section of the introductory chapter will focus on the qualitative part solely because, while the adopted tasks of the quantitative analysis of this research have been drawn from a previously tested model of statistical analysis to investigate collocations (e.g., Evert, 2005; 2008; Bartsch and Evert, 2014), the qualitative analysis of words related to nature in the Qur’an is used differently in two instances in this thesis. Firstly, it is used in the content analysis (CA) in the preliminary stage (pre-methodology) of the research to categorise the contextual meanings of nature in the Qur’an. Secondly, it is used in the methodology of this research to tag the evaluative and discourse prosodies of nature in the Qur’an and its translations, an approach that has not been explicitly used before in this manner by researchers on SP in general and SP in the Qur’an in particular (e.g., Louw, 1993; 2000; Partington, 1998; 2004b; Sinclair, 2004a; Younis, 2018). The following lines will present a brief description of the two instances of the qualitative analyses in this research; details of the first will be found in Chapter Two and the second in Chapter Four.
In its first instance, the qualitative analysis, or CA, is used to elicit the contextual meanings of words describing natural phenomena in the Qur’an based on the previous literature; it falls outside the methodology chapter solely because it occurs at the preliminary stage. CA is a technique for making replicable and valid inferences from data and their context “to provide knowledge, new insights, a representation of facts, and a practical guide to action” (Krippendorff, 2018, p. 403). The purpose, as Krippendorff (2018) argues, is to obtain a condensed and broad description of the phenomenon (in this case, nature in the Qur’an), and the outcomes of the analysis are concepts or categories describing it. Usually, these concepts or classes are used to build up a model, a conceptual system, or a conceptual map of categories from data, which are “texts to which meanings are conventionally attributed: verbal discourse, written documents, and visual representation” (Krippendorff, 2018, p. 403). Furthermore, Hsieh and Shannon (2005) state that the current applications of CA rely on one of three distinct approaches to interpreting meaning from the content of text data: conventional, directed, or summative. They are described in the following table:
Table 3: Three approaches to content analysis (table taken from Hsieh and Shannon, 2005, p.1286)
Type of CA | Study Starts With | Timing of Defining Codes or Keywords | Source of Codes or Keywords
Conventional | Observation | Codes are defined during data analysis | Codes are derived from data
Directed | Theory | Codes are defined before and during data analysis | Codes are derived from theory or relevant research findings
Summative | Keywords | Keywords are identified before and during data analysis | Keywords are derived from the interest of researchers or review of literature
In its preliminary stage (as seen in the grey box in Figure 4), this research implements the summative approach and builds a relatively exhaustive list of terms for natural phenomena.
However, instead of calling them keywords,41 the list of natural phenomena is referred to as a list of terms, since they do not represent all the themes of the Qur’an. They only represent nature as a Qur’anic theme and the underlying meanings related to it. These terms and their contextual meanings, derived from the review of literature on Qur’anic studies about nature and the researcher's close reading of the Qur’an, were identified before and during the data analysis. Then, CA is conducted to provide an understanding of these terms inductively;42 data is organised by coding to create categories and abstractions (Hsieh and Shannon, 2005, p. 1281; Bernard and Ryan, 1998). Coding means that notes and headings are written while examining the literature on nature in the Qur’an; abstraction means generating categories which depict the different contexts in which nature occurs in the Qur’an (Bernard and Ryan, 1998, p.608 and p. 619).
On the other hand, in its second instance in this research, and as seen in Figure 4, the task of qualitative data analysis falls within the approach of analysing collocation-via-concordance,43 which is a non-statistical technique where a linguist inspects concordances by intuitive scanning. This hand-and-eye technique44 is usually implemented in other neo-Firthian45 research and has been scrutinised and praised by linguists who have used it, such as Stubbs (1995, pp. 27-8) and Younis (2018, p.126). Louw (2007b), who is also in favour of this approach, believes that, to uncover the hidden meanings of any text, “corpus stylisticians” are required to use a concordance.
41 Keywords in corpus linguistics are defined as words in a corpus whose frequency is unusually high (positive keywords) or low (negative keywords) in comparison with a reference corpus. Available from: [https://www.kent.edu/appling/corpus-linguistics-glossary], [Accessed 04 May 2019].
42 Inductive content analysis is a qualitative method of content analysis that researchers use to develop a theory and identify themes from raw textual data. It relies on inductive reasoning, in which themes emerge from the raw data through repeated examination and comparison, and reduces the material to a set of themes or categories. See also Thomas (2006, pp. 237-46).
43 In contrast to the collocation-via-significance; both are commonly used techniques of analysing collocation. See also McEnery and Hardie (2012, pp. 122-66).
44 See also McEnery and Hardie (2012, p.125, my italics in the above).
Louw states that “the literary world of any text is assembled afresh every time that text is read”, and to demonstrate this process:
Corpus stylisticians require only three things: (i) the literary text in a machine-readable form which allows us to read the text by random access as well as linearly in its traditional paper or hard copy form; (ii) a reference corpus of natural language of both spoken and written and containing fiction and non-fiction; (iii) concordance software containing collocator and a facility for the co-selection expressions and which produces raw data as its output (p.104).
However, since this research attempts to employ a mixed approach in exploring the collocations of natural phenomena in the Qur’an, collocations are not extracted manually from concordance lines as in Stubbs (1995) and Younis (2018). Instead, the statistically verified patterns (i.e., collocations) previously found in the quantitative analysis, as seen in Figure 4, are utilised to provide explicit criteria (e.g., the statistically verified collocate sets for each node) for coding the data with the appropriate SP for each of the nature terms. This approach also agrees with Stubbs’s (2001) later viewpoint on the primacy of the human analyst over the statistical results of collocation: he/she should constantly check the outcomes of the mathematical calculation of collocation against concordance and raw text (Stubbs, 2001, p.71). Hence, the purpose of this task is to employ the statistical results of collocation in identifying the SP of nature in the Qur’an and its translations. The revealed SP meanings [i.e., evaluative and discourse prosodies] of nature terms in this task are based on the [statistically verified] collocations of the words referring to natural phenomena (i.e., nodes) with other words that unveil these meanings (i.e., collocates). Finally, a detailed discussion of this second instance of qualitative analysis will be provided in Section 4.1.5 of this thesis.
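The idea of using statistically verified collocate sets as explicit criteria for coding can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the procedure detailed in Section 4.1.5: the function name, the record layout, and the example strings are hypothetical, and the strings are invented rather than taken from the Qur’an or any of its translations. The prosody field is deliberately left empty, since the label itself is assigned by the human analyst.

```python
def lines_for_coding(concordance, collocates):
    """Select the concordance lines whose context contains a verified collocate."""
    selected = []
    for left, node, right in concordance:
        context_words = (left + " " + right).split()
        hits = sorted(set(context_words) & collocates)
        if hits:
            selected.append({
                "node": node,
                "context": f"{left} [{node}] {right}",
                "collocates": hits,
                "prosody": None,  # to be filled in manually by the analyst
            })
    return selected

# Invented concordance lines (left context, node, right context).
concordance = [
    ("sent down from the sky", "rain", "to revive the earth"),
    ("they asked about the", "rain", "and the season"),
]
# Hypothetical set of statistically verified collocates for the node "rain".
collocates = {"revive", "sky"}

for record in lines_for_coding(concordance, collocates):
    print(record["context"], record["collocates"])
```

Only the lines that contain at least one verified collocate are passed on for manual coding, so the statistical results narrow the material while the interpretation of each line remains with the analyst.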