Chapter 4. Data and Methods
4.5. Methods
4.5.3. Identifying Appropriate Methods of Analysis
4.5.3.2. Keyword analysis of the churches corpora
A potential concern with the analysis of the collocates was that it reduced the amount of data from the corpus that I was able to use in order to identify representations around homosexuality. For example, in Table 4.10, for the Catholic Church Corpus, I identified only 45 concordances of homosexual*, that is, 16% of the lines, which contributed towards three main representations. I was concerned that when working with collocates in a relatively small corpus (under 100,000 words), only a small number are produced unless the cut-off points are set very low. Additionally, quite a lot of the collocates I had found were grammatical ones (for example, Table 4.6 contains collocates like de (108), la (90), a (39), al (8), que (47), por (11), las (49), el
homosexuality. I therefore decided to engage in methodological triangulation (see Baker and Egbert 2016) by using a different form of corpus analysis – keywords. My plan at this stage was to determine whether keyword analysis would contribute anything new to my findings.
Keywords, in the sense that I refer to them here, are words whose relative frequency of occurrence is statistically higher in one corpus than in another, the latter being referred to as a reference corpus. Using log-likelihood tests, which indicate how confident we can be that a word is key for reasons other than chance alone (Baker 2006: 125), and no minimum frequency threshold, I obtained the keywords of the Catholic Church Corpus in AntConc 3.4.3m (Anthony 2014), using the Evangelical Churches Corpus as the reference corpus, and vice versa. From the list of keywords obtained from each data set, I selected the top 150 so as to be able to explore the corpora in more detail. I then grouped these keywords into categories according to the meaning they conveyed, as I had done previously with the collocates (for the lists of keywords and their categories, see Appendices II to V). Once again, this resulted in the
identification of shared categories, such as geographical names, the names of Christian churches, their leaders, biblical references, references to (the absence of) sex, age, time, written sources of information, and people. As in the previous stage of analysis, there were also different categories within the two data sets. For example, the Catholic Church keywords referred to topics such as trouble, helping, respect, the family, marriage, church activities, education, wealth, and positive attributes, whilst the keywords in the Evangelical Churches Corpus indexed topics such as sexuality, gender, politics, laws, support and opposition, and the LGBTQ+ movement. At first sight, it seemed to me that these keywords could help to complement the findings in the previous analytical stage. However, when I skimmed the concordances of these
lexical items, I could see that most of what they revealed had already been identified in the concordance analysis of the representations that resulted from my first attempt at collocational analysis. For instance, the analysis of the concordances for the representations of the Evangelical churches (see Table 4.10) had indicated that politics, elections, and the leaders of the gay movement in relation to future laws were often mentioned in this data set, something that the keywords also illustrated.
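The keyness calculation described above can be illustrated with a minimal sketch. It assumes the two-term log-likelihood formulation that is commonly used in corpus linguistics, which may differ in detail from the computation implemented in AntConc 3.4.3m. The function names, the toy token lists, and the restriction to positive keywords are illustrative assumptions and are not taken from the study.

```python
import math
from collections import Counter


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term log-likelihood for one word (assumed formulation)."""
    total = freq_target + freq_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def top_keywords(target_tokens, ref_tokens, n=150):
    """Rank words that are relatively more frequent in the target corpus."""
    target_counts = Counter(target_tokens)
    ref_counts = Counter(ref_tokens)
    size_t, size_r = len(target_tokens), len(ref_tokens)
    scored = []
    for word, f_t in target_counts.items():
        f_r = ref_counts.get(word, 0)
        # Keep positive keywords only: higher relative frequency in the target.
        if f_t / size_t > f_r / size_r:
            scored.append((word, log_likelihood(f_t, size_t, f_r, size_r)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Hypothetical toy data, for illustration only.
target = ["familia"] * 5 + ["ley"]
reference = ["ley"] * 5 + ["familia"]
keywords = top_keywords(target, reference, n=150)
```

Swapping the two arguments of top_keywords corresponds to the "vice versa" comparison, in which the second corpus becomes the target and the first becomes the reference. The default of n=150 mirrors the cut-off of 150 keywords mentioned above, and no minimum frequency threshold is applied.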
At this stage, it was clear that the reason why I was not finding anything new with these keywords lay in the reference corpora used. By comparing one corpus to the other, I was in effect comparing one church to the other and identifying what distinguished them relative to each other, rather than finding out the
peculiarities of each church on its own with regard to its approach to LGBTQ+ matters. It seemed to me that focusing on the particular topics raised by each church in relation to sexual identity, as compared to the other topics that they usually discuss on their websites, would allow me to obtain more insights into the topic researched. The most appropriate way to deal with this situation was to use a different reference corpus. I therefore needed to create one that was representative of the topics discussed by these churches on their websites, excluding my topic of interest, that is, sexual diversity. However, the website of the Evangelical churches from which most of the data in their corpus had been obtained was no longer available, which made the
creation of a suitable reference corpus impossible. As I was invested in identifying the specific ways in which these churches and websites discussed
LGBTQ+ topics as compared to other topics in which these same organisations were interested, I did not consider it appropriate to use general reference corpora, as they would not highlight the particularities of the churches in relation to themselves.
Because of this, I made the decision to rule out the keyword analysis of the corpora altogether and to include another data set in the study instead. I would thus create a
collocational profile of homosexual* from three different data sets, and see how these compared, contrasted, and interrelated. After some reflection, the corpus that seemed most suitable to include was one that had initially been part of the study, namely the corpus of the parliamentary debates about the civil-partnership and anti-discrimination laws. If this data set produced an interesting contrast with the church corpora, my plan was to have three analytical chapters in which I would carry out collocation analysis on three different data sets. In order to make a final decision about this, I proceeded to carry out some test analyses on the Parliamentary Debates Corpus.