3.2.4 ‘Feminist/s/ism’ with detailed modification
Chapter 4: Collection and annotation of the data
4.1 Collection of the data
This section describes the data source that I selected (section 4.1.1) and explains how I selected the articles that make up the ‘feminist/s/ism’ data (section 4.1.2).
4.1.1 Selection of the data source
The data source for the study is Nexis UK, a searchable electronic repository available through LexisNexis. Nexis UK (Nexis hereafter) allows the researcher to search for particular terms in various types of corpora, including newspaper corpora (see footnote 3). This has made it a popular resource for corpus linguistic studies of newspaper data (for example Baker & Levon, 2015; Branum & Charteris-Black, 2015; Jaworska & Krishnamurthy, 2012).
The Nexis corpus ‘UK Publications’ includes data for eight of the ten top-selling UK national newspapers in the National Readership Survey (NRS) figures for the final month of the period under investigation, December 2009 (footnote 4):
The Daily Mail (The Mail hereafter).
The Express.
The Guardian.
The Independent.
The Mirror.
The Sun.
The Telegraph.
The Times.
Footnote 3: See http://www.lexisnexis.com/uk/nexis for more information on Nexis UK.
The two titles from the NRS list that are not available are The Daily Star and The Daily Record. The former is a red-top tabloid similar to The Sun, and the latter focuses specifically on Scottish issues. Apart from these two omissions, the newspapers available through Nexis cover the main titles available to readers across the UK throughout 2000-2009.
Data covering the entirety of 2000-2009 is available for seven of the eight newspapers. The exception is The Telegraph, for which data covering the period from 1st January 2000 to 30th October 2000 is not available. However, the Telegraph data that is available for the remainder of 2000 does include articles on feminism, and so it is still possible to gain a picture of The Telegraph’s coverage of feminism in 2000. Of the eight newspapers, Nexis includes Sunday editions for all except The Sun, which during the period under investigation was represented on a Sunday by The News of the World. In line with previous research on press portrayals of particular groups (for example Kim, 2014), I decided to include all editions and sections in the data. Articles relating to feminism from sports sections, finance sections and fashion supplements are therefore included, providing as broad a picture as possible of the impression of feminism that each newspaper would have given its readers.
4.1.2 Selection of the data
I used Nexis’s wildcard option (‘feminis!’) to find data covering the newspapers’ use of ‘feminist/s/ism’ between 2000 and 2009. This brought up results that included not only the root forms ‘feminism’ and ‘feminist’, but also words formed by the addition of morphemes in word-final position, for example ‘feminisms’ and ‘feministing’. However, the wildcard search does not pick up variations formed by adding morphemes to the beginning of the word. This meant that hyphenated variations such as ‘anti-feminist’ were recognised by Nexis (which treats the hyphen as a space), while unhyphenated variations (‘antifeminist’) were not. It is possible, therefore, that articles containing only unhyphenated derivational forms were not identified by Nexis and are not included in the data. However, where unhyphenated forms do occur in the articles in the data, I include them so as to give as full an overview as possible of ‘feminist/s/ism’ in the texts.
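The matching behaviour described above can be illustrated with a minimal sketch. It is an approximation written for this explanation, not Nexis's actual search implementation: the function names are hypothetical, and the only assumptions built in are the two stated in the text, namely that the wildcard permits any word-final additions to the stem and that a hyphen is treated as a space.

```python
import re

# Hypothetical emulation of a truncation search such as 'feminis!'.
# Assumption: a hyphen is treated as a space, and a text matches only
# if one of its tokens BEGINS with the stem.
STEM = "feminis"


def tokens(text: str) -> list[str]:
    """Lower-case the text, treat hyphens as spaces, and split into word tokens."""
    return re.findall(r"[a-z]+", text.lower().replace("-", " "))


def matches_wildcard(text: str, stem: str = STEM) -> bool:
    """Return True if any token starts with the stem (suffixes allowed, prefixes not)."""
    return any(tok.startswith(stem) for tok in tokens(text))


for example in ["feminisms", "anti-feminist", "antifeminist", "feminine"]:
    print(f"{example!r}: {matches_wildcard(example)}")
```

Under these assumptions ‘feminisms’ and ‘anti-feminist’ match, while ‘antifeminist’ does not, because the stem is not at the start of the token. ‘feminine’ is also excluded, since it does not contain the full stem.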
The data comprises three articles per newspaper per year, in order to ensure that all newspapers and the entirety of the period under investigation are represented. I used Nexis to order search results according to relevance, ranking the results “according to greatest frequency and relevancy of terms” (Lexis Nexis, 2012, p. 16). This meant that those articles in which the search term occurred most frequently in relation to total number of words appeared at the top of the results. I also used Nexis’s ‘duplicate options’ setting to ensure that reproductions of the same article would not appear in the results. I performed each search in turn (for example ‘feminis!’ in The Mail from 1st January 2003 to 31st December 2003) and found that selecting the three most relevant articles from each newspaper and year (3 articles × 8 newspapers × 10 years = 240 articles in total) provided me with 2,539 occurrences of the search term. This fell within my expectations of what would be manageable. It also allowed for future alterations: if, during the annotation and analysis stages, I found that the amount of data was too great, I could revisit the Nexis relevance results and remove the third most relevant article for each year of each newspaper, resulting in a smaller and more manageable dataset.
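The selection procedure can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of Nexis: its relevance ranking is not documented in detail here, so the sketch substitutes a simple density measure (occurrences of the stem divided by total words), following the description above. The `Article` class and both function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    newspaper: str
    year: int
    title: str
    text: str


STEM = re.compile(r"feminis", re.IGNORECASE)


def density(article: Article) -> float:
    """Occurrences of the stem per word of text (an assumed stand-in for relevance)."""
    words = article.text.split()
    return len(STEM.findall(article.text)) / len(words) if words else 0.0


def select_top(articles: list[Article], per_group: int = 3) -> list[Article]:
    """Keep the `per_group` densest articles for each (newspaper, year) pair."""
    groups: dict[tuple[str, int], list[Article]] = {}
    seen: set[tuple[str, int, str]] = set()
    for art in articles:
        key = (art.newspaper, art.year, art.text)
        if key in seen:  # skip verbatim reproductions of the same article
            continue
        seen.add(key)
        groups.setdefault((art.newspaper, art.year), []).append(art)
    selected: list[Article] = []
    for group in groups.values():
        selected.extend(sorted(group, key=density, reverse=True)[:per_group])
    return selected


# Sample size implied by the design: newspapers x years x articles per group.
NEWSPAPERS, YEARS, PER_GROUP = 8, 10, 3
print(NEWSPAPERS * YEARS * PER_GROUP)  # 240
```

Reducing `per_group` from 3 to 2 corresponds to the fallback described above, in which the third most relevant article for each newspaper and year is removed.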
I downloaded the results of each search into individual Word files (for example Mail 03, which contained the three most relevant Mail articles from 2003, including ‘Is Feminism Dead?’, analysed in section 3.2) and provided a code for each article within each file, e.g. Mail 03a (‘Is Feminism Dead?’), Mail 03b (‘Sorry, But Warthogs Have You Beat, Guys’) and Mail 03c (‘Girlyvision’). This resulted in a total of 80 files, one for each year of each newspaper. I then used Word’s search function to search for the segment ‘feminis’, allowing me to find all occurrences of ‘feminism’, ‘feminist(s)’ and derivational forms such as ‘post-feminist’ and ‘anti-feminism’. I placed each sentence in which ‘feminist/s/ism’ occurred into an Excel file, and labelled each sentence according to the newspaper and article it appeared in, and the date of publication. This ensured that I could easily find any contextual information that I might need during the annotation and analysis stages. I also labelled occurrences according to their placement in an article (for example headline, photo caption, main body), whether they were part of the title of a book, magazine, etc. (e.g. in the book title The New Feminism), and whether they were from a passage of reported speech (e.g. “National Public Radio asks if she’s the ‘new face of feminism’” (Guardian 08a)).
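The extraction and labelling step was carried out by hand in Word and Excel, but the same logic could be automated roughly as below. This is a minimal sketch only: the sentence splitter is deliberately naive, the field names are hypothetical, and the sample text, article code and date are invented placeholders rather than material from the data.

```python
import re

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # naive sentence boundary
STEM = re.compile(r"feminis", re.IGNORECASE)


def extract_rows(
    newspaper: str, article_code: str, date: str, placement: str, text: str
) -> list[dict[str, object]]:
    """Return one labelled row per sentence that contains the segment 'feminis'."""
    rows: list[dict[str, object]] = []
    for sentence in SENTENCE_END.split(text.strip()):
        hits = len(STEM.findall(sentence))
        if hits:
            rows.append(
                {
                    "newspaper": newspaper,
                    "article": article_code,
                    "date": date,
                    "placement": placement,  # e.g. headline, photo caption, main body
                    "occurrences": hits,
                    "sentence": sentence,
                }
            )
    return rows


# Invented placeholder text; not drawn from any article in the data.
sample = (
    "Feminism is discussed here. Nothing relevant in this one. "
    "Is post-feminism a feminist idea?"
)
rows = extract_rows("Example", "Example 00a", "2000-01-01", "main body", sample)
for row in rows:
    print(row["article"], row["occurrences"], row["sentence"])
```

Because the search is for the bare segment, unhyphenated forms such as ‘antifeminist’ would also be found at this stage, which is consistent with the decision to include them wherever they occur in the selected articles.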
At this stage it was possible to gain an overview of how much of the data was drawn from each newspaper and each year. Figure 4.1 shows how the 2,539 occurrences of ‘feminist/s/ism’ are distributed across the eight newspapers:
Figure 4.1: Total occurrences of ‘feminist/s/ism’ per newspaper
Although the focus of the study is not on differences between the various newspapers’ coverage of feminism, figure 4.1 provides an early result. It shows that a greater proportion of occurrences are from the broadsheets (The Guardian, The Independent, The Telegraph and The Times) than from the tabloids. This accords with Jaworska and Krishnamurthy’s (2012, p. 409) finding that ‘feminism’ in the Bank of English reference corpus occurs more in broadsheets than in tabloids. The predominance of The Guardian also reflects its status as “probably the only mainstream publication in the UK that frequently tackles questions to do with the current state of feminism” (Dean, 2010, p. 396). This could be seen to make the study unbalanced, with the data weighted in favour of the broadsheets. However, I argue that the fact that different newspapers account for varying proportions of the data reflects the relative levels of attention that each newspaper gives to feminism: were a reader to read all the newspapers in the data, their understanding of the meaning of ‘feminist/s/ism’ would be influenced more by the broadsheets than by the tabloids.
The data also provides a snapshot of coverage across the decade. Figure 4.2 shows how many occurrences of ‘feminist/s/ism’ were taken from each year:
Figure 4.2: Total occurrences of ‘feminist/s/ism’ per year
The number of occurrences per year ranges between 166 (in 2005) and 342 (in 2009). Because of the way in which I gathered the data (selecting the articles with the most occurrences of ‘feminist/s/ism’ relative to total words), figure 4.2 does not show trends in coverage of feminism and feminists across the decade. It does, however, demonstrate that the data is spread across the whole of the decade.