Chapter 4: Data and Methods - Semiotic resources and argumentative strategies in tweets about political TV shows in Chile

In this chapter, I present the data and methodology of the study. In the first section, I describe the data set, followed by the design of the study, including the two stages of analysis: the coding of the entire data set and the detailed analysis of a sample of it. I also present the analytical framework of the research, which draws on content analysis, the discourse-historical approach and semiotic resources, in relation to each stage of the analysis. This research project analyses tweets related to political TV shows in Chile from a qualitative perspective, but with large-scale data collection. The chapter provides an overview of the data and of the methods used to collect social media texts, with examples from the data to illustrate the categories used in the analyses presented in subsequent chapters.

4.1 Data selection and collection

For this research, I analysed tweets related to five Chilean political TV shows broadcast during the 2016 season. I initially collected tweets from a full season of each TV show, from March to December. Owing to the sheer volume of data, I then reduced the data set from ten months of tweets to three. The original data set comprised 19,179 pages, with an average of seven tweets per page, giving a total of approximately 134,000 tweets to code. At the beginning of the coding process, I estimated that it would take more than eight months of full-time coding (for a single researcher) to finish the whole data set. The data were collected from the Twitter webpage and extracted as PDF files; for this reason, there were no exact counts of tweets per show at the time of collection.
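The size estimate above is simple arithmetic on the two figures reported in this section. As a minimal sketch, assuming only those two figures (the function and variable names are illustrative and not part of the original workflow), it can be reproduced in Python as follows:

```python
# Figures reported in the text; everything else here is illustrative.
PAGES_COLLECTED = 19_179   # PDF pages collected for the full season
TWEETS_PER_PAGE = 7        # average number of tweets per page


def estimate_tweets(pages: int, tweets_per_page: int) -> int:
    """Estimate the number of tweets from a page count and a per-page average."""
    return pages * tweets_per_page


estimated_total = estimate_tweets(PAGES_COLLECTED, TWEETS_PER_PAGE)

# Round to the nearest thousand, since the per-page average is itself approximate.
rounded_total = round(estimated_total, -3)

print(estimated_total, rounded_total)
```

Because the per-page average is an approximation, the result is best read as an order-of-magnitude estimate rather than an exact count.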

Table 1 Number of pages of data collected between March and December 2016

TV show   March   April   May   June   July   August   September   October   November   December

Given that the coding process was just one stage of the analysis, I decided to reduce the amount of data for the first stage from ten months to three, to make the data set more manageable. The three months analysed were March, July and December 2016. These months correspond to the beginning, middle and end of the TV season in Chile, which starts in March and finishes in December, usually with a summer recess in January and February.

Table 2 Number of pages of data collected from the months selected for the analysis

TV show                 March    July   December   Total per TV show
Entrevista Verdadera       92     190        108                 390
Estado Nacional           728     799        549               2,076
Ciudadanos¹               214     228        132                 574
El Informante²              0     724        573               1,297
MHCC                      383     233        382                 998
Monthly total           1,417   2,174      1,744               5,335

The final data set, therefore, comprises 39,684 tweets (1,114,808 words, a count that includes emojis, hyperlinks and every isolated character or group of characters). The tweets were collected manually from the Twitter webpage. They were selected using the official hashtags promoted by the TV shows, and the advanced search function of the Twitter platform was used to narrow down the collection dates. Tweets of this kind are publicly available to anyone who has an account on the Twitter platform and are easily traceable. The tweets were retrieved between the end of 2016 and the beginning of 2017.

Although collection using a Twitter search can be considered limited compared with automated software extraction, this method offers other possibilities to the researcher, such as recovering data from past time frames or including different modes. As Latzko-Toth, Bonneau & Millette (2017) argue, an advantage of manual data collection is that it allows researchers to familiarise themselves with data in their “native” format. This allows the exploration of different kinds of content and phenomena, such as images, colours and avatars, among others, as shown in the following screen capture:

¹ The tweets listed under March for the show Ciudadanos are from April, because the show's season started at that time.

² No tweets were collected from El Informante in March because the season of the show started in June.

Figure 3 Example of data collected in a PDF file

The example shows a page from the data set and illustrates the different elements that can be captured in this format. As the example shows, PDF files collected in this way look very similar to the originals displayed on the Twitter website. Most of the software packages designed to extract social media data export tweets in plain text. Although this is very useful for the analysis of large volumes of verbal elements of tweets, for this research I chose manual data collection. As stated above, one of the main aims of this research is to explore different semiotic resources. For this reason, manual collection in a “native” format makes it possible to approach the data with most of the elements that are present for the user, including a wide variety of semiotic resources and modes. A notable exception is video, which is of course not captured in its original format in a PDF file. Also, some images are displayed in a preview format; in the original context, the full image can be clicked on and expanded, and it can be accessed from the same PDF file, which also captures the hyperlinks present on the original website as metadata. For any of these and other multimodal features, it is possible to find the original tweet (if it has not been deleted) and to examine it more closely on the native platform.

For the first stage of analysis, I coded the formal features and topics of three months of tweets, a total of 39,684 tweets. This process is explained in more detail in section 4.5, below. After the coding process, for the second stage, I analysed in detail a sample of 140 tweets randomly selected from the larger data set, with 10 tweets selected from each TV show per month. The tweets were extracted into an Excel spreadsheet and selected using the random function, to ensure an unbiased selection. In this process, I exported the tweets from ATLAS.ti to the spreadsheet as separate documents classified by month and TV show. Each tweet in the spreadsheet was then assigned an identification number. Finally, those numbers were re-ordered randomly in the spreadsheet, and ten tweets were selected from each document. This process reduced the data set to a smaller selection of tweets that could be explored in more detail in the second stage of the research.
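The sampling in this study was carried out in a spreadsheet, as described above. Purely as an illustration, the same logic (number the tweets in each month-and-show document, re-order the numbers randomly, keep the first ten) can be sketched in Python. The placeholder tweets, the function name `sample_per_document` and the seed value are hypothetical; only the structure of 14 documents (five shows over three months, with no March document for El Informante) and ten tweets per document is taken from this chapter.

```python
import random


def sample_per_document(documents, per_document=10, seed=None):
    """Randomly select `per_document` numbered tweets from each document."""
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    selected = {}
    for key, tweets in documents.items():
        # Assign each tweet an identification number, as in the spreadsheet.
        numbered = list(enumerate(tweets, start=1))
        # Re-order the numbered tweets randomly and keep the first ones.
        rng.shuffle(numbered)
        selected[key] = numbered[:per_document]
    return selected


shows = ["Entrevista Verdadera", "Estado Nacional", "Ciudadanos",
         "El Informante", "MHCC"]
months = ["March", "July", "December"]

# Placeholder data: 50 dummy tweets per document (hypothetical, not real data).
documents = {
    (show, month): [f"{show} / {month} / tweet {i}" for i in range(1, 51)]
    for show in shows
    for month in months
    if not (show == "El Informante" and month == "March")
}

sample = sample_per_document(documents, per_document=10, seed=2016)
total_selected = sum(len(tweets) for tweets in sample.values())
print(len(documents), total_selected)
```

Sampling within each document, rather than across the whole data set, keeps every show and month equally represented regardless of how many tweets each one generated.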

4.2 “First screen”: description and characterization of television shows

Given that the social media data gathered for this thesis are related to political television shows in Chile, it is important to give some information about the television programmes in question. As mentioned above, the Twitter data generated by viewers are closely related to the discussion carried out on the television shows. In this sense, the shows give context to the online discussion. An awareness of the structure and main features of the TV shows, therefore, gives a better picture of the data (Roberts, 2008). As Boyd and Crawford (2012) state, social practices in social media cannot be understood in isolation. It is also necessary to understand the diverse perspectives and varied social actors that participate in creating meanings around these social practices. For this reason, to provide a context for the political discourses generated by users, I outline the main features and topics covered by the five TV shows whose hashtags are included in this study.

Table 3 Television shows related to the social media data in this study

Television show   Hashtag          Format       Guests                    Broadcaster   Broadcast frequency
Estado Nacional   #enacional       Panel show   Resident panel + guests   TVN           Weekly
Ciudadanos        #ciudadanoscnn   Panel

As Table 3 shows, the different TV shows have different formats and were broadcast by different channels. Despite the structural differences between the shows, all are about politics and current affairs, and their guests are usually politicians or experts who are expected to analyse the political landscape and specific events. They deal not only with domestic politics but also international affairs from a Chilean perspective, such as the Venezuelan crisis or the impeachment of Brazilian President Dilma Rousseff. Another common element of the selected TV shows is that they all encourage viewer participation, via social media platforms, by promoting hashtags and including tweets on screen, as shown in the examples in Figure 4, below:

Figure 4 Promotion of hashtags on the shows

Figure 4 shows two screen captures from the TV shows Entrevista Verdadera and El Informante, which promote their official hashtags at the bottom of the screen. In the case of Entrevista Verdadera, the hashtag #opinoEV is promoted, though the Twitter account of the show, @entrevistalared, is also shown. The promotion of hashtags and accounts is not the only way in which the shows integrate social media into the broadcasts; they also include texts produced in the context of social media, as shown in Figure 5:

Figure 5 Examples of the inclusion of tweets in televised debates

This figure shows screen captures from two television shows, Mejor Hablar de Ciertas Cosas and Estado Nacional. In both captures, the bottom section of the screen displays tweets tagged with the hashtag of the show, so that part of the online discussion is included in the televised show. The television shows include tweets from viewers during live transmission as a way to integrate audience opinions. Finally, another way to include the audience via social media is through Twitter polls, which are part of the CNN TV show Ciudadanos:

Figure 6 Twitter poll on broadcast TV

Figure 6 shows a screen capture from the TV show Ciudadanos. The image illustrates the inclusion of data generated online through a Twitter poll, which was posted by the broadcaster’s Twitter account. The outcome of the poll was later discussed with the panel members as a sample of the public’s views on certain topics. In this case, the social media platform provides new information relevant to the televised discussion.

Political TV shows have been widely studied as a televised genre that mixes elements of entertainment and political discussion (Giglietto & Selva, 2014). The Chilean TV shows considered in this research include some elements besides the discussion among the panellists or guests: they also include clips that summarise the topics before the discussion starts, and extracts from other programmes showing previous interviews with politicians.

Access to the TV shows discussed in this study was through recordings of episodes, which are available on YouTube and the broadcasters’ websites. After selecting specific months for which I collected Twitter data, I watched each show carefully and produced an outline of each episode (see Appendix, section A), including information about the show itself, such as the duration, participants involved and the main topics discussed.

Table 4 Number of episodes of each TV show for the three months analysed

TV show                         March/April   July   December   Total
Estado Nacional                           3      4          3      10
Ciudadanos                                4      4          4      12
Entrevista Verdadera                     12     14         16      42
El Informante                             0      3          4       7
Mejor hablar de ciertas cosas             4      4          3      11

Table 4 shows the number of episodes broadcast by the selected TV shows during March (or April in the case of Ciudadanos), July and December of 2016. In total, there were 82 episodes in this time period. Collecting information about the television shows helped me to understand the Twitter data and also to situate digitally mediated discussions in the hybrid media context. The context of each television show is also important to identify the participants; many of the politicians who are not guests on the shows are mentioned by other panel members or hosts and are thus indirectly included in the debate.

The main themes discussed in the TV shows are illustrated in the following figure by frequency of topic:

Figure 7 Main topics in television shows

Figure 7, above, shows the main topics discussed during the episodes of the television shows. In the figure, bigger bubbles indicate the most frequent topics across all episodes: different elections (presidential and municipal), along with corruption cases, were the most frequently discussed topics (for a full table of frequencies see Appendix, section B). The topics indicated here were identified from the headlines displayed on the screen and the topics introduced by the host at the beginning of the show, which indicate the main topic of each episode. These topics can be considered primary discourse topics, which are introduced by the television show to frame the discussion, in contrast to secondary discourse topics, which are developed later by the participants during the discussion (Krzyżanowski, 2008; Unger, 2013). Although the television show proposes certain topics for discussion, other topics emerge during the interaction, not only on-screen but also online. In this case, to describe and give an overview of the television shows, I focus only on primary topics.

4.3 Ethical considerations

Studying language in social media contexts requires paying careful attention to ethical implications. The Association of Internet Researchers (2012) has produced a set of guidelines for the ethical use of digital data in research. The document provides key advice to researchers interested in studying online data in digital media contexts in a responsible fashion. Texts posted on social media platforms such as Twitter or Facebook are related to human processes and social practices. Through these platforms, people can interact with others, communicate and express themselves. For that reason, it is necessary to think about risks and potential harm to users involved in these social practices. In these contexts, researchers have an obligation to avoid any potential damage to people whose data are involved in their studies. The concept of “harm” cannot be defined universally because it is highly dependent on the specific context. Researchers need to be aware of this and try to minimise the risk of the participants in the study being exposed to harm (Markham & Buchanan, 2012). Moreover, even if the research does not involve direct human contact, it is necessary to consider the possible risks involved in it.

Markham and Buchanan (2012) explain that social media data usually comprise texts or language use produced in social media contexts. For this reason, these types of data cannot be considered as isolated from the individuals that produce them. All digital information on the network involves, at some point, human subjects. Therefore, every researcher needs to balance the rights of the people involved with the social benefits of the research. Among the elements to consider are the themes of the research, since some topics can be sensitive in different contexts. For example, religious beliefs and political orientation can be controversial and form part of people’s personal lives, which they have the right to keep private. In certain contexts, thinking differently about politics or religion can be dangerous and even put lives at risk. For this reason, researchers always need to be sensitive to these risks, in order to minimise the impact of the research on participants' lives.

The information collected in this research could be sensitive because it is related to political beliefs. In both stages of analysis, I use tweets related to political TV shows from people who have not given explicit consent for research purposes. Nevertheless, the users of this platform are taking part in a public, digitally mediated debate, in which they have deliberately chosen to participate by producing texts and using the hashtags promoted on-screen. These TV shows also include tweets from the public, expanding their potential reach to national television viewers. The tweets are addressed to a wide audience and involve users expressing their political views and being part of political discussions along with other people interested in the same topics.

One of my main concerns in conducting this research was the privacy of the producers of the texts under analysis. The social media data used in this study are publicly available on the Twitter webpage, and tweets are easily searchable on the Web. For that reason, I could not both guarantee anonymity and quote data verbatim, which is necessary to evidence the analysis. Therefore, the authors of the texts are cited, including their usernames. The participants of this study are Twitter users who have marked their tweets with hashtags related to TV shows, and they have thus willingly decided to participate in online discussions prompted by the shows. To participate in an online discussion, it is mandatory to have a Twitter account, which requires a name and a username, but there is currently no rigorous verification of users’ identities in this process. The tweets collected for this research are addressed to a wide audience and involve users expressing their political views and being part of political discussions with other people interested in the same topics. Also, as previously stated, the TV shows include some social media texts on-screen during the shows, making the tweets available to a wider audience on national TV.

The ethical considerations around this study were submitted to the Ethics Committee of the Faculty of Arts and Social Sciences at Lancaster University and were approved (reference: FL16112).

4.4 Design of the study

Research into social media data generated by viewers of political TV shows represents a relatively new field of study, one which combines different elements of media ecology (Giglietto & Selva, 2014). Freelon and Karpf (2014) suggest that hybrid media may generate different kinds of content, and that studying them therefore requires a hybrid method of analysis that combines different approaches. In this research, the analysis consists of two main stages. The first is an overview analysis and categorisation of tweets, in which I explore the main semiotic features and topics. The second involves a closer discursive and semiotic analysis of a sample of the data collected. The first stage of analysis consists of the identification of topics and formal features in the entire data set to gain an overview of dual screening in the context of Chilean politics. The examination of formal features and topics helped me to understand how people use Twitter to participate in political debates in this hybrid media context and which semiotic resources are involved in this social practice. For the second stage of this research, I analysed in detail a random sample of the social media data coded in the first stage. The tweets were analysed using the discourse-historical approach (DHA) framework (Reisigl & Wodak, 2016) to explore the interaction and argumentative strategies found in this kind of data. The following table shows the type of analysis conducted for each data set:

Table 5 Data sets and procedures

Data type and selection   Analysis   Amount of data
Twitter data

This study includes a first analysis of tweets, which involves processing large amounts of data. A corpus linguistics approach could have been useful, given the size of the data set (see e.g. Baker, 2006; Hardaker, 2016). It could also have helped to triangulate the findings of the discourse-historical analysis and to extend the representativeness of the study (Baker et al., 2008). The use of corpus linguistics to explore social media data has contributed to the study of different social practices online, especially on Twitter (see e.g. Potts et al., 2014; Hardaker & McGlashan, 2016; Page, 2012; Zappavigna, 2012). However, this type of analysis can also miss some features that are useful to discursive analyses,
