3.2 User Study Design
3.2.2 Three User Studies
This study addresses three research questions:
• RQ1: How do readers judge the credibility of tweets?
• RQ2: Does the credibility of tweets correlate with external attributes other than the tweets’ surface features?
• RQ3: Do Twitter readers’ personal characteristics play an important role in influencing the credibility evaluation of information on Twitter?
Table 3.1: Data collection methods used in credibility studies
Authors Questionnaire Interview Think-aloud Crowdsourcing
Fogg et al. [2001] ✓
Yang [2007] ✓
Rieh and Hilligoss [2008] ✓
Castillo et al. [2011] ✓ ✓
Gupta and Kumaraguru [2012] ✓
O’Donovan et al. [2012] ✓ ✓
Morris et al. [2012] ✓ ✓ ✓
Kang et al. [2012] ✓ ✓
Yang et al. [2013] ✓
Sikdar et al. [2013b] ✓
Sikdar et al. [2013a] ✓ ✓
Kang et al. [2015] ✓ ✓
Lin et al. [2016] ✓
For these research questions, we designed three user studies, one addressing the objective of each question. Although three studies were conducted, all of them followed the same methodology. Readers were shown a set of tweet messages and asked to annotate their credibility level, describe the credibility features that influenced their credibility perception, and answer questionnaires (e.g. demographic data, personality test).
For the first research question, it is important to identify how readers determine the credibility level of tweet messages. Based on the studies by Castillo et al. [2011], Gupta and Kumaraguru [2012] and Morris et al. [2012], we designed the user study in two parts: credibility annotation and readers’ thoughts regarding their judgement. The credibility annotation part presents the tweet together with information about it: the topic, a topic description, the date the tweet was posted and the author’s Twitter ID name. The topic and topic description are given to mimic the search activity, since a reader searching for information would have a general idea of the event they are searching for. The author’s Twitter ID is given to identify who wrote the tweet, as Morris et al. [2012] indicated that readers are concerned with knowing the identity of the author. Meanwhile, the date posted shows the currency of the tweet and whether it was posted within the time frame of the event. This is important because the tweets used in this study are related to news. News-related tweets were chosen based on the finding by Morris et al. [2012] that people are more concerned with the credibility of news-related tweets than of tweets on any other topic.
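The annotation item described above can be sketched as a small record type. This is a minimal illustration only: the class name `AnnotationItem`, its field names, the method `within_event_window` and the example values are hypothetical and not taken from the study materials, and the start and end dates of each news event are assumed to be known.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AnnotationItem:
    """One tweet as shown in the credibility annotation part (hypothetical structure)."""
    tweet_text: str          # the tweet message itself
    topic: str               # news topic, mimicking the reader's search intent
    topic_description: str   # short description of the event
    date_posted: date        # shows the currency of the tweet
    author_id: str           # author's Twitter ID name, identifying who wrote it

    def within_event_window(self, event_start: date, event_end: date) -> bool:
        """True if the tweet was posted within the event's time frame (inclusive)."""
        return event_start <= self.date_posted <= event_end


# Hypothetical placeholder values, for illustration only.
item = AnnotationItem(
    tweet_text="Example tweet text",
    topic="Example topic",
    topic_description="Example description of the news event",
    date_posted=date(2012, 1, 15),
    author_id="@example_author",
)
print(item.within_event_window(date(2012, 1, 10), date(2012, 1, 20)))  # True
```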
To collect readers’ thoughts on how they determine the credibility level of a tweet, we conducted a pilot test in two settings, an interview and a questionnaire, with the participants divided into two groups. Participants in the interview group were asked what influenced their credibility judgement of each tweet, while participants in the questionnaire group were asked the same question in writing and wrote down their answers. Both groups were shown the same tweet messages. Comparing the answers, we found that participants in both settings gave similarly direct and short answers. We therefore chose the questionnaire setting, as it allows a larger sample. We did not give the participants a list of possible options because we wanted their answers to be raw and genuine. Details and analysis of the questionnaires will be discussed in Chapter 4.
For the second research question, the user study was divided into three parts rather than two as in the first user study, because we added a demographic questionnaire. Changes were also made to the credibility annotation and readers’ credibility features sections. In the credibility annotation section, a screenshot of the tweet was shown instead of the text message, so that other features such as the number of likes, the number of retweets and a picture of the author could be seen. This change was based on the findings of the first user study, in which some of the readers’ comments related to these features. Accordingly, we compiled a list of these features from the findings, added further features based on the study by Castillo et al. [2011], and included the list in the readers’ credibility features section.
We also encouraged the readers to leave comments if their credibility features were not part of the list. To weight the importance of each credibility feature, a four-point scale ranging from strongly agree to strongly disagree was used. The neutral point was removed because it does not carry meaning in this user study: if a reader has identified a credibility feature as influencing their credibility judgement, they must be certain of that feature. Further details and analysis will be discussed in Chapter 5.
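One way to turn the four-point responses into a per-feature weight is sketched below, assuming the labels are coded from 1 (strongly disagree) to 4 (strongly agree) and averaged across readers. The coding, the names `FOUR_POINT`, `code_answers`, `feature_importance` and `agreement_share`, and the example responses are hypothetical illustrations, not the analysis reported in Chapter 5. Because the scale has no neutral point, an answer such as "neutral" is rejected rather than silently mapped to a midpoint.

```python
# Hypothetical numeric coding of the four-point scale (no neutral midpoint).
FOUR_POINT = {
    "strongly disagree": 1,
    "disagree": 2,
    "agree": 3,
    "strongly agree": 4,
}


def code_answers(answers: list[str]) -> list[int]:
    """Map scale labels to codes; an unknown label such as "neutral" raises KeyError."""
    if not answers:
        raise ValueError("at least one answer is required")
    return [FOUR_POINT[a.strip().lower()] for a in answers]


def feature_importance(responses: dict[str, list[str]]) -> dict[str, float]:
    """Mean code per credibility feature across readers (higher = more influential)."""
    return {
        feature: sum(code_answers(answers)) / len(answers)
        for feature, answers in responses.items()
    }


def agreement_share(answers: list[str]) -> float:
    """Fraction of readers who chose "agree" or "strongly agree" for one feature."""
    codes = code_answers(answers)
    return sum(code >= 3 for code in codes) / len(codes)


# Hypothetical responses from three readers, for illustration only.
example = {
    "number of retweets": ["agree", "strongly agree", "disagree"],
    "picture of the author": ["strongly disagree", "disagree", "agree"],
}
print(feature_importance(example))
# {'number of retweets': 3.0, 'picture of the author': 2.0}
```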
For the last research question, we wanted to identify readers’ personal characteristics, including personality. We therefore added another section, a personality test, to the user study. To keep the user study from being exhausting and overwhelming for the readers, a short version of the personality test was chosen. In addition, the credibility annotation section was changed to use a seven-point rating scale instead of the credibility levels used in the first two user studies. This change follows the studies by Westerman et al. [2012] and Sikdar et al. [2013a]. A seven-point rating scale also gives readers a wider range of options and captures their choices more accurately [Allen and Seaman, 2007].
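The seven-point annotations can be summarised per tweet as sketched below. This assumes ratings are coded as integers from 1 (lowest credibility) to 7 (highest); the verbal anchors of each point are not specified here, and the function `summarise_ratings` and the example ratings are hypothetical.

```python
from collections import Counter

# Hypothetical integer coding of the seven-point credibility rating (1 = lowest, 7 = highest).
SCALE_POINTS = range(1, 8)


def summarise_ratings(ratings: list[int]) -> tuple[float, dict[int, int]]:
    """Return the mean rating and how many readers chose each of the seven points."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in SCALE_POINTS:
            raise ValueError(f"rating {r} is outside the 1-7 scale")
    counts = Counter(ratings)
    distribution = {point: counts[point] for point in SCALE_POINTS}
    return sum(ratings) / len(ratings), distribution


# Hypothetical ratings of one tweet by five readers, for illustration only.
mean_rating, distribution = summarise_ratings([5, 6, 6, 3, 7])
print(mean_rating)  # 5.4
```

Keeping the full seven-bin distribution alongside the mean preserves the wider range of options that the scale offers, which a single average would hide.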
We also changed the way the tweets were shown to the readers. In this user study, we wanted to eliminate readers’ preconceptions arising from knowing the news beforehand. Therefore, thirty simulated news tweets about politics, breaking news and natural disasters were shown to the readers. The simulated tweets resembled the tweets returned by the Twitter search engine for a search with query keywords. Using simulated tweets also allowed us to control the features shown on the tweets. Details and analysis of the questionnaires will be discussed in Chapter 6.
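A possible way to control the displayed features of simulated tweets is sketched below: each tweet’s like and retweet counts are set explicitly by the researcher and rotated through a fixed list of levels, rather than inherited from real tweets. Only the three news categories come from the study; the `SimulatedTweet` class, the `build_simulated_set` function, the rotation scheme, the feature levels and the example texts and author IDs are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle

# News categories named in the study; everything else below is a hypothetical sketch.
CATEGORIES = ("politics", "breaking news", "natural disaster")


@dataclass(frozen=True)
class SimulatedTweet:
    category: str
    text: str
    author_id: str
    likes: int      # displayed count, set by the design rather than copied from Twitter
    retweets: int   # displayed count, set by the design rather than copied from Twitter


def build_simulated_set(
    items: list[tuple[str, str, str]],
    feature_levels: list[tuple[int, int]],
) -> list[SimulatedTweet]:
    """Create simulated tweets whose displayed features are fixed by the researcher.

    items: (category, text, author_id) triples.
    feature_levels: (likes, retweets) pairs, assigned to tweets in rotation.
    """
    if not feature_levels:
        raise ValueError("at least one feature level is required")
    levels = cycle(feature_levels)
    tweets = []
    for category, text, author_id in items:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        likes, retweets = next(levels)
        tweets.append(SimulatedTweet(category, text, author_id, likes, retweets))
    return tweets


# Hypothetical example texts, author IDs and feature levels.
tweets = build_simulated_set(
    [("politics", "Example tweet A", "@example_a"),
     ("breaking news", "Example tweet B", "@example_b"),
     ("natural disaster", "Example tweet C", "@example_c")],
    feature_levels=[(10, 2), (500, 120)],
)
print([(t.likes, t.retweets) for t in tweets])  # [(10, 2), (500, 120), (10, 2)]
```

Rotating through the levels with `itertools.cycle` keeps the assignment deterministic, so the same input always produces the same displayed features.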