4.3 Data Collection
4.3.3 Reconcile – A Case Study
Reconcile: Robust Online Credibility Evaluation of Web Content is a joint research project of two universities, the Polish-Japanese Institute of Information Technology (PJIIT)32 and the École Polytechnique Fédérale de Lausanne (EPFL)33, supported by a grant from Switzerland through the Swiss Contribution to the enlarged European Union. The Reconcile study examines how people evaluate the credibility of current Websites. It gathers information not only about the evaluations of particular Websites but also about the criteria behind those evaluations, together with knowledge about the respondents' backgrounds. Combined, these data make it possible to study how people judge different sites as credible or not. The project produced a dataset, the Web Credibility Corpus (WCC). This section describes the dataset and how it was collected. Subsequent parts of this chapter draw practical examples from the WCC to illustrate various phenomena likely to be encountered when researching credibility.
The goals were to build a midsize corpus of English Webpages with credibility evaluations, in order to check how accurately credibility can be predicted from extracted site features, and to examine how context (i.e., topic, other pages within the given site, other sites on the same topic, task) influences a user's credibility evaluation. Further goals were to (1) enable the aggregation of credibility ratings for a Website based on evaluations of the Webpages within it; (2) derive credibility ratings for texts based on evaluations of statements, by extracting the influence of text and other features related to credibility, and identify heuristics people can use to verify a Webpage's credibility; (3) evaluate collaborative filtering-based approaches to recommending credible Web content to users; (4) construct an extensional definition of credibility and disentangle credibility from importance.

25 http://www.mcafee.com/us/mcafeesecure/about/legal/trustmark-info.html
26 http://www.truste.com/products-and-services/enterprise-privacy/TRUSTed-websites
27 http://austin.bbb.org/article/put-the-bbb-dynamic-seal-on-your-website-24428
28 https://www.thawte.com/ssl/secured-seal/
29 https://www.trustwave.com/trustedCommerce.php
30 http://www.geotrust.com/ssl/ssl-site-seals/
31 http://www.comodo.com/e-commerce/site-seals/secure-site.php
32 www.pjwstk.edu.pl
33 www.epfl.ch
To recruit participants, the study used the crowdsourcing marketplace Amazon Mechanical Turk34. To ensure the best possible quality of the evaluations, several automatic validation mechanisms were implemented, e.g., a minimum length of the textual justification of the rating, a minimum evaluation time, and correctness of the provided links. Additional manual evaluation was also performed, and the final manual rejection rate amounted to 2%; that is, only 2% of the tasks that passed automatic validation were eventually labeled as spam by hand and rejected. This, together with the relatively long time users spent on single evaluations (depicted in Figure 4.1), is taken as a sign of good data quality.
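The automatic checks listed above can be sketched as a simple filter. This is a minimal illustration, not the project's actual validator: the thresholds, the `Submission` record and all helper names are hypothetical, and the link check is purely syntactic (it does not fetch the URL). The hypothetical minimum time is set below the shortest evaluation reported later in this section (57 seconds).

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical thresholds: the study reports that such checks existed,
# not the values that were used.
MIN_JUSTIFICATION_CHARS = 100
MIN_EVALUATION_SECONDS = 30


@dataclass
class Submission:
    """Hypothetical minimal view of one submitted evaluation task."""
    justification: str
    seconds_spent: float
    links: list[str] = field(default_factory=list)


def is_well_formed_url(url: str) -> bool:
    """Syntactic check only: an http(s) scheme and a host name."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validation_errors(sub: Submission) -> list[str]:
    """Return the reasons a submission fails automatic validation (empty list = pass)."""
    errors = []
    if len(sub.justification.strip()) < MIN_JUSTIFICATION_CHARS:
        errors.append("justification too short")
    if sub.seconds_spent < MIN_EVALUATION_SECONDS:
        errors.append("evaluation too fast")
    if any(not is_well_formed_url(u) for u in sub.links):
        errors.append("malformed link")
    return errors


def manual_rejection_rate(n_passed_automatic: int, n_rejected_by_hand: int) -> float:
    """Share of automatically accepted tasks that were later rejected by hand."""
    return n_rejected_by_hand / n_passed_automatic
```

Only submissions with an empty error list would go on to manual review; a 2% manual rejection rate then corresponds to, for example, `manual_rejection_rate(1000, 20)`.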
FIGURE 4.1: Distribution of respondent time spent on evaluation in minutes.
First, participants were asked to provide basic information about themselves (demographics and psychological characteristics) and about their Internet skills and experience (see Figure 4.2). After completing this questionnaire, they proceeded to the main part of the experiment: site evaluation. They could evaluate up to 50 Websites. Meanwhile, some participants were asked to complete an additional task: answering a question related to the Website's topic.
Websites were selected semimanually from three sources:
1. Google queries: pages selected from Google search results for several thematic queries
2. Web of Trust (WOT) categories: pages selected from WOT thematic categories
3. Really Simple Syndication (RSS) feeds: pages selected from subscribed RSS feeds and followed for several weeks
FIGURE 4.2: Web Credibility Corpus labelers by Internet experience level.
A total of 2,405 different Amazon Mechanical Turk workers participated in the study. The vast majority (95%) came from the United States, which was one of the conditions of the study. More than 60% of the workers were classified as heavy Internet users. Assignment to Internet experience groups was based on a questionnaire covering the user's Internet activity patterns and familiarity with Internet-related terms, as described in Kakol et al. [118]. Respondents evaluated the credibility of subsets of a total of 5,691 Webpages, selected for the experiment so that the distribution of their credibility ratings in the external WOT system was known and balanced. A total of 19,872 evaluations were collected, an average of 8 evaluations per worker and 8 evaluations per Webpage. Workers rated credibility on a 5-level scale:
1. Completely not credible
2. Mostly not credible
3. Somewhat credible, although with major doubt
4. Credible, with some doubt
5. Completely credible
Respondents' evaluation times vary from 57 seconds to 1 hour and 25 minutes. More than 50% of evaluations took less than 6 minutes. Only 3 of 15,861 evaluations took less than 1 minute.
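The time statistics above can be reproduced from a list of durations with the standard library. The sketch below is illustrative: the function name and the toy durations are hypothetical, not WCC data. Note that "more than 50% of evaluations took less than 6 minutes" implies that the median duration is below 6 minutes.

```python
import statistics


def time_summary(seconds):
    """Summarize evaluation durations given in seconds."""
    if not seconds:
        raise ValueError("no durations given")
    return {
        "min_s": min(seconds),
        "max_s": max(seconds),
        "median_min": statistics.median(seconds) / 60,
        # Share of evaluations finished in under 6 minutes (360 s).
        "share_under_6_min": sum(s < 6 * 60 for s in seconds) / len(seconds),
        # Number of evaluations finished in under 1 minute.
        "count_under_1_min": sum(s < 60 for s in seconds),
    }


# Hypothetical toy durations; 5100 s is 1 hour and 25 minutes.
summary = time_summary([57, 200, 300, 340, 500, 5100])
```

For these toy values the median is 320 seconds (about 5.3 minutes) and four of the six durations fall under 6 minutes.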
FIGURE 4.3: Credibility ratings distribution in Reconcile Web Credibility Corpus.
More than 40% of the evaluations rated the presented site as 5 (completely credible) and about 30% as 4 (credible, with some doubt). Fewer than 10% rated it as 1 (completely not credible) or 2 (mostly not credible), and 15% as 3 (somewhat credible, although with major doubt) (see Figure 4.3).
The collected ratings are therefore biased: their distribution is skewed toward high credibility. Biased credibility ratings are discussed further in Section 4.4.3.

All Websites were categorized by topic, and each topic had its own question to check participants' knowledge. Topics were grouped into five main categories: medicine, personal finance, healthy lifestyle, politics (with economy and ecology), and entertainment. Participants received random sites together with information about their categories (see Figure 4.4).
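Distributions such as those in Figures 4.3 and 4.4 amount to counting ratings on the 5-level scale, overall and per thematic category. The sketch below assumes the ratings are available as integers 1 to 5 and as hypothetical (category, rating) pairs; the function names and the toy data are illustrative only.

```python
from collections import Counter, defaultdict

# The 5-level scale used in the study.
SCALE = {
    1: "Completely not credible",
    2: "Mostly not credible",
    3: "Somewhat credible, although with major doubt",
    4: "Credible, with some doubt",
    5: "Completely credible",
}


def rating_shares(ratings):
    """Share of evaluations at each level of the 5-level scale."""
    if not ratings:
        raise ValueError("no ratings given")
    bad = [r for r in ratings if r not in SCALE]
    if bad:
        raise ValueError(f"ratings outside the 1-5 scale: {bad}")
    counts = Counter(ratings)
    return {level: counts[level] / len(ratings) for level in SCALE}


def share_high(ratings):
    """Share of ratings of 4 or 5; a simple indicator of skew toward high credibility."""
    return sum(r >= 4 for r in ratings) / len(ratings)


def shares_by_category(pairs):
    """Group (category, rating) pairs and compute rating shares per category."""
    grouped = defaultdict(list)
    for category, rating in pairs:
        grouped[category].append(rating)
    return {category: rating_shares(rs) for category, rs in grouped.items()}
```

On the hypothetical ratings `[5, 5, 5, 5, 4, 4, 4, 3, 3, 1]`, `rating_shares` gives 0.4 for level 5 and 0.3 for level 4, and `share_high` gives 0.7.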
Each participant evaluated the presented Website (selected at random from the database) by providing three keywords describing the Website, a general credibility evaluation (on a 5-point scale), a textual justification of the rating (an open-ended question in which participants had to describe the reasons for their evaluation), and links to related Websites that could help evaluate the visited Website. Additionally, the Website was evaluated on four specific dimensions: Website appearance, author's expertise, author's intentions, and information completeness (each on a 5-point scale, see Figure 4.5). Respondents were also asked about their experience, strength of opinion, and knowledge of the particular Website's topic.
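One evaluation, as described above, can be modeled as a single record. The class and field names below are hypothetical and do not reflect the actual WCC file format; the self-reported experience, strength of opinion and topic knowledge are left out because their scales are not given here.

```python
from dataclasses import dataclass, field

# The five main thematic categories and the four specific dimensions.
CATEGORIES = ("medicine", "personal finance", "healthy lifestyle", "politics", "entertainment")
DIMENSIONS = ("appearance", "expertise", "intentions", "completeness")


@dataclass
class Evaluation:
    """Hypothetical record of one WCC evaluation; field names are illustrative."""
    page_url: str
    category: str                   # one of CATEGORIES
    keywords: tuple[str, str, str]  # three keywords describing the page
    credibility: int                # general credibility on the 1-5 scale
    justification: str              # open-ended textual justification
    dimensions: dict[str, int]      # the four specific ratings, each 1-5
    related_links: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject records that violate the structure described in the text.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if len(self.keywords) != 3:
            raise ValueError("exactly three keywords are required")
        if not 1 <= self.credibility <= 5:
            raise ValueError("credibility must be on the 1-5 scale")
        if set(self.dimensions) != set(DIMENSIONS):
            raise ValueError(f"expected ratings for exactly {DIMENSIONS}")
        if any(not 1 <= v <= 5 for v in self.dimensions.values()):
            raise ValueError("dimension ratings must be on the 1-5 scale")
```

Validating in `__post_init__` catches malformed records at load time instead of during later analysis.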
As mentioned previously, along with general credibility, respondents evaluated other dimensions of credibility: presentation, knowledge, intentions, and completeness. Results show that most of them are correlated with one another: Spearman's rho exceeds 0.5 for 8 of the 10 pairs, the exceptions being presentation and intentions (0.37) and knowledge and intentions (0.49) (see Table 4.2). Completeness is the dimension most correlated with credibility (0.64).

FIGURE 4.4: Credibility ratings distribution in different thematic categories in Reconcile Web Credibility Corpus.

TABLE 4.2: Spearman Correlation of Evaluated Dimensions in Reconcile Credibility Corpus

                Credibility  Presentation  Knowledge  Intentions  Completeness
  Credibility       1            0.56         0.61       0.54         0.64
  Presentation      0.56         1            0.53       0.37         0.53
  Knowledge         0.61         0.53         1          0.49         0.63
  Intentions        0.54         0.37         0.49       1            0.52
  Completeness      0.64         0.53         0.63       0.52         1
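Spearman's rho is the Pearson correlation computed on ranks. Because ratings on a 5-point scale contain many ties, tied values receive the average of the ranks they occupy. The sketch below is a plain-Python illustration (the function names are hypothetical); the last lines read Table 4.2 to confirm which dimension correlates most with credibility.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for a constant sequence")
    return cov / (sx * sy)


# Correlations with credibility, copied from Table 4.2.
WITH_CREDIBILITY = {"Presentation": 0.56, "Knowledge": 0.61, "Intentions": 0.54, "Completeness": 0.64}
strongest = max(WITH_CREDIBILITY, key=WITH_CREDIBILITY.get)  # "Completeness"
```

For real analyses, `scipy.stats.spearmanr` computes the same tie-corrected coefficient together with a p-value.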
In the next section, we present the results of our analysis of the Reconcile dataset introduced here.