Reconcile – A Case Study - Data Collection

4.3 Data Collection

4.3.3 Reconcile – A Case Study

Reconcile: Robust Online Credibility Evaluation of Web Content is a joint research project of two universities, the Polish-Japanese Institute of Information Technology (PJIIT)32 and École Polytechnique Fédérale de Lausanne (EPFL)33, supported by a grant from Switzerland through the Swiss Contribution to the enlarged European Union. The Reconcile study is about how people evaluate the credibility of present-day Websites. The study gathers not only the evaluation results for particular Websites, but also the criteria behind those evaluations, together with information about respondents’ backgrounds. All of this combined is designed to study how people judge different sites as credible or not. As a result of the project, a dataset, the Web Credibility Corpus (WCC), was gathered. This section describes the dataset and the collection process. Subsequent parts of this chapter draw practical examples from the WCC dataset to present various phenomena likely to be encountered while researching credibility.

The goals were to build a midsize corpus of Webpages in English with credibility evaluations, in order to enable checking how accurately credibility can be predicted on the basis of extracted site features and to examine how context (i.e., topic, other pages within the given site, other sites on the same topic, task) influences a user’s credibility evaluation. Other goals were to (1) enable the aggregation of credibility ratings for a Website based on evaluations of Webpages within it; (2) derive credibility ratings for texts based on evaluations of statements by extracting the influence of text and other features related to credibility, and identify heuristics for people to verify a Webpage’s credibility; (3) evaluate collaborative filtering-based approaches to recommend credible Web content to users; (4) construct an extensional definition of credibility and disentangle credibility from importance.

25 http://www.mcafee.com/us/mcafeesecure/about/legal/trustmark-info.html
26 http://www.truste.com/products-and-services/enterprise-privacy/TRUSTed-websites
27 http://austin.bbb.org/article/put-the-bbb-dynamic-seal-on-your-website-24428
28 https://www.thawte.com/ssl/secured-seal/
29 https://www.trustwave.com/trustedCommerce.php
30 http://www.geotrust.com/ssl/ssl-site-seals/
31 http://www.comodo.com/e-commerce/site-seals/secure-site.php
32 www.pjwstk.edu.pl
33 www.epfl.ch

To recruit participants, the study used the crowdsourcing marketplace Amazon Mechanical Turk34. To ensure the best quality of the evaluations, several automatic validation mechanisms were implemented, e.g., a minimum length of the textual rating justification, a minimum evaluation time, and correctness of the provided links. Additional manual evaluation was also performed, and the final manual rejection rate amounted to 2%; that is, only 2% of the tasks that passed automatic validation were eventually labeled as spam by hand and rejected. This, together with the relatively long time spent by the users on single evaluations (depicted in Figure 4.1), is taken as a sign of good data quality.
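The automatic checks named above can be sketched as a small validation routine. This is only an illustration: the source lists the kinds of checks but not their thresholds or implementation, so the threshold values, the `Submission` record, and the function names below are all assumptions. Link correctness is approximated here by a purely syntactic URL check.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical thresholds: the source names the checks but not their values.
MIN_JUSTIFICATION_CHARS = 50
MIN_EVALUATION_SECONDS = 30


@dataclass
class Submission:
    """One crowdsourced evaluation (field names are illustrative)."""
    justification: str
    seconds_spent: float
    links: list = field(default_factory=list)


def is_well_formed_url(link: str) -> bool:
    """Syntactic check only: an http(s) scheme and a non-empty host."""
    try:
        parsed = urlparse(link.strip())
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validation_errors(sub: Submission) -> list:
    """Return the names of the automatic checks that the submission fails."""
    errors = []
    if len(sub.justification.strip()) < MIN_JUSTIFICATION_CHARS:
        errors.append("justification_too_short")
    if sub.seconds_spent < MIN_EVALUATION_SECONDS:
        errors.append("evaluation_too_fast")
    if not all(is_well_formed_url(link) for link in sub.links):
        errors.append("malformed_link")
    return errors
```

A submission with an empty error list would pass on to the manual review stage. Returning the names of the failed checks, rather than a single boolean, makes it easier to see which rule rejects the most tasks.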

FIGURE 4.1: Distribution of respondent time spent on evaluation in minutes.

First, participants were asked for some basic information about themselves (demographics and psychology) and their Internet skills and experience (see Figure 4.2). After completing this questionnaire, they proceeded to the main part of the experiment: site evaluation. They could evaluate up to 50 Websites. Along the way, some of the participants were asked to complete an additional task: answering a question related to the Website topic.

Websites were selected semimanually from three sources:

1. Google queries: pages selected from Google search results for several thematic queries

2. Web of Trust (WOT) categories: pages selected from WOT thematic categories

3. Really Simple Syndication (RSS) feeds: pages selected from subscribed RSS feeds and followed for several weeks

FIGURE 4.2: Web Credibility Corpus labelers by Internet experience level.

A total of 2405 different Amazon Mechanical Turk workers participated in the study. The vast majority, i.e., 95%, came from the United States, which was one of the conditions of the study. More than 60% of the workers were classified as heavy Internet users. The assignment to Internet experience groups was based on a questionnaire covering user Internet activity patterns and familiarity with Internet-related terms, as described in Kakol et al. [118]. The respondents evaluated the credibility of subsets of a total of 5691 Webpages selected for the experiment with a known and balanced distribution of credibility ratings as compared to the external system WOT. A total of 19,872 evaluations were collected. The average was 8 evaluations per worker and 8 evaluations per Webpage. Workers evaluated the credibility on a 5-level scale:

1. Completely not credible

2. Mostly not credible

3. Somewhat credible, although with major doubt

4. Credible, with some doubt

5. Completely credible

Respondents’ evaluation times varied from 57 seconds up to 1 hour and 25 minutes. More than 50% of evaluations were made in less than 6 minutes. Only 3 of 15,861 evaluations took less than 1 minute.

FIGURE 4.3: Credibility ratings distribution in Reconcile Web Credibility Corpus.

More than 40% of the respondents evaluated a presented site as 5 (completely credible) and about 30% as 4 (credible, with some doubt). Less than 10% of the evaluations were completely not credible (1) or mostly not credible (2). A further 15% of the marks were 3 (somewhat credible, although with major doubt); see Figure 4.3.

The collected results are biased: the distribution is skewed toward high credibility. More about biased credibility ratings can be found in Section 4.4.3. All Websites were categorized according to their topic. Each topic had its own question to check participants’ knowledge. Topics were grouped into 5 main categories: medicine, personal finance, healthy lifestyle, politics (with economy and ecology) and entertainment. Participants received random sites with information about their categories (see Figure 4.4).

Each participant evaluated the presented Website (selected at random from the database), providing three keywords to describe the Website, a general credibility evaluation (using a 5-point scale), a textual justification of the rating (an open-ended question in which the participant had to describe the reasons for the evaluation) and links to related Websites that could help to evaluate the visited Website. Additionally, the Website was evaluated on four specific dimensions: Website appearance, author’s expertise, author’s intentions, and information completeness (using a 5-point scale, see Figure 4.5). Respondents were also asked about their experience, strength of opinion and knowledge about the particular Website’s topic.

As mentioned previously, along with general credibility, respondents evaluated other dimensions of credibility: presentation, knowledge, intentions and completeness. Results show that most of them are highly correlated (Spearman’s rho above 0.5, see Table 4.2). Completeness is the dimension most strongly correlated with credibility (0.64).

FIGURE 4.4: Credibility ratings distribution in different thematic categories in Reconcile Web Credibility Corpus.

TABLE 4.2: Spearman Correlation of Evaluated Dimensions in Reconcile Credibility Corpus

              Credibility  Presentation  Knowledge  Intentions  Completeness
Credibility   1            0.56          0.61       0.54        0.64
Presentation  0.56         1             0.53       0.37        0.53
Knowledge     0.61         0.53          1          0.49        0.63
Intentions    0.54         0.37          0.49       1           0.52
Completeness  0.64         0.53          0.63       0.52        1
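The two claims drawn from Table 4.2 can be checked directly against its values. The sketch below copies the coefficients from the table; the variable and function names are illustrative and not part of the Reconcile project. It does not recompute Spearman's rho from raw ratings, which would require the WCC data itself.

```python
DIMENSIONS = ["Credibility", "Presentation", "Knowledge", "Intentions", "Completeness"]

# Spearman's rho values copied from Table 4.2 (rows and columns follow DIMENSIONS).
RHO = [
    [1.00, 0.56, 0.61, 0.54, 0.64],
    [0.56, 1.00, 0.53, 0.37, 0.53],
    [0.61, 0.53, 1.00, 0.49, 0.63],
    [0.54, 0.37, 0.49, 1.00, 0.52],
    [0.64, 0.53, 0.63, 0.52, 1.00],
]


def is_symmetric(matrix):
    """A correlation matrix must equal its own transpose."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def distinct_pairs(matrix, names):
    """Return (name_a, name_b, rho) for each unordered pair of distinct dimensions."""
    return [
        (names[i], names[j], matrix[i][j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    ]


pairs = distinct_pairs(RHO, DIMENSIONS)

# Pairs above and at-or-below the 0.5 level mentioned in the text.
above = [(a, b) for a, b, rho in pairs if rho > 0.5]
below = [(a, b) for a, b, rho in pairs if rho <= 0.5]

# Dimension most strongly correlated with general credibility (row 0, excluding itself).
strongest = max(zip(DIMENSIONS[1:], RHO[0][1:]), key=lambda item: item[1])

print(f"{len(above)} of {len(pairs)} pairs exceed 0.5; exceptions: {below}")
print(f"Most correlated with credibility: {strongest[0]} ({strongest[1]})")
```

Of the ten distinct pairs, eight exceed 0.5. The two exceptions both involve intentions (with presentation, 0.37, and with knowledge, 0.49), which is why the text says "most" rather than "all".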

In the next section we present the analysis outcomes of the introduced Reconcile dataset.

Source: Computational Trust Models and Machine Learning, Chapman & Hall/CRC Machine Learning & Pattern Recognition series (2014), pages 111-115.