
We represent the input to the algorithm, that is, the answers of workers to pairwise relevance questions, using an edge-weighted directed graph G = (V, E, NW, NA), where V is the set of vertices that represent PoIs, E is a set of directed edges, each given by a pair of vertices, and NW and NA are edge weights, specifically, functions from V × V to the natural numbers. Here, NA(p_i, p_j) is the number of assignments regarding the pairwise relevance question about PoIs p_i and p_j, and NW(p_i, p_j) is the number of workers who prefer p_i over p_j. The ratio NW(p_i, p_j)/NA(p_i, p_j) then defines the probability of edge (p_i, p_j).

To understand how the edges of the graph are created and updated, consider a pair of PoIs p_i and p_j. If some assignments regarding this pair have been processed, the graph will contain two edges: (p_i, p_j) and (p_j, p_i). Further, NA(p_i, p_j) and NA(p_j, p_i) will have the same value, the total number of assignments for this pair. Note that NW(p_i, p_j) + NW(p_j, p_i) ≤ NA(p_i, p_j). The sum is not always equal to the total number of assignments, since it is possible for a worker to answer that p_i and p_j are incomparable. This is recorded by incrementing the NA values on both of the edges without changing the NW values.
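The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `AnswerGraph`, its method names, and the answer labels `"first"`, `"second"`, and `"incomparable"` are assumptions introduced here.

```python
from collections import defaultdict


class AnswerGraph:
    """Edge-weighted directed graph of crowd answers (hypothetical helper)."""

    def __init__(self):
        # nw[(i, j)]: number of workers who prefer p_i over p_j.
        self.nw = defaultdict(int)
        # na[(i, j)]: number of assignments processed for the pair p_i, p_j.
        self.na = defaultdict(int)

    def record_answer(self, i, j, answer):
        """Record one assignment; answer is 'first', 'second', or 'incomparable'."""
        if answer not in ("first", "second", "incomparable"):
            raise ValueError(f"unknown answer: {answer!r}")
        # Both directed edges always carry the same assignment count.
        self.na[(i, j)] += 1
        self.na[(j, i)] += 1
        if answer == "first":
            self.nw[(i, j)] += 1
        elif answer == "second":
            self.nw[(j, i)] += 1
        # 'incomparable' increments only the NA values.

    def probability(self, i, j):
        """Return NW(p_i, p_j) / NA(p_i, p_j), or None if the pair is unseen."""
        na = self.na.get((i, j), 0)
        if na == 0:
            return None
        return self.nw.get((i, j), 0) / na
```

With three answers preferring the first PoI, one preferring the second, and one "incomparable", both edges have NA = 5, while the two NW values sum to 4, which illustrates why the inequality above is not always an equality.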

Paper A.

To record the output of the algorithm, that is, the pairwise relevance relation, we maintain a two-dimensional matrix called the Pairwise Relevance Matrix (PRM). PRM encodes all the pairwise relevances discovered by the algorithm from the answers of the crowd, including the pairwise relevances obtained directly from the answers as well as the pairwise relevances inferred using the transitivity property of the pairwise relevance relation. It is an n × n matrix, where n is the number of PoIs in the query answer. Each row and column of the matrix represents a PoI. The value of a cell records the pairwise relevance between two PoIs. Let us assume that we have a PRM M that is defined on the set of PoIs D = {p_1, ..., p_n}. A cell M[i, j] can have one of five possible values:

• M[i, j] = 1 encodes that p_j is more relevant than p_i.

• M[i, j] = 0 encodes that p_i and p_j are incomparable. Since an object is not comparable with itself, the diagonal cells contain 0s.

• M[i, j] = −1 encodes that p_i is more relevant than p_j. Since the pairwise relevance relation is asymmetric, if M[i, j] = −1 then M[j, i] = 1.

• M[i, j] = 2 encodes that the pair (p_i, p_j) has not yet been processed. In the beginning of the algorithm, all of the cells except the diagonal cells have this value. If M[i, j] = 2 then M[j, i] = 2.

• M[i, j] = 3 encodes that the pair (p_i, p_j) has been processed but that the algorithm cannot conclusively decide about the relation between p_i and p_j. If M[i, j] = 3 then M[j, i] = 3.
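The five cell values and their mirroring rules can be sketched as below. The constant and function names are hypothetical, and the sketch assumes that incomparability (0) is mirrored symmetrically, which the text implies but does not state for off-diagonal cells.

```python
# Cell values of the PRM as listed above (constant names are assumptions).
P_J_MORE_RELEVANT = 1
INCOMPARABLE = 0
P_I_MORE_RELEVANT = -1
UNPROCESSED = 2
UNDECIDED = 3


def init_prm(n):
    """Return an n x n PRM: 0 on the diagonal, 2 (unprocessed) elsewhere."""
    return [[INCOMPARABLE if i == j else UNPROCESSED for j in range(n)]
            for i in range(n)]


def set_relation(prm, i, j, value):
    """Write value into M[i][j] and the matching value into M[j][i]."""
    if i == j:
        raise ValueError("diagonal cells stay 0")
    if value not in (-1, 0, 1, 2, 3):
        raise ValueError(f"invalid cell value: {value}")
    prm[i][j] = value
    # 1 and -1 swap across the diagonal; 0, 2, and 3 are mirrored unchanged.
    prm[j][i] = -value if value in (1, -1) else value


def is_complete(prm):
    """True when no cell is still unprocessed (no 2s remain)."""
    return all(cell != UNPROCESSED for row in prm for cell in row)
```

Routing every write through a single helper such as `set_relation` keeps the two mirrored cells from drifting apart.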

A PRM M has the following properties:

• Transitivity: This property is used to infer pairwise relevances. If M[i, k] = 1 and M[k, j] = 1 then M[i, j] = 1. This is not the case if, when the inference is made, M[i, j] is already −1 or 0 due to previous answers of the crowd (as described next).

• Possibility of Inconsistencies: M can contain inconsistencies as workers may give contradicting answers. We say that there is an inconsistency regarding a pair of PoIs (p_i, p_j) if it is possible to infer M[i, j] = 1 using the transitivity property from the cells M[i, k] = 1 and M[k, j] = 1, but M[i, j] ≠ 1.
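The two properties can be illustrated with a small fixed-point routine. This is only a sketch under assumptions: the function name is hypothetical, and overwriting undecided cells (value 3) with an inferred value is a choice made here, based on the statement that only −1 and 0 block the inference.

```python
def apply_transitivity(prm):
    """Fill cells implied by transitivity; return the inconsistent pairs.

    Cells holding 2 or 3 are overwritten with the inferred value, while cells
    already holding -1 or 0 are left untouched and reported instead.
    """
    n = len(prm)
    inconsistencies = set()
    changed = True
    while changed:  # repeat until no new cell can be inferred
        changed = False
        for i in range(n):
            for k in range(n):
                if prm[i][k] != 1:
                    continue
                for j in range(n):
                    if j == i or prm[k][j] != 1 or prm[i][j] == 1:
                        continue
                    if prm[i][j] in (-1, 0):
                        # M[i][j] = 1 is inferable, but the crowd said otherwise.
                        inconsistencies.add((i, j))
                    else:
                        prm[i][j] = 1
                        prm[j][i] = -1
                        changed = True
    return inconsistencies
```

For example, with M[0][1] = 1 and M[1][2] = 1 the routine sets M[0][2] = 1 and M[2][0] = −1. If M[0][2] already holds −1, the matrix is left unchanged and the pair (0, 2) is reported.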

The general flow of the algorithm is shown in Figure A.1. After initialization of the PRM, in each iteration, the algorithm goes through two different phases: determining the next question and processing the question. If the relevance of every pair is computed (there are no 2s in the pairwise relevance matrix), it returns the PRM.

3. Proposed Method

[Fig. A.1: General Flow of PointRank. Inputs: pois and the parameters {ina, minni, pt, p-value, maxni}. Flow: Start → Initialize Pairwise Relevance Matrix → Call Determine Next Question → Is there a valid pairwise question? If yes, Call Process the Question; if no, Return Pairwise Relevance Matrix.]

[Fig. A.2: Flow of the Determine-Next-Question Phase. Input: prm. Flow: Start → Initialize Set of Possible Pairs From prm → Is set of pairs empty? If yes, End; if no, Compute the Pairwise Relevance Question with Maximum Gain.]

The flow chart of determining the next question is presented in Figure A.2. The algorithm takes the PRM as input and determines the possible pairwise relevance questions to be asked next. If the set of questions is empty, it stops. Otherwise, it computes the gain for each of the questions and returns the question with the maximum gain.
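The determine-next-question phase can be sketched as follows. The gain computation is not defined in this part of the text, so the sketch takes it as a caller-supplied function `gain(prm, pair)`; that parameter, and the reading of "possible pairs" as the unprocessed cells above the diagonal, are assumptions.

```python
def determine_next_question(prm, gain):
    """Return the unprocessed pair with maximum gain, or None if none is left.

    `gain` is a hypothetical callable gain(prm, (i, j)) -> number; the actual
    gain definition is outside the scope of this sketch.
    """
    n = len(prm)
    # Each unordered pair is considered once, using the upper triangle.
    candidates = [(i, j)
                  for i in range(n)
                  for j in range(i + 1, n)
                  if prm[i][j] == 2]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: gain(prm, pair))
```

Passing the gain as a function keeps the selection loop independent of how the gain is actually computed.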

The flow chart of the process-question phase is shown in Figure A.3.


Fig. A.3: Flow of the Process-Question Phase

It takes the graph, the PRM, and the next question as input. It uses several parameters that are explained shortly. In this phase, the algorithm employs an iterative approach. In each iteration, it first determines the number of assignments for the question. Then it assigns the question to workers, gets the answers back, and updates the graph. Finally, it checks whether consensus on the question is reached and updates the pairwise relevance matrix accordingly.

PointRank uses six parameters: pois, ina, minni, pvalue, maxni, and pt. Parameter pois is the list of PoIs to be ranked. Parameter ina is the initial number of assignments for each pairwise relevance question. As shown in Figure A.3, this parameter is used to determine the number of assignments for the question in the current iteration. If the iteration count is 1 or 2, the number of assignments is set to ina. Otherwise, it is computed from ina and the iteration count. The algorithm makes at least two iterations in order to be able to apply significance testing to the answers of the workers. Parameter minni is the minimum number of iterations that must be completed before the algorithm may stop creating new assignments for a pairwise relevance question. As shown in Figure A.3, the algorithm completes minni iterations before checking for consensus regarding a question. Parameter pvalue is the maximum p-value in the chi-square test at which the changes in the answers to assignments in consecutive iterations are considered significant. As shown in Figure A.3, in each iteration, the algorithm applies the significance test to the accumulated answers for the question and the answers from this iteration. If the p-value of the test does not exceed pvalue, it continues to the next iteration. Parameter maxni is the maximum number of iterations for a pairwise relevance question. As shown in Figure A.3, if the question does not have a consensus after maxni iterations, the algorithm cannot make a decision regarding this pair: it stops and sets the value of the corresponding cells to 3 in the PRM. Parameter pt is the probability threshold needed to determine the answer for the pairwise relevance question. In other words, in order to conclude that PoI p_i is preferred over PoI p_j, the probability of the (p_i, p_j) edge should exceed pt, as shown in Figure A.3.
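The roles of pvalue, minni, maxni, and pt can be illustrated with the sketch below. It rests on several assumptions: the test is read as a chi-square homogeneity test between the previously accumulated answer counts and the counts of the current iteration, both given as (prefer p_i, prefer p_j, incomparable); the order of the checks in `decide_after_iteration` is a guess, since the exact flow of Figure A.3 is not reproduced here; and the rule for declaring a pair incomparable (value 0) is omitted because the text does not state it. Function names are hypothetical.

```python
import math


def chi_square_p_value(previous, current):
    """p-value of a chi-square homogeneity test on two rows of answer counts."""
    # Drop answer categories that were never chosen in either row.
    cols = [(a, b) for a, b in zip(previous, current) if a + b > 0]
    row_prev = sum(a for a, _ in cols)
    row_cur = sum(b for _, b in cols)
    df = len(cols) - 1
    if df < 1 or row_prev == 0 or row_cur == 0:
        return 1.0  # nothing to compare, so no evidence of change
    total = row_prev + row_cur
    stat = 0.0
    for a, b in cols:
        exp_a = row_prev * (a + b) / total  # expected count, previous row
        exp_b = row_cur * (a + b) / total   # expected count, current row
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))  # chi-square survival, 1 d.o.f.
    if df == 2:
        return math.exp(-stat / 2)             # chi-square survival, 2 d.o.f.
    raise ValueError("this sketch supports at most three answer categories")


def decide_after_iteration(iteration, p_value, prob_ij, prob_ji,
                           pt, minni, maxni, pvalue):
    """Return -1, 1, or 3 for the PRM cell, or None to run another iteration."""
    # Assumed order: answers must be stable and minni iterations completed.
    if iteration >= minni and p_value > pvalue:
        if prob_ij > pt:
            return -1  # p_i is more relevant than p_j
        if prob_ji > pt:
            return 1   # p_j is more relevant than p_i
    if iteration >= maxni:
        return 3       # no consensus within maxni iterations
    return None
```

Here `prob_ij` and `prob_ji` stand for the edge probabilities NW/NA of the two directed edges of the pair.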

Algorithm A.1 PointRank Algorithm
Input: pois, ina, pt, minni, maxni, pvalue
Output: prm

1: n ← pois.length
2: Initialize prm and graph
3: nq ← DetermineNextQuestion(prm, n)
4: while nq ≠ null do
5:     graph, prm ← ProcessTheQuestion(graph, prm, nq, ina, pt, minni, maxni, pvalue)
6:     nq ← DetermineNextQuestion(prm, n)
7: end while
8: return prm

The complete algorithm is presented in Algorithm A.1. To build the graph and to incorporate the answers of the workers into the model, we employ an iterative approach. First, the algorithm initializes the pairwise relevance matrix and the graph of answers as shown in line 2. In the initial step, the algorithm checks whether there is a valid next question as shown in lines 3–4. If so, the algorithm processes the pairwise relevance question and gets the next question as shown in lines 5–6. If the algorithm does not need any further questions to complete the procedure, it returns the constructed pairwise relevance matrix as shown in line 8.
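The main loop of Algorithm A.1 translates almost line by line into Python. In the sketch below, the two phases are passed in as functions so that the loop can be run on its own; the stand-ins `first_unprocessed` and `mark_undecided` are placeholders invented for the demonstration and do not reflect the actual phases.

```python
def point_rank(pois, ina, pt, minni, maxni, pvalue,
               determine_next_question, process_the_question):
    """Main loop of Algorithm A.1 with both phases supplied by the caller."""
    n = len(pois)
    # Line 2: 0 on the diagonal, 2 (unprocessed) elsewhere; empty answer graph.
    prm = [[0 if i == j else 2 for j in range(n)] for i in range(n)]
    graph = {}  # placeholder representation of the answer graph
    nq = determine_next_question(prm, n)
    while nq is not None:
        graph, prm = process_the_question(
            graph, prm, nq, ina, pt, minni, maxni, pvalue)
        nq = determine_next_question(prm, n)
    return prm


def first_unprocessed(prm, n):
    """Placeholder phase: pick the first unprocessed pair, ignoring gain."""
    for i in range(n):
        for j in range(i + 1, n):
            if prm[i][j] == 2:
                return (i, j)
    return None


def mark_undecided(graph, prm, nq, ina, pt, minni, maxni, pvalue):
    """Placeholder phase: asks no workers and marks the pair as undecided."""
    i, j = nq
    prm[i][j] = prm[j][i] = 3
    return graph, prm


result = point_rank(["a", "b", "c"], 5, 0.7, 2, 5, 0.05,
                    first_unprocessed, mark_undecided)
```

The loop terminates as long as every call to the process-question phase removes at least one 2 from the matrix, which the placeholder does by construction.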